I'm trying to retrieve all http and https links from a website, but sometimes I get a null exception
public partial class Form1 : Form
{
   int y = 0;
   string url = @"http://www.google.co.il";
   string urls = @"http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";

   public Form1()
   {
       InitializeComponent();
       //webCrawler(urls, 3);
       List<string> a = webCrawler(urls, 1);
       //GetAllImages();
   }

   private int factorial(int n)
   {
      if (n == 0) return 1;
      else y = n * factorial(n - 1);
      listBox1.Items.Add(y);
      return y;
   }

   private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
   {
       List<string> mainLinks = new List<string>();

       if (document.DocumentNode.SelectNodes("//a[@href]") == null)
       { }

       foreach (HtmlNode link in document.DocumentNode.SelectNodes("//a[@href]"))
       {
           var href = link.Attributes["href"].Value;
           mainLinks.Add(href);
       }

       return mainLinks;
   }

   private List<string> webCrawler(string url, int levels)
   {
      HtmlAgilityPack.HtmlDocument doc;
      HtmlWeb hw = new HtmlWeb(); 

      List<string> webSites;// = new List<string>();
      List<string> csFiles = new List<string>();

      csFiles.Add("temp string to know that something is happening in level = " + levels.ToString());
      csFiles.Add("current site name in this level is : "+url);
      /* later should be replaced with real cs files .. cs files links..*/

      doc = hw.Load(url);
      webSites = getLinks(doc);

      if (levels == 0)
      {
         return csFiles;
      }
      else
      {
         int actual_sites = 0;

         for (int i = 0; i < webSites.Count() && i < 100000; i++) // cap the number of sites per level for now,
         // or it will take forever.
         {
             string t = webSites[i];
             /*
                    if (!webSites.Contains(t))
                    {
                        webCrawler(t, levels - 1);
                    }
             */

             if ( (t.StartsWith("http://")==true) || (t.StartsWith("https://")==true) ) // replace this with future FilterJunkLinks function
             {
                actual_sites++;
                csFiles.AddRange(webCrawler(t, levels - 1));
                richTextBox1.Text += t + Environment.NewLine;
             }
          }

          // report to a message box only at high levels..
          if (levels==1)
             MessageBox.Show(actual_sites.ToString());

          return csFiles;
       }
    }
}

After a few sites have been passed to the getLinks function, I get this exception.

The exception is in the getLinks function on the line:

foreach (HtmlNode link in document.DocumentNode.SelectNodes("//a[@href]"))

Object reference not set to an instance of an object

I tried using an if statement to check whether the result is null, and inside it I put return mainLinks;

But if I do that, I don't get all the links from the website.

Right now I'm using urls in the constructor, but if I use url (www.google.co.il) I get the same exception after a few seconds.

I can't figure out why this exception is being thrown. Is there a reason for it?

System.NullReferenceException was unhandled
Message=Object reference not set to an instance of an object.
Source=GatherLinks
StackTrace:
at GatherLinks.Form1.getLinks(HtmlDocument document) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 55
at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 76
at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 104
at GatherLinks.Form1..ctor() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 29
at GatherLinks.Program.Main() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Program.cs:line 18
at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

Best answer

The problem seems to be that you're testing for null but then not doing anything about it.

            if (document.DocumentNode.SelectNodes("//a[@href]") == null)
            {
            }
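
To see why the foreach fails, it can help to check what SelectNodes hands back for a page that has no matching anchors. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and that SelectNodes returns null rather than an empty collection when nothing matches (which is what the exception in the question suggests). The class NullDemo, the method CountLinks, and the HTML strings are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class NullDemo
{
    // Hypothetical helper: counts <a href> nodes, treating a null result as zero.
    public static int CountLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: SelectNodes yields null (not an empty collection)
        // when the XPath expression matches nothing.
        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        return nodes == null ? 0 : nodes.Count;
    }

    static void Main()
    {
        // A page without anchors: this is the case that breaks an unguarded foreach.
        Console.WriteLine(CountLinks("<html><body><p>no anchors here</p></body></html>"));

        // A page with one anchor.
        Console.WriteLine(CountLinks("<html><body><a href='http://example.com'>x</a></body></html>"));
    }
}
```

Any page the crawler reaches that has no a elements with an href (an image, a redirect stub, an error page) would hit the first case.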

I suspect you want to handle the null case but haven't written the code to do it. You probably want something like:

    private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
    {
        List<string> mainLinks = new List<string>();
        if (document.DocumentNode.SelectNodes("//a[@href]") != null)
        {
            foreach (HtmlNode link in document.DocumentNode.SelectNodes("//a[@href]"))
            {
                var href = link.Attributes["href"].Value;
                mainLinks.Add(href);
            }
        }
        return mainLinks;
    }

You'd probably want to tidy this up so the query runs only once, something more like:

    private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
    {
        List<string> mainLinks = new List<string>();
        var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
        if (linkNodes != null)
        {
            foreach (HtmlNode link in linkNodes)
            {
                var href = link.Attributes["href"].Value;
                mainLinks.Add(href);
            }
        }
        return mainLinks;
    }
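
The question also mentions not getting all the links. One possible contributor, separate from the null check, is that the StartsWith("http://") test in webCrawler skips relative hrefs such as /images. Below is a rough sketch of resolving each href against the page URL before filtering, using only System.Uri. LinkFilter and ToAbsoluteHttpLinks are hypothetical names, not part of the original code, and this is one way to approach it rather than the required fix.

```csharp
using System;
using System.Collections.Generic;

static class LinkFilter
{
    // Hypothetical helper: resolves hrefs against the page URL and keeps
    // only distinct http/https results, in first-seen order.
    public static List<string> ToAbsoluteHttpLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        var result = new List<string>();
        var seen = new HashSet<string>();
        var baseUri = new Uri(pageUrl);

        foreach (var href in hrefs)
        {
            Uri absolute;

            // TryCreate returns false for hrefs that cannot be resolved.
            if (!Uri.TryCreate(baseUri, href, out absolute))
                continue;

            // Skip mailto:, javascript:, and other non-web schemes.
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
                continue;

            // HashSet.Add returns false for a URL that was already collected.
            if (seen.Add(absolute.AbsoluteUri))
                result.Add(absolute.AbsoluteUri);
        }

        return result;
    }
}
```

Inside webCrawler this could be called as LinkFilter.ToAbsoluteHttpLinks(url, getLinks(doc)), after which the StartsWith check would no longer be needed.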