C# Web Parsing Conflict

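I seem to have run into a couple of problems while trying to parse some HTML. In practice, I am trying to run a recursive web crawl that starts from a list of websites. The crawl passes through several classes, which should ultimately return the contents of the sites to my system. This seems fairly straightforward, but I have had no luck with the following two tasks: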

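A. Converting website content (in string format, from an HttpWebRequest stream) into an HtmlDocument using the HtmlDocument.Write method. (I cannot create a new instance of HtmlDocument? That makes little sense.)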

B. Collecting an HtmlDocument via a WebBrowser.

My code as it currently stands is below; any advice would be great.

    public void Start()
    {
        if (this.RunningThread == null)
        {
            Console.WriteLine("Executing SiteCrawler for " + SiteRoot.DnsSafeHost);

            this.RunningThread = new Thread(this.Start);
            this.RunningThread.SetApartmentState(ApartmentState.STA);
            this.RunningThread.Start();
        }
        else
        {
            try
            {
                WebBrowser BrowserEmulator = new WebBrowser();
                BrowserEmulator.Navigate(this.SiteRoot);

                HtmlElementCollection LinkCollection = BrowserEmulator.Document.GetElementsByTagName("a");
                List<PageCrawler> PageCrawlerList = new List<PageCrawler>();

                foreach (HtmlElement Link in LinkCollection)
                {
                    PageCrawlerList.Add(new PageCrawler(Link.GetAttribute("href"), true));
                    continue;
                }
                return;
            }
            catch (Exception e)
            {
                throw new Exception("Exception encountered in SiteCrawler: " + e.Message);
            }
        }
    }

This code seems to do nothing when it passes over the Navigate method. I've tried allowing it to open in a new window, which pops up a new instance of IE and proceeds to navigate to the specified address, but not before my program steps over the Navigate method. I've tried waiting for the browser to be "not busy", but it never seems to pick up the busy attribute anyway. I've tried creating a new document via Browser.Document.OpenNew() so that I might populate it with data from a WebRequest stream; however, as I'm sure you can guess, I get a null reference exception when I try to reach through the Document portion of that statement. I've done some research, and this appears to be the only way to create a new HtmlDocument.

As you can see, this method is intended to kick off a PageCrawler for every link in a specified page. I am sure that I could parse through the HTML character by character to find all of the links, after using an HttpWebRequest and collecting the data from the stream, but this is far more work than should be necessary to complete this.

If anyone has any suggestions, they would be greatly appreciated. Thank you.

Answers

If this is a console application, it will not work, because a console application has no message pump (which is required to process Windows messages).

If you are running this in a Windows Forms application, then you should handle the DocumentCompleted event:

WebBrowser browserEmulator = new WebBrowser();
browserEmulator.DocumentCompleted += OnDocumentCompleted;
browserEmulator.Navigate(this.SiteRoot);

Then implement the method that handles the event:

// Name must match the handler attached to browserEmulator.DocumentCompleted.
private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser wb = sender as WebBrowser;

    if (wb.Document != null)
    {
        List<string> links = new List<string>();

        foreach (HtmlElement element in wb.Document.GetElementsByTagName("a"))
        {
            links.Add(element.GetAttribute("href"));
        }

        foreach (string link in links)
        {
            Console.WriteLine(link);
        }
    }
}

If you want to run this in a console application, then you need to use a different method for downloading pages. I would recommend that you use the WebRequest/WebResponse and then use the HtmlAgilityPack to parse the HTML. The HtmlAgilityPack will generate an HtmlDocument for you and you can get the links from there.
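
The console-application approach described above might look roughly like the following. This is a minimal sketch, not a definitive implementation: it assumes the third-party HtmlAgilityPack NuGet package is referenced, and the class and method names (LinkExtractor, DownloadHtml, ExtractLinks) are hypothetical names chosen for illustration. It also assumes the page is UTF-8 encoded and does no error handling or relative-URL resolution.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using HtmlAgilityPack; // third-party package, assumed to be installed via NuGet

// Hypothetical helper class, named for illustration only.
static class LinkExtractor
{
    // Downloads the raw HTML of a page as a string using WebRequest/WebResponse.
    // StreamReader defaults to UTF-8; other encodings are not handled here.
    public static string DownloadHtml(Uri address)
    {
        WebRequest request = WebRequest.Create(address);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // Parses an HTML string and returns the href value of every <a> that has one.
    public static List<string> ExtractLinks(string html)
    {
        // Fully qualified to avoid confusion with System.Windows.Forms.HtmlDocument.
        var document = new HtmlAgilityPack.HtmlDocument();
        document.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = document.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode anchor in anchors)
            {
                links.Add(anchor.GetAttributeValue("href", string.Empty));
            }
        }

        return links;
    }
}
```

Calling LinkExtractor.ExtractLinks(LinkExtractor.DownloadHtml(this.SiteRoot)) would then give the list of href values to hand to each PageCrawler, with no WebBrowser control or message pump involved. Keeping the download and the parsing in separate methods also lets the parsing step be exercised on a plain string.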


Also, if you're interested in learning more about building a scalable web crawler, check out the following links:

Cheers!




