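I seem to be having a couple of problems trying to parse some hypertext. Essentially, I'm attempting a web crawl that starts from a list of sites. This runs through a couple of classes, which should eventually return the sites' content to my system. It seems fairly straightforward, but I haven't had any luck with either of the following two tasks: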
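A. Converting the site content (a string read from an HttpWebRequest stream) into an HtmlDocument (there is no way to create a new HtmlDocument instance). Something along the lines of an HtmlDocument.Write() method?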
or
B. Getting an HtmlDocument via a WebBrowser instance.
Here is my code; any suggestions would be great.
public void Start()
{
    if (this.RunningThread == null)
    {
        Console.WriteLine("Executing SiteCrawler for " + SiteRoot.DnsSafeHost);
        this.RunningThread = new Thread(this.Start);
        this.RunningThread.SetApartmentState(ApartmentState.STA);
        this.RunningThread.Start();
    }
    else
    {
        try
        {
            WebBrowser BrowserEmulator = new WebBrowser();
            BrowserEmulator.Navigate(this.SiteRoot);
            HtmlElementCollection LinkCollection = BrowserEmulator.Document.GetElementsByTagName("a");
            List<PageCrawler> PageCrawlerList = new List<PageCrawler>();
            foreach (HtmlElement Link in LinkCollection)
            {
                PageCrawlerList.Add(new PageCrawler(Link.GetAttribute("href"), true));
                continue;
            }
            return;
        }
        catch (Exception e)
        {
            throw new Exception("Exception encountered in SiteCrawler: " + e.Message);
        }
    }
}
This code appears to do nothing when it steps over the Navigate call. I've tried letting it open in a new window, which pops up a new instance of IE and navigates to the specified address, but not before my program has already stepped past the Navigate call. I've tried waiting for the browser to stop being busy, but it never seems to report as busy in the first place. I've also tried creating a new document via Browser.Document.OpenNew() so that I could populate it with data from a WebRequest stream; however, as I'm sure you can guess, I get a NullReferenceException as soon as I try to reach through the Document portion of that statement. From what I've researched, this appears to be the only way to create a new HtmlDocument.
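Some background on the Navigate behaviour described above: WebBrowser.Navigate only starts loading the page and returns immediately. The control raises its DocumentCompleted event later, on the thread that created it, and that only happens while the thread is pumping Windows messages. A minimal sketch of that pattern, assuming a Windows Forms project; BrowserLinkCollector and CollectLinks are hypothetical names (not part of the crawler above), and there is no timeout if a page never finishes loading:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper (not from the original crawler): loads a page in a WebBrowser on its
// own STA thread and returns the href of every <a> element once loading has finished.
// Windows Forms only; no timeout, so a page that never completes blocks forever.
static class BrowserLinkCollector
{
    public static List<string> CollectLinks(Uri url)
    {
        var links = new List<string>();
        Exception failure = null;

        var thread = new Thread(() =>
        {
            try
            {
                using (var browser = new WebBrowser())
                {
                    browser.ScriptErrorsSuppressed = true;
                    browser.DocumentCompleted += (sender, e) =>
                    {
                        // The event can fire once per frame; wait until the whole page is complete.
                        if (browser.ReadyState != WebBrowserReadyState.Complete) return;
                        try
                        {
                            foreach (HtmlElement link in browser.Document.GetElementsByTagName("a"))
                                links.Add(link.GetAttribute("href"));
                        }
                        catch (Exception ex)
                        {
                            failure = ex;
                        }
                        finally
                        {
                            Application.ExitThread(); // ends the Application.Run() loop below
                        }
                    };

                    // Navigate only starts the load and returns immediately...
                    browser.Navigate(url);
                    // ...so this thread has to pump window messages until DocumentCompleted fires.
                    Application.Run();
                }
            }
            catch (Exception ex)
            {
                failure = ex;
            }
        });

        thread.SetApartmentState(ApartmentState.STA); // WebBrowser is an ActiveX control and needs STA
        thread.Start();
        thread.Join();

        if (failure != null) throw new InvalidOperationException("Collecting links failed.", failure);
        return links;
    }
}
```

Because Application.Run() blocks until ExitThread is called from the event handler, CollectLinks only returns once the links have been read, so a caller such as Start() could build its PageCrawler list from the returned strings.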
As you can see, Start() is intended to kick off a PageCrawler for every link on the specified page. I'm sure I could use an HttpWebRequest, collect the data from the response stream, and parse through the HTML character by character to find all of the links, but that is far more work than this should require.
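On the point about parsing the downloaded HTML by hand: a regular expression over the response text covers the common cases with far less work than a character-by-character scan, and it needs no WebBrowser or HtmlDocument at all. A rough sketch, assuming the page has already been read from the HttpWebRequest stream into a string; HrefExtractor and ExtractLinks are hypothetical names, and this is an approximation rather than a real HTML parser:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original crawler): pulls link targets out of raw HTML text.
static class HrefExtractor
{
    // Matches the href attribute of an <a> tag: double-quoted, single-quoted or unquoted.
    // Approximation only: it does not skip comments, <script> blocks or honour <base href>.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\s+(?:[^>]*?\s)?href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns absolute http/https links, resolving relative hrefs against pageUri.
    public static List<Uri> ExtractLinks(string html, Uri pageUri)
    {
        var links = new List<Uri>();
        foreach (Match match in AnchorHref.Matches(html))
        {
            // Attribute values may contain entities such as &amp;.
            string raw = WebUtility.HtmlDecode(match.Groups["url"].Value.Trim());

            Uri candidate = null;
            if (Uri.TryCreate(raw, UriKind.Absolute, out Uri absolute) && IsWeb(absolute))
                candidate = absolute;
            else if (Uri.TryCreate(raw, UriKind.Relative, out Uri relative))
                Uri.TryCreate(pageUri, relative, out candidate);

            // Drops mailto:, javascript: and anything that failed to parse.
            if (candidate != null && IsWeb(candidate))
                links.Add(candidate);
        }
        return links;
    }

    private static bool IsWeb(Uri uri) =>
        uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
}
```

Each returned Uri could then be handed to a PageCrawler. Markup that the pattern does not anticipate will be missed or misread, so if the crawler needs to be robust, a dedicated HTML parser is the safer choice.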
If anyone has any suggestions, they would be greatly appreciated. Thanks.