Best screen scraper: Simple HTML DOM or Snoopy?

Which one is better for screen scraping: Simple HTML DOM or Snoopy? I use Simple HTML DOM and find it comfortable. Does Snoopy have any advantage over Simple HTML DOM?

My requirements: I want to scrape content from a page after logging in. Simple HTML DOM is easy to use, but it takes a long time to print the results.

Best answer

Is Snoopy that well known or mature a package?

If it's not, then all other things being equal, I'd probably go with generic HTML DOM code - especially if the scraping is somewhat simple.

But only you know when your code is starting to get too big, unmanageable, etc., at which point it might be better to look at another tool out there like Snoopy.

(Which, admittedly, I don't have experience with; it's apparently at http://sourceforge.net/projects/snoopy/ for those not familiar with it - "Snoopy is a PHP class that simulates a web browser. It automates the task of retrieving web page content and posting forms, for example.")

The real reason I'm posting, even though I don't know Snoopy per se and thus can't definitively answer your question, is to ask if you've considered using Selenium (http://www.seleniumhq.org/) instead of Snoopy.

Selenium is a fairly well-known testing tool, and it occurred to me that one of the nice things about using it for what you're doing (if you can) is that it has built-in tests.

The reason that's good is that screen scraping is kind of an inherently brittle task - if the target site changes something, blam, your scraping fails. So it's kind of a nice design to have an automated scrape/test-that-scraping-worked system.
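To make the scrape-then-verify idea concrete, here is a minimal sketch. It is an illustration only: it assumes Python with just the standard library (`html.parser`) rather than Selenium, Snoopy, or Simple HTML DOM, and the names `TitleAndLinkParser`, `scrape`, `check_scrape`, `ScrapeCheckError`, and the sample HTML are all hypothetical. The point is simply that each scrape is followed by a check that fails loudly when the page no longer looks the way the scraper expects.

```python
from html.parser import HTMLParser


class TitleAndLinkParser(HTMLParser):
    """Collects the <title> text and every href found on <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class ScrapeCheckError(Exception):
    """Raised when the scraped data no longer looks the way we expect."""


def scrape(html):
    """Extract the fields we care about from an HTML string."""
    parser = TitleAndLinkParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


def check_scrape(result, min_links=1):
    """Sanity-check a scrape result; fail loudly if the page changed."""
    if not result["title"]:
        raise ScrapeCheckError("no <title> found; page layout may have changed")
    if len(result["links"]) < min_links:
        raise ScrapeCheckError("too few links found; page layout may have changed")
    return result


# Hypothetical page content standing in for a real, logged-in response.
SAMPLE_HTML = """
<html>
  <head><title>Members area</title></head>
  <body>
    <a href="/profile">Profile</a>
    <a href="/logout">Log out</a>
  </body>
</html>
"""

if __name__ == "__main__":
    print(check_scrape(scrape(SAMPLE_HTML)))
```

In a real scraper the checks would be whatever invariants matter for the target page (a known heading, a minimum row count, a price that parses as a number), run after every scrape so a site redesign shows up as an error instead of silently wrong data.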

Something to think about, anyway.

Other answers

I've stumbled into BeautifulSoup, which is Python-based. I suppose there are a bunch of others too.

Looks like Snoopy is PHP-based, and hence can be run server-side only. Is this what you are really looking for? What are your requirements? Please elaborate on that.




