English 中文(简体)
Automated Web Scraping Issues
原标题:

I am developing a rather large automation application to scrape various abandoned property information from various state databases, in order to find specific properties. I have already developed search scripts for about 8 state websites, using various forms of automation. I prefer to use something like ruby s Mechanize library to perform the automation, because it is the most stable method I have come across so far. In some cases, I am unable to automate the scraping with Mechanize and must fall back to something like Watir (or, more specifically, the branch of Watir called Vapir). Vapir is needed specifically when a source requires javascript to be searched, since Mechanize only makes HTTP requests and does not deal with JS interpretation.

My problem is with Vapir automating an instance of Internet Explorer. In some cases, after prolonged searches (some of these searches are for lists of 4,000+ search terms), IE locks up. I assume it is an issue with the OLE engine. The error I receive is as follows:

failed to create WIN32OLE object from `InternetExplorer.Application  HRESULT error code:0x80004005 Unspecified error

I cannot find anything to resolve this issue.

My question is if anyone knows of any solution or work-around to an automated OLE instance that locks up? To fix the error, I have to manually kill all of the IE processes and restart the automated search.

Alternatives that I am aware of are to automate Firefox through Vapir in the back-end (rather than IE), or possibly switch over to something like PhantomJS. Does anybody have an opinion on either of these options?

问题回答

Is there a reason you are using Vapir? Why don t you try watir (drives Internet Explorer) or watir-webdriver (drives Internet Explorer, Firefox, Chrome and Opera) gems?

For installation see https://github.com/zeljkofilipin/watirbook/blob/master/installation/windows.md





相关问题
Ruby parser in Java

The project I m doing is written in Java and parsers source code files. (Java src up to now). Now I d like to enable parsing Ruby code as well. Therefore I am looking for a parser in Java that parses ...

rails collection_select vs. select

collection_select and select Rails helpers: Which one should I use? I can t see a difference in both ways. Both helpers take a collection and generates options tags inside a select tag. Is there a ...

RubyCAS-Client question: Rails

I ve installed RubyCAS-Client version 2.1.0 as a plugin within a rails app. It s working, but I d like to remove the ?ticket= in the url. Is this possible?

Ordering a hash to xml: Rails

I m building an xml document from a hash. The xml attributes need to be in order. How can this be accomplished? hash.to_xml

multiple ruby extension modules under one directory

Can sources for discrete ruby extension modules live in the same directory, controlled by the same extconf.rb script? Background: I ve a project with two extension modules, foo.so and bar.so which ...

Text Editor for Ruby-on-Rails

guys which text editor is good for Rubyonrails? i m using Windows and i was using E-Texteditor but its not free n its expired now can anyone plese tell me any free texteditor? n which one is best an ...

热门标签