How do I scrape a site, with multiple pages, and create one single HTML page with Ruby?
  • Date: 2011-11-05 17:41:16
  • Tags:
  • ruby
  • hpricot

So what I would like to do is scrape this site: http://boxerbiography.blogspot.com/ and create one HTML page that I can either print or send to my Kindle.

I'm thinking of using Hpricot, but I'm not sure how to proceed.

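How do I set it up so that it recursively goes through each link, gets the HTML, and either stores it in a variable or dumps it into the main HTML page, then goes back to the table of contents and keeps doing that?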
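One way to picture that loop, as a minimal sketch rather than a finished scraper: fetch the table of contents once, pull the chapter links out of it, then fetch and extract each chapter in order. Assuming the contents page is a single flat list of links, a plain loop is enough and no recursion is needed. The names `compile_book`, `fetch`, `links_from` and `article_from` are hypothetical; the three steps are passed in as lambdas so any HTTP client or parser (Hpricot or Nokogiri) can be plugged in, and the demo runs on fake in-memory pages instead of the real site.

```ruby
# Hypothetical helper: fetch the contents page once, then fetch and extract
# each linked chapter in the order it appears. The three steps are injected
# as lambdas, so the loop itself does not depend on any HTTP client or parser.
def compile_book(toc_url, fetch:, links_from:, article_from:)
  toc_html = fetch.call(toc_url)                  # 1. download the table of contents
  links    = links_from.call(toc_html, toc_url)   # 2. list the chapter URLs
  links.uniq.map do |url|                         # skip links that appear twice
    article_from.call(fetch.call(url))            # 3. fetch and extract each chapter
  end
end

# Demo with fake in-memory pages instead of real HTTP requests.
pages = {
  "toc" => "ch1 ch2 ch1",
  "ch1" => "Chapter one",
  "ch2" => "Chapter two"
}

book = compile_book(
  "toc",
  fetch:        ->(url) { pages.fetch(url) },
  links_from:   ->(html, _base) { html.split },
  article_from: ->(html) { html.upcase }
)
p book # => ["CHAPTER ONE", "CHAPTER TWO"]
```

Against the real site, `fetch` could be `->(url) { URI.open(url, &:read) }` after `require "open-uri"`. If the index lists the newest post first, reversing the link list before the loop would give reading order.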

You don't have to tell me exactly how to do it, just the theory of how I might want to go about it.

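Do I really have to look at the source of one of the articles (e.g. the first one), such as view-source:http://boxerbiography.blogspot.com/2006/12/10-progamer-lim-yohwan-e-sports-icon.html, and manually code for the text between certain tags (like h3, p, etc.)?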

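If I take that approach, I'd have to look at the source of every chapter/article and do the same thing. Doesn't that defeat the purpose of writing a script?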

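Ideally, I'd like a script that can tell the difference between the code on the page (scripts and other markup) and the actual text, and just lays out the text (with proper headings and such).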
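Once the text has been pulled out, laying it out as one page is plain string building. A minimal sketch using only the standard library, assuming (hypothetically) that each chapter is a hash with a `:title` string and a `:paragraphs` array of plain-text strings; `render_book` is a hypothetical name. Every piece is HTML-escaped, so stray `<` or `&` characters in the source text can't break the output.

```ruby
require "erb"

# Renders chapters into one standalone HTML page: an <h1> for the book,
# an <h2> per chapter, and a <p> per paragraph, all HTML-escaped.
def render_book(book_title, chapters)
  esc  = ->(s) { ERB::Util.html_escape(s) }
  body = chapters.map do |ch|
    paras = ch[:paragraphs].map { |t| "<p>#{esc.(t)}</p>" }.join("\n")
    "<h2>#{esc.(ch[:title])}</h2>\n#{paras}"
  end.join("\n")

  <<~HTML
    <!DOCTYPE html>
    <html>
    <head><meta charset="utf-8"><title>#{esc.(book_title)}</title></head>
    <body>
    <h1>#{esc.(book_title)}</h1>
    #{body}
    </body>
    </html>
  HTML
end
```

Writing the result with `File.write("book.html", render_book(title, chapters))` gives a single file that can be opened in a browser and printed.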

I'd really appreciate some guidance.

Thanks.

Best answer

I'd recommend using Nokogiri instead of Hpricot. It's more robust, uses fewer resources, has fewer bugs, and is easier to use and faster.

I once did some extensive scraping for work and had to switch to Nokogiri, because Hpricot would inexplicably crash on some pages.

Check out this Railscast:

http://railscasts.com/episodes/190-cr-scraping-with-nokogiri

and

http://nokogiri.org/

http://www.engineyard.com/blog/2010_started-with-nokogiri/
