How to extract URLs from text
  • Time: 2010-09-08 06:17:00
  • Tags:
  • ruby

How can I extract all URLs from a plain text file in Ruby?

I have tried some libraries, but they fail in some cases. What is the best way?

Best answer

Which cases are failing?

According to the regexpert library, you can use:

regexp = /(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix

Then call scan on the text.

EDIT: It seems the regexp also matches the empty string, because of the (^$) alternative. Remove that part.
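A minimal sketch of the regexp in use, assuming the (^$) alternative is dropped and that each URL sits on its own line: in Ruby, ^ and $ are line anchors, so this pattern only matches a line that starts with a URL. Because the pattern contains capture groups, a plain scan would return arrays of groups, so the block form collects the whole match instead. The helper name extract_url_lines is hypothetical.

```ruby
# The answer's regexp without the "(^$)|" alternative, which only matched
# an empty line. ^ and $ are line anchors in Ruby, so a match has to start
# at the beginning of a line and run to the end of that line.
URL_LINE_RE = /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix

# Hypothetical helper. scan with a block sets Regexp.last_match for each
# hit, so we can take the full match (group 0) rather than the sub-groups.
def extract_url_lines(text)
  urls = []
  text.scan(URL_LINE_RE) { urls << Regexp.last_match(0) }
  urls
end

p extract_url_lines("intro line\nhttp://example.com/page\nhttps://foo.example.org\nnot a url")
# => ["http://example.com/page", "https://foo.example.org"]
```

Since the anchors tie matches to whole lines, a URL in the middle of a sentence will not be found; for free-running text an unanchored pattern is needed.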

Other answers

If you want to use what Ruby already provides:

require "uri"
URI.extract("text here http://foo.example.org/bla and here mailto:test@example.com and here also.")
# => ["http://foo.example.org/bla", "mailto:test@example.com"]

http://railsapi.com/doc/ruby-v1.8/classes/URI.html#M004495
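URI.extract also takes an optional list of schemes as its second argument, which is handy when only web links are wanted. A short sketch reusing the same sample text (note that newer Ruby versions mark URI.extract as obsolete and print a warning when $VERBOSE is on, although it still works):

```ruby
require "uri"

text = "text here http://foo.example.org/bla and here mailto:test@example.com and here also."

# Restrict the extraction to the http and https schemes, so the mailto: link is skipped.
web_urls = URI.extract(text, ["http", "https"])
p web_urls
# => ["http://foo.example.org/bla"]
```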

Use the twitter-text gem:

require "twitter-text"
class UrlParser
    include Twitter::Extractor
end

urls = UrlParser.new.extract_urls("http://stackoverflow.com")
puts urls.inspect

Use String#scan (http://ruby-doc.org/french/sc/presidencyes/String.html#M000812):

string.scan(/(https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?)/)

You can start from this and adapt it to your needs.
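One way to adapt it, as a sketch: the outermost group of this pattern wraps the whole URL, so scan returns one array of captures per match and .first is the URL itself. The host part ([-\w\.]+) also accepts a sentence-ending period, so this version trims trailing punctuation. The helper name scan_urls and the trimming step are assumptions, not part of the answer.

```ruby
# The answer's pattern; group 1 spans the entire URL.
URL_RE = /(https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?)/

# Hypothetical helper: keep only the full URL from each match, then strip
# punctuation that most likely belongs to the surrounding sentence.
def scan_urls(text)
  text.scan(URL_RE).map(&:first).map { |url| url.sub(/[.,;:!?]+\z/, "") }
end

p scan_urls("Visit http://example.com/docs/page.html?x=1 or http://example.org.")
# => ["http://example.com/docs/page.html?x=1", "http://example.org"]
```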

If your input looks like this:

"http://i.imgur.com/c31IkbM.gifv;http://i.imgur.com/c31IkbM.gifvhttp://i.imgur.com/c31IkbM.gifv"

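i.e. the URLs don't necessarily have whitespace around them and may be separated by any delimiter, or not delimited at all, you can use the following approach: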

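def process_images(raw_input)
  return [] if raw_input.nil?
  # Split on the scheme prefix; the first element is whatever preceded the first URL
  urls = raw_input.split("http")
  urls.shift
  # Re-attach the prefix and cut each URL at the first whitespace, comma or semicolon
  urls.map { |url| "http#{url}".strip.split(/[\s,;]/)[0] }
end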
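A variant of the same idea, sketched under the assumption that every URL starts with "http://" or "https://": splitting with a zero-width lookahead instead of the plain string "http" means a path segment such as /http-guide does not start a new chunk. This swaps in a different split technique from the answer's, and the method name split_glued_urls is hypothetical.

```ruby
# Alternative sketch (not the answer's code): split in front of every
# "http://" or "https://" using a zero-width lookahead, so the scheme is
# kept and "http" inside a path does not trigger a split.
def split_glued_urls(raw_input)
  return [] if raw_input.nil?

  chunks = raw_input.split(%r{(?=https?://)})
  # Cut each chunk at the first whitespace, comma or semicolon; blank chunks give nil.
  urls = chunks.map { |chunk| chunk.strip.split(/[\s,;]/).first }.compact
  # Drop any leading text that came before the first URL.
  urls.select { |url| url.start_with?("http://", "https://") }
end

input = "http://i.imgur.com/c31IkbM.gifv;http://i.imgur.com/c31IkbM.gifvhttp://i.imgur.com/c31IkbM.gifv"
p split_glued_urls(input)
```

Like the original method, this returns duplicates as they appear; append .uniq if only distinct URLs are wanted.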

Hope this helps!

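require "uri"
# A URI object, e.g. one handed back by a scraper (URI.parse just builds one here)
foo = URI.parse("http://foobar/00u0u_gKHnmtWe0Jk_600x450.jpg")
# foo.inspect => #<URI::HTTP:0x007f91c76ebad0 URL:http://foobar/00u0u_gKHnmtWe0Jk_600x450.jpg>
foo.to_s
# => "http://foobar/00u0u_gKHnmtWe0Jk_600x450.jpg"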

Edit: explanation

For anyone who ends up with URI objects from a JSON response or from scraping tools such as Nokogiri or Mechanize, this solution worked for me.
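When a scraper returns a mix of URI objects and plain strings, a minimal sketch like the following normalizes everything to URL strings and keeps only web links. The helper name http_url_strings is hypothetical; it relies on URI::HTTPS being a subclass of URI::HTTP, and filter_map needs Ruby 2.7 or later.

```ruby
require "uri"

# Hypothetical helper: accepts URI objects or strings and returns only the
# http/https URLs as strings. Unparseable input is skipped.
def http_url_strings(items)
  # filter_map (Ruby 2.7+) drops the nil results
  items.filter_map do |item|
    uri = item.is_a?(URI::Generic) ? item : URI.parse(item.to_s.strip)
    # URI::HTTPS inherits from URI::HTTP, so this keeps both schemes
    uri.to_s if uri.is_a?(URI::HTTP)
  rescue URI::InvalidURIError
    nil
  end
end

mixed = [URI.parse("http://foobar/00u0u_gKHnmtWe0Jk_600x450.jpg"), " https://example.com/a ", "mailto:test@example.com"]
p http_url_strings(mixed)
# => ["http://foobar/00u0u_gKHnmtWe0Jk_600x450.jpg", "https://example.com/a"]
```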




