Parse the characters in a string and find specific HTML tags
  • Date: 2012-05-23 06:09:05
  • Tags:
  • ruby
  • parsing

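I have a string like:

sanitize_text = "<b><i>this is the bold text</i></b><i>this is the italic</i>"

My questions are:

  1. I want to parse the characters in the string and find specific HTML tags (<b>, <i>, ...), then apply properties to the text between them.

  2. The properties need to be applied to each piece of text.

I am heading in this direction: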

sanitize_arr = sanitize_text.scan(/\D\d*/)

sanitize_arr.each_with_index do |char, index|
  # If char starts a new <b> tag:
  #   apply bold properties to the following characters until </b>.
  # If char starts a new <i> tag:
  #   apply italic properties to the following characters until </i>.
end

I am just curious whether I am heading in the right direction, or whether there is a better solution.

Best answer

Yes, I have done it, like this:

santize_text = "<b><u>this</u></b><i><p>this is the italic text</p></i>"

santize_arr = santize_text.scan(/\D\d*/)
char_array, html_tag_array = [], []
continue_insert_char_array, continue_insert_arr2 = false, false
santize_arr.each_with_index do |char, index|
  # Check for a new start tag
  continue_insert_char_array = true if char == '<' && santize_arr[index + 1] != '/'
  if continue_insert_char_array
    char_array << char
    if char == '>' && continue_insert_char_array
      continue_insert_char_array = false
      html_tag_array << char_array.join
      char_array = []
    end
    next
  end

  # Check for a new end tag
  continue_insert_arr2 = true if char == '<' && santize_arr[index + 1] == '/'
  if continue_insert_arr2
    char_array << char
    if char == '>' && continue_insert_arr2
      continue_insert_arr2 = false
      # Remove the matching start tag, e.g. "</b>" becomes "<b>"
      html_tag_array.delete(char_array.join.gsub('/', ''))
      char_array = []
    end
    next
  end

  # Apply the property to the character (placeholders for the real styling code)
  "Bold Char" if html_tag_array.include?("<b>")
  "Italic Char" if html_tag_array.include?("<i>")
end

Let me know if anything should be changed.
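
The same bookkeeping can also be done per tag instead of per character. The following is only a minimal sketch, assuming simple, properly nested tags without attributes (as in the example string). `styled_runs` is a hypothetical helper name, not something from the answer above; it uses Ruby's standard `StringScanner` to collect each text run together with the tags that are open at that point.

```ruby
require 'strscan'

# Hypothetical helper: split simple HTML into [text, open_tags] pairs.
# Assumes tags have no attributes and are properly nested.
def styled_runs(html)
  scanner = StringScanner.new(html)
  open_tags = []
  runs = []

  until scanner.eos?
    if scanner.scan(%r{<(/?)(\w+)>})
      closing = !scanner[1].empty?
      name = scanner[2]
      if closing
        # Drop the most recently opened tag with this name, if any.
        idx = open_tags.rindex(name)
        open_tags.delete_at(idx) if idx
      else
        open_tags.push(name)
      end
    elsif (text = scanner.scan(/[^<]+/))
      runs << [text, open_tags.dup]
    else
      # A stray "<" that does not start a simple tag: keep it as text.
      runs << [scanner.getch, open_tags.dup]
    end
  end

  runs
end

html = '<b><i>this is the bold text</i></b><i>this is the italic</i>'
styled_runs(html).each do |text, tags|
  puts "#{text.inspect} => #{tags.inspect}"
end
```

Each pair can then be handed to whatever applies the bold or italic property, so the styling code never has to look at individual characters.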

Other answers

Please correct me if I'm wrong: you want to find specific HTML tags in the text and manipulate them? Have you tried the Nokogiri gem?

You could do something like this:

require 'nokogiri'
nokogiri_object = Nokogiri::HTML(sanitize_text)
bold_text = nokogiri_object.css('b').text
puts bold_text

This outputs "this is the bold text".

You could write your own XML parser... no, seriously! Check out Parslet. In fact, the examples it comes with include an XML parser.

Something like this:

require 'parslet'

class XML < Parslet::Parser
  root :document

  rule(:document)   { (formatting | text).repeat(1) }
  rule(:formatting) { tag_pair('b').as(:bold) | tag_pair('u').as(:underline) | tag_pair('i').as(:italic) }

  def tag(type)
    str('<') >> str(type) >> str('>')
  end

  def tag_pair(type)
    tag(type) >> document.maybe >> tag("/" + type)
  end

  rule(:text) {
    match('[^<>]').repeat(1).as(:text)
  }
end

parser = XML.new
input = ARGV[0]

require 'parslet/convenience'
puts parser.parse_with_debug(input).inspect

This produces something like the following:

> ruby xmlparser.rb "<b>bold<i>italic</i> bold again <u>underlined</u></b>"

[{:bold=>[{:text=>"bold"@3}, {:italic=>[{:text=>"italic"@10}]}, {:text=>" bold again "@20}, {:underline=>[{:text=>"underlined"@35}]}]}]

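As you can see, the tree has style nodes for bold, italic, and so on, with the content nested inside them.

It could easily be extended to handle whitespace and any other tags you care about. Handling tags you don't care about is much harder.

Anyway, this is just to show the possibilities.

With Parslet, you would usually then write a Transform class that converts this tree structure into whatever you ultimately want to end up with. I like the way Parslet separates parsing from the use of the parsed data.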
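
To illustrate that second step without depending on Parslet, here is a plain-Ruby sketch that flattens a tree of nested style nodes and text nodes into text runs with their active styles. The `tree` literal is hand-written, with plain strings standing in for Parslet's slices, and `flatten_styles` is a hypothetical helper that is not part of Parslet. In a real Parslet program a `Parslet::Transform` subclass would normally take this role.

```ruby
# Hand-written stand-in for the parser output (plain strings, not Parslet slices).
tree = [
  { bold: [
    { text: 'bold' },
    { italic: [{ text: 'italic' }] },
    { text: ' bold again ' },
    { underline: [{ text: 'underlined' }] }
  ] }
]

# Hypothetical helper: walk the tree and collect [text, styles] pairs.
# Each node is a one-entry hash, either {text: ...} or {style: [children]}.
def flatten_styles(nodes, styles = [])
  nodes.flat_map do |node|
    key, value = node.first
    if key == :text
      [[value.to_s, styles]]
    else
      flatten_styles(value, styles + [key])
    end
  end
end

flatten_styles(tree).each do |text, styles|
  puts "#{text.inspect} => #{styles.inspect}"
end
```

Because the styles accumulate on the way down, nested formatting such as italic inside bold is carried along with each text run.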

Hope this helps.




