Question

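I have a string like:

sanitize_text = "<b><i>this is the bold text</i></b><i>this is the italic</i>"

My problem is:

I need to go through the characters in the string, find specific HTML tags (<b>, <i>, ...), and then apply properties to the text between them.
The properties need to be applied to every piece of text.

I am heading in this direction: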

# Split the string into single characters
sanitize_arr = sanitize_text.scan(/\D\d*/)

sanitize_arr.each_with_index do |char, index|
  # If the character starts a new <b> tag, apply bold properties
  # to the following characters until </b>.

  # If the character starts a new <i> tag, apply italic properties
  # to the following characters until </i>.
end

I'm just curious whether I'm heading in the right direction, or is there a better solution?

Answer 1

Yes, I have done it, like this:

santize_text = "<b><u>this</u></b><i><p>this is the italic text</p></i>"

# Split into single characters (a run of digits stays attached to the character before it)
santize_arr = santize_text.scan(/\D\d*/)
char_array, html_tag_array = [], []
continue_insert_char_array, continue_insert_arr2 = false, false
santize_arr.each_with_index do |char, index|
  # Check for a new start tag
  continue_insert_char_array = true if char == '<' && santize_arr[index + 1] != '/'
  if continue_insert_char_array
    char_array << char
    if char == '>'
      continue_insert_char_array = false
      html_tag_array << char_array.join
      char_array = []
    end
    next
  end

  # Check for a new end tag
  continue_insert_arr2 = true if char == '<' && santize_arr[index + 1] == '/'
  if continue_insert_arr2
    char_array << char
    if char == '>'
      continue_insert_arr2 = false
      # "</b>" becomes "<b>", which is then removed from the list of open tags
      html_tag_array.delete(char_array.join.gsub('/', ''))
      char_array = []
    end
    next
  end

  # Apply the property to the character (placeholders for the real formatting)
  "Bold Char" if html_tag_array.include?("<b>")
  "Italic Char" if html_tag_array.include?("<i>")
end

Please let me know if anything should be changed.
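For comparison, the same open-tag bookkeeping can be sketched at the token level instead of character by character. This is only a rough illustration, not the code from the answer above: `styled_runs` is a made-up helper name, and it assumes simple, attribute-free tags such as <b> or </i>.

```ruby
# Hypothetical helper (not part of the original answer): splits the markup
# into tags and text runs and records which tags enclose each text run.
# Assumes simple tags without attributes, e.g. <b> or </i>.
def styled_runs(markup)
  open_tags = [] # names of the tags that are currently open, outermost first
  runs = []      # collected [text, enclosing tag names] pairs

  # Each token is either a simple tag or a run of text without "<".
  markup.scan(%r{</?\w+>|[^<]+}) do |token|
    if (closing = token[%r{\A</(\w+)>\z}, 1])
      # Closing tag: drop the most recently opened tag with that name.
      position = open_tags.rindex(closing)
      open_tags.delete_at(position) if position
    elsif (opening = token[/\A<(\w+)>\z/, 1])
      # Opening tag: remember it until its closing tag shows up.
      open_tags << opening
    else
      # Plain text: pair it with a copy of the currently open tags.
      runs << [token, open_tags.dup]
    end
  end

  runs
end

styled_runs("<b><u>this</u></b><i><p>this is the italic text</p></i>").each do |text, tags|
  puts "#{text.inspect} -> #{tags.join(', ')}"
end
```

Each text run comes back together with the tags that were open around it, so the bold or italic property can be applied per run rather than per character.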

Answer 2

Please correct me if I'm wrong: you want to find specific HTML tags in the text and do some manipulation with them? Have you tried the Nokogiri gem?

You could do something like this:

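require 'nokogiri'

sanitize_text = "<b><i>this is the bold text</i></b><i>this is the italic</i>"  # the string from the question
nokogiri_object = Nokogiri::HTML(sanitize_text)
bold_text = nokogiri_object.css('b').text  # text inside every <b> element
puts bold_text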

outputs "this is the bold text"

Answer 3

You could write your own XML parser... no, seriously! Check out Parslet. In fact, the examples it comes with include an XML parser.

Something like this:

require 'parslet'

class XML < Parslet::Parser
  root :document

  # A document is one or more formatted spans or plain text runs
  rule(:document)   { (formatting | text).repeat(1) }
  rule(:formatting) { tag_pair('b').as(:bold) | tag_pair('u').as(:underline) | tag_pair('i').as(:italic) }

  # Matches a literal tag such as <b> or </b>
  def tag(type)
    str('<') >> str(type) >> str('>')
  end

  # Matches an opening tag, optional nested content, and the closing tag
  def tag_pair(type)
    tag(type) >> document.maybe >> tag("/" + type)
  end

  # Any run of characters that contains no angle brackets
  rule(:text) {
    match('[^<>]').repeat(1).as(:text)
  }
end

parser = XML.new
input = ARGV[0]

require 'parslet/convenience'
puts parser.parse_with_debug(input).inspect

Which produces something like this...

> ruby xmlparser.rb "<b>bold<i>italic</i> bold again <u>underlined</u></b>"

[{:bold=>[{:text=>"bold"@3}, {:italic=>[{:text=>"italic"@10}]}, {:text=>" bold again "@20}, {:underline=>[{:text=>"underlined"@35}]}]}]

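As you can see, the tree has style nodes for bold, italic, and so on, with their contents inside.

It could easily be extended to handle whitespace and the other tags you care about. Handling tags you don't care about is much harder.

Anyway, I'm just showing the possibilities.

With Parslet, you would usually then write a Transform class to convert this tree structure into whatever you want to end up with. I like the way Parslet separates the parsing from the work done on the parsed data.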
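To give a rough idea of that second step, here is a plain-Ruby sketch that walks a tree shaped like the parser output and flattens it into text runs with their active styles. It is a stand-in for illustration, not an actual Parslet::Transform: `flatten_styles` is a made-up helper, the tree is written out by hand with plain strings, and it assumes every style node holds an array of child nodes.

```ruby
# Hypothetical helper (a plain-Ruby stand-in, not a Parslet::Transform):
# walks the nested tree and returns [text, active styles] pairs.
# Assumes every style node (:bold, :italic, :underline) holds an array of nodes.
def flatten_styles(nodes, active = [])
  nodes.flat_map do |node|
    node.flat_map do |key, value|
      if key == :text
        # Leaf: pair the text with the styles collected on the way down.
        [[value.to_s, active]]
      else
        # Style node: descend with this style added to the active list.
        flatten_styles(value, active + [key])
      end
    end
  end
end

# Hand-written tree with the same shape as the parser output (plain strings
# used in place of Parslet's slice objects).
tree = [
  { bold: [
    { text: "bold" },
    { italic: [{ text: "italic" }] },
    { text: " bold again " },
    { underline: [{ text: "underlined" }] }
  ] }
]

flatten_styles(tree).each do |text, styles|
  puts "#{text.inspect}: #{styles.join(' + ')}"
end
```

Keeping the styles as a list per text run makes it straightforward to apply the bold, italic, or underline properties to each run afterwards.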

Hope this helps.
