If the string is undecorated (i.e., it has no markup), either of the following will work nicely:
data = 'Main Idea, key term, key term, key term'
# example #1
/^(.+?, )(.+)/.match(data).captures.each_slice(2).map { |a,b| a << %Q{<span class="smaller_font">#{ b }</span>}}.first
# => "Main Idea, <span class="smaller_font">key term, key term, key term</span>"
# example #2
data =~ /^(.+?, )(.+)/
$1 << %Q{<span class="smaller_font">#{ $2 }</span>}
# => "Main Idea, <span class="smaller_font">key term, key term, key term</span>"
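For the plain-string case, the same idea can be wrapped in a small method. This is only a minimal sketch, not part of the approach above: the name `emphasize_main_idea` and its `css_class` parameter are hypothetical, and it uses `String#split` with a limit of 2 in place of capture groups. It also runs the text through `ERB::Util.html_escape` from the standard library, on the assumption that the input might contain characters such as `&` or `<`.

```ruby
require 'erb'

# Hypothetical helper (the name and css_class parameter are illustrative).
# Wraps everything after the first comma in a <span>, escaping the text first.
def emphasize_main_idea(text, css_class = 'smaller_font')
  # A limit of 2 splits on the first comma only; later commas stay in key_terms.
  main_idea, key_terms = text.split(/,\s*/, 2)
  escaped_main = ERB::Util.html_escape(main_idea.to_s)

  # No comma means there are no key terms to wrap.
  return escaped_main if key_terms.nil?

  escaped_class = ERB::Util.html_escape(css_class)
  escaped_terms = ERB::Util.html_escape(key_terms)
  %Q{#{escaped_main}, <span class="#{escaped_class}">#{escaped_terms}</span>}
end

puts emphasize_main_idea('Main Idea, key term, key term, key term')
```

Strings without a comma come back unchanged apart from escaping, so the method is safe to call on every title in a list.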
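If the string has markup, then using regular expressions to process HTML or XML is discouraged because it breaks easily. Extremely trivial uses against HTML you control are pretty safe, but if the content or format changes, the regex can fall apart and break your code.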
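To make that fragility concrete, here is a small sketch that applies the same pattern to a link whose attribute happens to contain a comma. The `title` attribute, the `href` value, and the `wrap_with_regex` method name are made up for illustration.

```ruby
PATTERN = /^(.+?, )(.+)/

# Hypothetical wrapper around the regex approach, for demonstration only.
def wrap_with_regex(data)
  match = PATTERN.match(data)
  return data unless match

  head, tail = match.captures
  %Q{#{head}<span class="smaller_font">#{tail}</span>}
end

plain     = 'Main Idea, key term, key term'
marked_up = '<a href="x" title="Ideas, plans">Main Idea, key term</a>'

puts wrap_with_regex(plain)      # the first comma is the intended split point
puts wrap_with_regex(marked_up)  # the first comma is inside the title attribute
```

With the marked-up input the span opens in the middle of the `title` attribute and closes after `</a>`, so the result is no longer valid HTML. The pattern has no notion of tags or attributes; it only sees the first comma followed by a space.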
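HTML parsers are the usually recommended solution because they keep working if the content or its formatting changes. This is what I'd do using Nokogiri. I'm being deliberately verbose to explain what is happening: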
require 'nokogiri'
# build a sample document
html = '<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>'
doc = Nokogiri::HTML(html)
puts doc.to_s, ''
# find the link
a_tag = doc.at_css('a[href=stupidreqexquestion]')
# break down the tag content
a_text = a_tag.content
main_idea, key_terms = a_text.split(/,\s+/, 2) # => ["Main Idea", "key term, key term, key term"]
a_tag.content = main_idea
# create a new node
span = Nokogiri::XML::Node.new('span', doc)
span['class'] = 'smaller_font'
span.content = key_terms
puts span.to_s, ''
# add it to the old node
a_tag.add_child(span)
puts doc.to_s
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea, key term, key term, key term</a></body></html>
# >>
# >> <span class="smaller_font">key term, key term, key term</span>
# >>
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>
In the output above you can see how Nokogiri built the sample document, the span that was added, and the resulting document.
It can be simplified to:
require 'nokogiri'
doc = Nokogiri::HTML('<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>')
a_tag = doc.at_css('a[href=stupidreqexquestion]')
main_idea, key_terms = a_tag.content.split(/,\s+/, 2)
a_tag.content = main_idea
a_tag.add_child("<span class='smaller_font'>#{ key_terms }</span>")
puts doc.to_s
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>