RegEx: Link Twitter-Name Mentions to Twitter in HTML

I want to do THIS, just a little bit more complicated:

Let's say I have this HTML input:

<a href="http://www.example.com" title="Bla @test blubb">Don't break!</a>
Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.
You can't reach me at blam4c@example.com.

Is there a good regex that replaces the Twitter username mentions with links to Twitter, but leaves @example (the email address at the bottom) and @test (in the link title, i.e. inside an HTML tag) alone?

It probably should also try to not add links inside existing links, i.e. not break this:

<a href="http://www.example.com">Hello @someone there!</a>

My current attempt is to add ">" at the beginning of the string, then use this RegEx:

Search:   />([^<]*\s)@([a-z0-9_]+)([\s,.!?])/i
Replace:  >\1<a href="http://twitter.com/\2">@\2</a>\3

Then remove the ">" I added in step 1.

But that won't match anything but "@blam4c". I know WHY it does that; that's not the problem.

I would like a solution that finds and replaces all Twitter username mentions without destroying the HTML. Might it even be better to code this without a regex?

Best answer

First, keep the angle brackets out of your regexps.

Use an HTML parser and XPath to select the text nodes you are interested in processing, then consider a regexp for matching only the @refs in those nodes.

I'll leave it to other people to try to give a specific answer to the regex part.
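To make the "text nodes only" idea concrete, here is a minimal JavaScript sketch. It is not the parser-plus-XPath approach recommended above: it swaps in a hand-rolled tokenizer that splits the input into tags and text runs, purely so the example is self-contained. In real code a proper HTML parser would do that splitting. The function name `linkMentions` is hypothetical, and the sketch assumes well-formed markup with no ">" inside attribute values, no comments, and no script or style blocks.

```javascript
// Sketch only: a hand-rolled tokenizer standing in for a real HTML parser.
// Assumes well-formed markup and no ">" inside attribute values.
function linkMentions(html) {
  // Splitting on a capturing group keeps the tags in the result:
  // even indexes are text runs, odd indexes are tags.
  const tokens = html.split(/(<[^>]*>)/);
  let anchorDepth = 0; // greater than 0 while inside an existing <a>...</a>

  return tokens.map((token, i) => {
    if (i % 2 === 1) {
      // A tag: track whether we are inside a link, never rewrite it.
      if (/^<a[\s>]/i.test(token)) anchorDepth++;
      else if (/^<\/a\s*>/i.test(token)) anchorDepth = Math.max(0, anchorDepth - 1);
      return token;
    }
    if (anchorDepth > 0) return token; // text inside an existing link
    // Text outside any link: link each @name not preceded by a word character.
    return token.replace(/(?<!\w)@(\w+)/g,
      '<a href="http://twitter.com/$1">@$1</a>');
  }).join("");
}

// The input from the question, plus the "existing link" case.
const questionInput = [
  '<a href="http://www.example.com" title="Bla @test blubb">Don\'t break!</a>',
  "Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.",
  "You can't reach me at blam4c@example.com.",
  '<a href="http://www.example.com">Hello @someone there!</a>',
].join("\n");

console.log(linkMentions(questionInput));
```

Because tags are passed through untouched, the @test in the title attribute is never seen by the mention regex, and the depth counter keeps @someone inside the existing link from being wrapped a second time.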

Other answers

I agree with ddaa; there's almost no sane way to attack this without stripping the HTML links out first.

Presumably you'd be starting out with an actual Twitter message, which cannot by definition include any manually entered hyperlinks.

For example, here's how I found this question (the link resolves to this question, so don't bother clicking it!):

Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1

In this case, it's easy:

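using System.Text.RegularExpressions;

var msg = "Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1";

// Link each @name that is not preceded by a word character, so email addresses are skipped.
var html = Regex.Replace(msg, @"(?<!\w)(@(\w+))",
    "<a href=\"http://twitter.com/$2\">$1</a>");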

(This might need some tweaking, and I'd like to test it against a corpus, but it seems correct for the average Twitter message.)
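For anyone working in JavaScript rather than C#, the same pattern can be sketched roughly as follows. This assumes an engine with lookbehind support (current Node.js and browsers have it); the names `mentionPattern` and `linkPlainTextMentions` are made up for the example. As in the C# version, group 1 is the whole "@name" and group 2 is the bare name.

```javascript
// Same pattern as the C# snippet: "@name" not preceded by a word character.
const mentionPattern = /(?<!\w)(@(\w+))/g;

// Plain-text input only; this does not try to cope with HTML markup.
function linkPlainTextMentions(text) {
  return text.replace(mentionPattern, '<a href="http://twitter.com/$2">$1</a>');
}

const tweet = "Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1";
console.log(linkPlainTextMentions(tweet));

// The lookbehind keeps the "@example" of an email address from matching.
console.log(linkPlainTextMentions("You can't reach me at blam4c@example.com."));
```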

As for your more complicated cases (with HTML markup embedded in the tweets), I have no idea. Way too hard for me.

This regexp might work a bit better: /\B@([\w-]+)/gim

Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/4/
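A quick way to see what this pattern does and does not catch is to run it over the first three lines of the question's input. The helper name `findMentions` is hypothetical and the snippet is only a sketch; it lists the captured names instead of building links.

```javascript
// The pattern from this answer, unchanged.
const hyphenAwarePattern = /\B@([\w-]+)/gim;

// Collect the captured names (group 1) of every match.
function findMentions(text) {
  return Array.from(text.matchAll(hyphenAwarePattern), (m) => m[1]);
}

const sampleInput = [
  '<a href="http://www.example.com" title="Bla @test blubb">Don\'t break!</a>',
  "Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.",
  "You can't reach me at blam4c@example.com.",
].join("\n");

console.log(findMentions(sampleInput));
```

The \B rejects the email address, because the position between "c" and "@" is a word boundary. It still matches @test inside the title attribute, though, so on its own this pattern does not solve the HTML case. Note also that [\w-] accepts hyphens, which the question's own character class [a-z0-9_] does not.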




