Strip XML and HTML from a string

I have a string from which I need to strip all HTML and XML. I am not really good with regular expressions. For HTML I found some really useful code:

snippet = Regex.Replace(snippet, "<.*?>", "");

For the XML, this is what I am currently struggling with:

while (snippet.IndexOf("<xml>") != -1)
{
    int startLoc = snippet.IndexOf("<xml>");
    int endLoc = snippet.IndexOf("</xml>");
    snippet = snippet.Remove(startLoc, (endLoc - startLoc) + 6);
}
while (snippet.IndexOf("<style>") != -1)
{
    int startLoc = snippet.IndexOf("<style>");
    int endLoc = snippet.IndexOf("</style>");
    snippet = snippet.Remove(startLoc, (endLoc - startLoc) + 8);
}
// only required for Chrome and IE
// removes - <object  classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D" id="ieooui">
while (snippet.IndexOf("<object") != -1)
{
    int startLoc = snippet.IndexOf("<object");
    int endLoc = snippet.IndexOf("id=\"ieooui\">");
    snippet = snippet.Remove(startLoc, (endLoc - startLoc) + 12);
}
// removes - <object id="ieooui" classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D">
while (snippet.IndexOf("<object") != -1)
{
    int startLoc = snippet.IndexOf("<object");
    int endLoc = snippet.IndexOf("classid=\"clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D\"");
    snippet = snippet.Remove(startLoc, (endLoc - startLoc) + 52);
}

This is very untidy. Can anyone suggest a regular expression for the XML, specifically for:

<object id="ieooui" classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D">

and

<object  classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D" id="ieooui">

Thanks a ton.

Answers

In general you cannot parse HTML with a regular expression. Well, technically you can, but as you say, it will be "untidy". That task is usually done with a SAX parser, or even without one by using an HTML/XML tokenizer, like this one: http://www.codeproject.com/KB/recipes/HTML_XML_Scanner.aspx
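
To make the tokenizer idea concrete, here is a minimal sketch of a hand-rolled scanner in C#. It is not the code from the linked CodeProject article; the class name `MarkupStripper`, its `Strip` method, and the choice to drop the whole content of `xml` and `style` elements are assumptions made for illustration. It is not a full HTML parser either (no entity decoding, no CDATA handling), but it does not depend on attribute order, so both `<object ...>` variants from the question are treated the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class MarkupStripper
{
    // Elements whose content is dropped along with their tags (assumed list).
    private static readonly HashSet<string> SkippedElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "xml", "style" };

    public static string Strip(string input)
    {
        var output = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // Plain text, or a '<' that does not start a tag: copy it through.
            if (input[i] != '<' || !StartsTag(input, i + 1))
            {
                output.Append(input[i]);
                i++;
                continue;
            }

            // Comments may contain quotes and '>' characters, so skip them whole.
            if (string.CompareOrdinal(input, i, "<!--", 0, 4) == 0)
            {
                int commentEnd = input.IndexOf("-->", i + 4, StringComparison.Ordinal);
                if (commentEnd < 0) break;          // unterminated comment: drop the rest
                i = commentEnd + 3;
                continue;
            }

            int end = FindTagEnd(input, i + 1);
            if (end < 0) break;                     // unterminated tag: drop the rest

            string name = ReadName(input, i + 1, end);  // empty for closing tags
            bool selfClosing = input[end - 1] == '/';
            i = end + 1;

            // For elements such as <style>, skip ahead to just past the closing tag.
            if (!selfClosing && SkippedElements.Contains(name))
            {
                int close = input.IndexOf("</" + name, i, StringComparison.OrdinalIgnoreCase);
                if (close < 0) break;
                int closeEnd = input.IndexOf('>', close);
                if (closeEnd < 0) break;
                i = closeEnd + 1;
            }
        }
        return output.ToString();
    }

    // A tag starts with a letter, '/', '!' or '?' right after the '<'.
    private static bool StartsTag(string s, int pos)
    {
        if (pos >= s.Length) return false;
        char c = s[pos];
        return char.IsLetter(c) || c == '/' || c == '!' || c == '?';
    }

    // Returns the index of the '>' that ends the tag, ignoring any '>'
    // inside quoted attribute values; -1 if the tag is never closed.
    private static int FindTagEnd(string s, int start)
    {
        char quote = '\0';
        for (int j = start; j < s.Length; j++)
        {
            char c = s[j];
            if (quote != '\0') { if (c == quote) quote = '\0'; }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '>') return j;
        }
        return -1;
    }

    // Reads the element name: everything up to whitespace, '/' or the tag end.
    private static string ReadName(string s, int start, int end)
    {
        int j = start;
        while (j < end && !char.IsWhiteSpace(s[j]) && s[j] != '/') j++;
        return s.Substring(start, j - start);
    }
}
```

With this sketch, `snippet = MarkupStripper.Strip(snippet);` would replace both the `Regex.Replace` call and the `while` loops from the question. For real-world input, an established HTML parser is still likely the safer choice.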




