Get plain text from HTML in .NET

What is the best way to get plain text from an HTML string?

public string GetPlainText(string htmlString)
{
    // any .NET built in utility?
}

Best answer

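As far as I know there is nothing built in, but depending on your requirements you could use a regular expression to strip out all the tags: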

string htmlString = @"<p>I'm HTML!</p>";
string plainText = Regex.Replace(htmlString, @"<(.|\n)*?>", "");

Answers

You can use MSHTML, which is fairly forgiving of malformed markup:

//using microsoft.mshtml
HTMLDocument htmldoc = new HTMLDocument();
IHTMLDocument2 htmldoc2 = (IHTMLDocument2)htmldoc;
htmldoc2.write(new object[] { "<p>Plateau <i>of<i> <b>Leng</b><hr /><b erp=\"arp\">2 sugars please</b> <xxx>what? &amp; who?" });

string txt = htmldoc2.body.outerText;

Plateau of Leng 2 sugars please what? & who?

There is no built-in solution in the framework.

If you need to parse HTML, I have had good experience with a library called HTML Agility Pack.
It parses an HTML file and exposes it as a DOM, similar to the XML classes.
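
As a rough sketch of how that could look for the question's GetPlainText, assuming the HtmlAgilityPack NuGet package is referenced (the class name HtmlAgilityPackText is made up for illustration):

```csharp
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET framework

// Hypothetical helper class; the name is only for illustration.
public static class HtmlAgilityPackText
{
    public static string GetPlainText(string htmlString)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlString ?? string.Empty);

        // InnerText concatenates the text nodes of the parsed document.
        // Entities such as &amp; are still encoded at this point, so decode them.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

Note that InnerText may also return the contents of script and style elements, so those nodes might need to be removed first if the input can contain them.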

Personally, I found that a combination of regex and HttpUtility.HtmlDecode is the best and shortest solution.

Return HttpUtility.HtmlDecode(
                Regex.Replace(HtmlString, "<(.|\n)*?>", "")
                )

This removes all the tags and then decodes any remaining entities, such as &copy; or &gt;
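
For the C# signature in the question, the same two steps might be sketched as follows. This version swaps in System.Net.WebUtility.HtmlDecode so that no reference to System.Web is needed; the class name HtmlText and the null handling are assumptions, not part of the original answer.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is only for illustration.
public static class HtmlText
{
    // Matches anything that looks like a tag, including tags that span several lines.
    private static readonly Regex TagPattern = new Regex(@"<(.|\n)*?>");

    public static string GetPlainText(string htmlString)
    {
        // Assumption: treat null or empty input as empty text.
        if (string.IsNullOrEmpty(htmlString))
        {
            return string.Empty;
        }

        // Step 1: strip the tags. Step 2: decode entities such as &amp; or &gt;.
        string withoutTags = TagPattern.Replace(htmlString, "");
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Calling HtmlText.GetPlainText("<p>what? &amp; who?</p>") gives what? & who?. A pattern like this knows nothing about HTML structure, so the contents of script and style elements are kept and a > inside an attribute value ends the match early.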

There is no built-in method in .NET for this. But, as @rudi_visser pointed out, it can be done with regular expressions.

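If you need to do more than just strip the tags (i.e., convert entities such as &acirc;), then you can use a more elaborate solution, like the one found here.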




