C# - reading HTML?

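I'm developing a program in C# and need some help. I'm trying to build an array or list of the items displayed on a particular website. Specifically, I want to read each anchor's text and its href. For example, given this HTML: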

<div class="menu-1">
    <div class="items">
        <div class="minor">
            <ul>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-1"
                    href="/?item=1">Item 1</a>
                </li>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-2"
                    href="/?item=2">Item 2</a>
                </li>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-3"
                    href="/?item=3">Item 3</a>
                </li>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-4"
                    href="/?item=4">Item 4</a>
                </li>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-5"
                    href="/?item=5">Item 5</a>
                </li>
            </ul>
        </div>
    </div>
</div>

So from that HTML, I'd like to read this:

string[,] array = {{"Item 1", "/?item=1"}, {"Item 2", "/?item=2"},
    {"Item 3", "/?item=3"}, {"Item 4", "/?item=4"}, {"Item 5", "/?item=5"}};

The HTML is an example I wrote; the actual website doesn't look like this.

Best answer

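As others have said, HtmlAgilityPack is the best tool for parsing HTML. Also be sure to download HAP Explorer from the HtmlAgilityPack site and use it to test your selects. Anyway, this SelectNodes call gets all anchors that have an ID starting with menu-item-: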

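  // Requires the HtmlAgilityPack library (using HtmlAgilityPack;).
  HtmlDocument doc = new HtmlDocument();
  doc.Load(htmlFile);
  // The string literal passed to starts-with() must be quoted inside the XPath.
  var myNodes = doc.DocumentNode.SelectNodes("//a[starts-with(@id, 'menu-item-')]");
  if (myNodes != null) // SelectNodes returns null when nothing matches
  {
    foreach (HtmlNode node in myNodes)
    {
      Console.WriteLine(node.Id);
      // The link text and href asked for in the question:
      Console.WriteLine(node.InnerText + " " + node.GetAttributeValue("href", ""));
    }
  }

Other answers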

If the HTML is valid XML, you can load it with the XmlDocument class and then use XPath to reach the fragments you want, or you can use XmlReader as Adriano suggested (a bit more work).
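
To make the XmlDocument route concrete, here is a minimal sketch. It assumes the markup is well-formed XML, which the sample in the question happens to be but real pages often are not. The trimmed `html` string and the `links` list are made up for illustration, and the code uses top-level statements (C# 9 or later); on older compilers, put the statements inside a Main method.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical sample input: a trimmed copy of the menu markup from the question.
string html = @"<div class=""menu-1"">
  <ul>
    <li class=""menu-item""><a class=""menu-link"" id=""menu-item-1"" href=""/?item=1"">Item 1</a></li>
    <li class=""menu-item""><a class=""menu-link"" id=""menu-item-2"" href=""/?item=2"">Item 2</a></li>
  </ul>
</div>";

var doc = new XmlDocument();
doc.LoadXml(html); // throws XmlException if the markup is not well-formed XML

// Select every <a> element whose id starts with "menu-item-".
var anchors = doc.SelectNodes("//a[starts-with(@id, 'menu-item-')]");

// Collect (link text, href) pairs.
var links = new List<(string Text, string Href)>();
foreach (XmlElement a in anchors)
{
    links.Add((a.InnerText.Trim(), a.GetAttribute("href")));
}

foreach (var (text, href) in links)
{
    Console.WriteLine($"{text} -> {href}");
}
```

A list of pairs is used here instead of a string[,] because the number of links is not known up front; converting to a 2D array afterwards is straightforward if that shape is really needed.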

If the HTML is not valid XML, I'd suggest using an existing HTML parser, such as SgmlReader (http://archive.msdn.microsoft.com/SgmlReader); it worked for us.

You can also use the HtmlAgilityPack (http://htmlagilitypack.codeplex.com/).

I think this example is simple enough that you could use a regular expression, such as:

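// Requires: using System.Text.RegularExpressions;
// [^>]* keeps each match inside a single <a> tag and, unlike '.', also spans
// line breaks (in the sample, href sits on the line after title).
string strRegex = @"<a\b[^>]*title=""([^""]*)""[^>]*href=""([^""]*)""";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);

string strTargetString = ...;

foreach (Match myMatch in myRegex.Matches(strTargetString))
{
  if (myMatch.Success)
  {
    // Use the groups matched:
    // myMatch.Groups[1].Value is the title, myMatch.Groups[2].Value is the href
  }
}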
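
As a rough illustration of the "use the groups matched" step, the sketch below fills the string[,] from the question. It is not the pattern from this answer: it uses a variant that captures the link text instead of the title attribute, since the text is what the question asks for. The `html` sample, `linkRegex`, and `items` are hypothetical names, the pattern assumes attribute values never contain `>` and that anchors hold plain text only, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input modeled on the markup in the question.
string html = @"<li class=""menu-item"">
    <a class=""menu-link"" title=""Item-1"" id=""menu-item-1""
    href=""/?item=1"">Item 1</a>
</li>
<li class=""menu-item"">
    <a class=""menu-link"" title=""Item-1"" id=""menu-item-2""
    href=""/?item=2"">Item 2</a>
</li>";

// Group 1: href value. Group 2: link text.
// [^>]* stays inside one tag and, unlike '.', also matches line breaks.
var linkRegex = new Regex(@"<a\b[^>]*\bhref=""([^""]*)""[^>]*>([^<]*)</a>");

MatchCollection matches = linkRegex.Matches(html);

// One row per link: column 0 is the text, column 1 is the href.
string[,] items = new string[matches.Count, 2];
for (int i = 0; i < matches.Count; i++)
{
    items[i, 0] = matches[i].Groups[2].Value.Trim();
    items[i, 1] = matches[i].Groups[1].Value;
}

for (int i = 0; i < items.GetLength(0); i++)
{
    Console.WriteLine($"{items[i, 0]} -> {items[i, 1]}");
}
```

This only holds up for markup as regular as the sample; for a real site, a proper HTML parser is the more robust choice.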



