Java equivalent to PHP Simple HTML DOM Parser
  • Time: 2011-05-30 13:20:08
  • Tags: java, html, dom

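I need multithreading, which I cannot handle elegantly in PHP, so I would like to write this in Java. Unfortunately, I can't find a library that lets me parse the HTML DOM as robustly, quickly, and easily as PHP Simple HTML DOM Parser does. Do you know of an alternative in Java that is just as easy to use?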

Best answer

I switched from Simple HTML DOM Parser to JSoup, and I am happy with it.

Answers

As I see it, there are two challenges:

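  • Parsing HTML that may not be well-formed XHTML (well-formed XHTML would be easy to parse). I recommend the TagSoup library, which reads messy HTML and produces a well-formed SAX event stream that can then be consumed elsewhere.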

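  • Building and working with a DOM representation of the HTML document. As you probably know, the JDK ships a comprehensive XML DOM implementation (org.w3c.dom.*), but I suspect that is not the kind of API you are looking for. DOM4J, or the older JDOM, can wrap a JDK document and give you an easier-to-use API.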
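To make the second point concrete, here is a minimal sketch of what working with the JDK's own DOM and XPath APIs looks like. It uses only the standard library; the class name `JdkDomExample`, the helper `extractLinks`, and the sample markup are hypothetical and chosen for illustration. It assumes the input is already well-formed XHTML, so real-world HTML would first need to go through a tolerant parser such as TagSoup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JdkDomExample {

    // Hypothetical helper: returns the href of every <a> element that has one.
    // Assumes well-formed XHTML; the JDK parser throws on tag-soup HTML.
    public static List<String> extractLinks(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Select all anchors carrying an href attribute, in document order.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList anchors = (NodeList) xpath.evaluate(
                "//a[@href]", doc, XPathConstants.NODESET);

        List<String> links = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            links.add(((Element) anchors.item(i)).getAttribute("href"));
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body>"
                + "<a href=\"https://example.com/a\">A</a>"
                + "<p>text <a href=\"/b\">B</a></p>"
                + "</body></html>";
        for (String link : extractLinks(xhtml)) {
            System.out.println(link);
        }
    }
}
```

The amount of setup needed for a single query is the verbosity that makes wrappers such as DOM4J or JDOM attractive.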

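I have successfully used TagSoup as a SAX parser to populate DOM4J documents, which I then query with XPath. It took me a while to work out the incantation (it's in Scala, but I'm sure you can convert it):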

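import java.io.StringReader
import org.dom4j.io.SAXReader
import org.xml.sax.InputSource
val parserFactory = new org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl  // TagSoup's JAXP SAX factory
val reader = new SAXReader(parserFactory.newSAXParser.getXMLReader) // DOM4J reader backed by TagSoup
val doc = reader.read(new InputSource(new StringReader(page)))      // page: the HTML source as a String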



