Java CSS Crawler

I'm looking for a web crawler that can crawl web pages for CSS. I don't need any other indexing capabilities.

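I've tried working through Xapian, Nutch and Heritrix. They all seem rather complex. If anyone has experience or suggestions, I'd love to hear them. Accessible tutorials for any of the above platforms would also be welcome.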

David

Answer

I'd suggest using a simple HTTPClient and a simple regex. You can store the responses in your own files, in a database, or in an archive (see Heritrix).

This avoids a heavyweight crawler. Since each domain has only a few stylesheets, you can safely ignore the complex URLs within each domain.
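
As a rough illustration of the approach above, here is a minimal sketch that fetches one page and pulls out its stylesheet URLs with regexes. It assumes Java 11+ and uses the JDK's built-in java.net.http.HttpClient rather than any particular third-party HTTPClient library; the class name CssLinkExtractor, its method names, and the example.com URL are made up for the example. The regexes only cover ordinary quoted `<link rel="stylesheet" href="...">` tags, so treat it as a starting point, not a full HTML parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class CssLinkExtractor {

    // Matches a whole <link ...> tag; its attributes are inspected separately.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // True when the rel attribute value contains the token "stylesheet".
    private static final Pattern REL_STYLESHEET =
            Pattern.compile("\\brel\\s*=\\s*[\"']?[^\"'>]*\\bstylesheet\\b",
                    Pattern.CASE_INSENSITIVE);
    // Captures a quoted href value (unquoted hrefs are not handled).
    private static final Pattern HREF =
            Pattern.compile("\\bhref\\s*=\\s*[\"']([^\"']+)[\"']",
                    Pattern.CASE_INSENSITIVE);

    // Returns absolute stylesheet URLs found in the HTML, in document order.
    static List<String> extractStylesheetLinks(String html, URI base) {
        List<String> result = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            String linkTag = tag.group();
            if (!REL_STYLESHEET.matcher(linkTag).find()) {
                continue; // e.g. rel="icon": not a stylesheet
            }
            Matcher href = HREF.matcher(linkTag);
            if (href.find()) {
                // Resolve relative paths against the page URL.
                result.add(base.resolve(href.group(1)).toString());
            }
        }
        return result;
    }

    // Downloads a URL and returns the response body as text.
    static String fetch(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder start page; pass a real URL as the first argument.
        URI page = URI.create(args.length > 0 ? args[0] : "https://example.com/");
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        String html = fetch(client, page);
        for (String cssUrl : extractStylesheetLinks(html, page)) {
            System.out.println(cssUrl);
        }
    }
}
```

Each printed URL can then be passed to the same fetch method and the body written to a file or database. Inline `<style>` blocks and `@import` rules inside the stylesheets would need their own patterns.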

Cheers!




