Nutch API advice

I'm working on a project where I need a mature crawler to do some work, and I'm evaluating Nutch for this purpose. My current needs are relatively straightforward: I need a crawler that can save the data to disk, and I need it to recrawl only the updated resources of a site and skip the parts that are already crawled. Does anyone have experience working with the Nutch code directly in Java, rather than via the command line? I would like to start simple: create a crawler (or similar), minimally configure it and start it, nothing fancy. Is there an example for this, or some resource I should be looking at? I'm going over the Nutch documentation, but most of it is about the command line, search and other things. How usable is the Nutch crawling module without the indexing and search parts? Any help is appreciated. Thanks.

Best answer

Nutch is most probably very different from anything you have worked with before. It is closer to a framework than a library: it has a front end for query and search (although Solr seems more powerful than the native Nutch search front end), and it also has the crawling part and the indexing part (into a Lucene index).

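If you want to use the crawl for something other than search, you will need to develop your own programs and become familiar with Hadoop and the MapReduce paradigm.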
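To give a feel for the MapReduce paradigm mentioned above, here is a minimal sketch in plain Java. It does not use the Hadoop or Nutch APIs; the class name `CrawlMapReduceSketch`, its methods and the sample URLs are all hypothetical and only illustrate the shape of a job: a map step that emits key/value pairs and a reduce step that combines the values for each key. A real job over Nutch crawl data would be written against Hadoop's own Mapper and Reducer types instead.

```java
import java.net.URI;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, not the Hadoop or Nutch API.
public class CrawlMapReduceSketch {

    // Map step: emit one (host, 1) pair for every fetched URL.
    public static List<Map.Entry<String, Integer>> map(List<String> fetchedUrls) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String url : fetchedUrls) {
            String host = URI.create(url).getHost();
            if (host != null) {
                pairs.add(new SimpleEntry<>(host, 1));
            }
        }
        return pairs;
    }

    // Reduce step: group the pairs by key (host) and sum their values.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for a list of crawled URLs.
        List<String> fetched = Arrays.asList(
                "http://example.com/a",
                "http://example.com/b",
                "http://example.org/");
        // Prints the number of fetched pages per host.
        System.out.println(reduce(map(fetched)));
    }
}
```

In Hadoop the grouping by key happens between the two steps and the work is spread over many machines, but the division of labour between the map and reduce functions is the same as in this single-process sketch.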

You have not said what you actually want to do.
