Collect all hits for a search in Lucene / Optimization

I collect the doc ids of all hits for a given search by using a custom collector (it fills a BitSet with the ids). Searching and getting the doc ids is quite fast for my needs, but when it comes to actually fetching the documents from disk, things get very slow. Is there a way to optimize Lucene for faster document collection?
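
For reference, here is a minimal Jython sketch of such a collector (for illustration only; this is not necessarily the exact collector in use) against the Lucene 3.0 Collector API, recording every matching doc id in a java.util.BitSet:

from java.util import BitSet
from org.apache.lucene.search import Collector

class BitSetCollector(Collector):
  # Records matching doc ids only; scores are ignored.
  def __init__(self, max_doc):
    self.hits = BitSet(max_doc)
    self._doc_base = 0
  def setScorer(self, scorer):
    pass  # scoring is not needed for collecting ids
  def collect(self, doc):
    self.hits.set(self._doc_base + doc)  # remap segment-local id to global id
  def setNextReader(self, reader, doc_base):
    self._doc_base = doc_base
  def acceptsDocsOutOfOrder(self):
    return True

Passing an instance to IndexSearcher.search(query, collector) leaves every matching doc id set in collector.hits (the searcher and query objects are assumed to exist elsewhere).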

Details: I am working on a processed dump of Wikipedia and I keep each sentence as a separate document. When I search for "computer", I get all sentences containing the term computer. Currently, searching the corpus and collecting all the document ids takes only a fraction of a second, but fetching the first 1,000 documents takes around 20 seconds. Fetching all documents takes proportionally more time (i.e. another 20 seconds for every additional 1,000 records).

Subsequent searches and document fetches take much less time (although I do not know who is doing the caching, the OS or Lucene?), but I will be searching for many different terms and I do not want to rely on caching; the performance of the very first search is crucial for me.

I am looking for suggestions/tricks that would improve document fetching performance (if that is possible at all). Thanks in advance!

I use Lucene 3.0.0, but I drive the Lucene classes from Jython. In other words, I call the get_doc method of the following Jython class for every doc id retrieved during the search:

import java.io
from org.apache.lucene.store import FSDirectory
from org.apache.lucene.index import IndexReader

class DocumentFetcher():
  def __init__(self, index_name):
    self._directory = FSDirectory.open(java.io.File(index_name))
    self._index_reader = IndexReader.open(self._directory, True)  # read-only reader
  def get_doc(self, doc_id):
    return self._index_reader.document(doc_id)  # loads every stored field of the document
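
For concreteness, a hypothetical driver loop (the index path, hit list and field name below are assumptions) showing one document() call per hit, which is where the fetching time goes:

# Hypothetical usage: one stored-document load (i.e. one random disk read on a
# cold cache) per collected doc id.
fetcher = DocumentFetcher("/path/to/wikipedia_index")
sentences = [fetcher.get_doc(doc_id).get("text") for doc_id in hit_doc_ids]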

I have 50M documents in my index.

Answers

You are probably storing a lot of information in each document. Reduce the stored fields to as few as you can.
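
For example, at indexing time the large sentence text can be indexed without being stored, keeping only a small stored id field (a sketch for illustration; the field names, sentence_id, sentence_text and writer are assumptions):

from org.apache.lucene.document import Document, Field

doc = Document()
# Small id field: stored, so it can be fetched cheaply at search time.
doc.add(Field("id", sentence_id, Field.Store.YES, Field.Index.NOT_ANALYZED))
# Large sentence text: searchable (indexed) but not stored on disk.
doc.add(Field("text", sentence_text, Field.Store.NO, Field.Index.ANALYZED))
writer.addDocument(doc)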

Second, while retrieving fields, select only the fields you need. You can use the following method of IndexReader to load just a subset of the stored fields:

public abstract Document document(int n, FieldSelector fieldSelector)

This way you do not load fields that are not used.

You can use the following code sample:

FieldSelector idFieldSelector =
    new SetBasedFieldSelector(Collections.singleton("idFieldName"), Collections.emptySet());
for (int i : resultDocIDs) {
  String id = reader.document(i, idFieldSelector).get("idFieldName");
}

Scaling Lucene and Solr discusses many ways to improve Lucene performance. Since you are working on Lucene search over Wikipedia, you may also be interested in Rainman's Lucene Search of Wikipedia. It mostly discusses algorithms rather than performance, but it may still be relevant.




