Collect all hits for a search in Lucene / Optimization

I collect the doc ids of all hits for a given search by using a custom collector (it fills a BitSet with the ids). Searching and getting the doc ids is quite fast for my needs, but when it comes to actually fetching the documents from disk, things get very slow. Is there a way to optimize Lucene for faster document collection?
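
For reference, here is a minimal Jython sketch of such a collector (for illustration only; this is not necessarily the exact collector in use) against the Lucene 3.0 Collector API, recording every matching doc id in a java.util.BitSet:

from java.util import BitSet
from org.apache.lucene.search import Collector

class BitSetCollector(Collector):
  # Records matching doc ids only; scores are ignored.
  def __init__(self, max_doc):
    self.hits = BitSet(max_doc)
    self._doc_base = 0
  def setScorer(self, scorer):
    pass  # scoring is not needed for collecting ids
  def collect(self, doc):
    self.hits.set(self._doc_base + doc)  # remap segment-local id to global id
  def setNextReader(self, reader, doc_base):
    self._doc_base = doc_base
  def acceptsDocsOutOfOrder(self):
    return True

Passing an instance to IndexSearcher.search(query, collector) leaves every matching doc id set in collector.hits (the searcher and query objects are assumed to exist elsewhere).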

Details: I am working on a processed dump of Wikipedia and I keep each sentence as a separate document. When I search for "computer", I get all sentences containing the term computer. Currently, searching the corpus and collecting all the document ids takes only a fraction of a second, but fetching the first 1,000 documents takes around 20 seconds. Fetching all documents takes proportionally more time (i.e. another 20 seconds for every additional 1,000 records).

Subsequent searches and document fetches take much less time (although I do not know who is doing the caching, the OS or Lucene?), but I will be searching for many different terms and I do not want to rely on caching; the performance of the very first search is crucial for me.

I am looking for suggestions/tricks that would improve document fetching performance (if that is possible at all). Thanks in advance!

I use Lucene 3.0.0, but I drive the Lucene classes from Jython. In other words, I call the get_doc method of the following Jython class for every doc id retrieved during the search:

import java.io
from org.apache.lucene.store import FSDirectory
from org.apache.lucene.index import IndexReader

class DocumentFetcher():
  def __init__(self, index_name):
    self._directory = FSDirectory.open(java.io.File(index_name))
    self._index_reader = IndexReader.open(self._directory, True)  # read-only reader
  def get_doc(self, doc_id):
    return self._index_reader.document(doc_id)  # loads every stored field of the document
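
For concreteness, a hypothetical driver loop (the index path, hit list and field name below are assumptions) showing one document() call per hit, which is where the fetching time goes:

# Hypothetical usage: one stored-document load (i.e. one random disk read on a
# cold cache) per collected doc id.
fetcher = DocumentFetcher("/path/to/wikipedia_index")
sentences = [fetcher.get_doc(doc_id).get("text") for doc_id in hit_doc_ids]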

I have 50M documents in my index.

Answers

You are probably storing a lot of information in each document. Reduce the stored fields to as few as you can.
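
For example, at indexing time the large sentence text can be indexed without being stored, keeping only a small stored id field (a sketch for illustration; the field names, sentence_id, sentence_text and writer are assumptions):

from org.apache.lucene.document import Document, Field

doc = Document()
# Small id field: stored, so it can be fetched cheaply at search time.
doc.add(Field("id", sentence_id, Field.Store.YES, Field.Index.NOT_ANALYZED))
# Large sentence text: searchable (indexed) but not stored on disk.
doc.add(Field("text", sentence_text, Field.Store.NO, Field.Index.ANALYZED))
writer.addDocument(doc)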

Second, while retrieving fields, select only the fields you need. You can use the following method of IndexReader to load just a subset of the stored fields:

public abstract Document document(int n, FieldSelector fieldSelector)

This way you do not load fields that are not used.

You can use the following code sample:

FieldSelector idFieldSelector =
    new SetBasedFieldSelector(Collections.singleton("idFieldName"), Collections.emptySet());
for (int i : resultDocIDs) {
  String id = reader.document(i, idFieldSelector).get("idFieldName");
}

Scaling Lucene and Solr discusses many ways to improve Lucene performance. Since you are working on Lucene search over Wikipedia, you may also be interested in Rainman's Lucene Search of Wikipedia. It mostly discusses algorithms rather than performance, but it may still be relevant.




