What are the best practices for combining analyzers in Lucene?

My situation is that I am using a StandardAnalyzer in Lucene to index text as follows:

public void indexText(String suffix, boolean includeStopWords)  {        
    StandardAnalyzer analyzer = null;


    if (includeStopWords) {
        analyzer = new StandardAnalyzer(Version.LUCENE_30);
    }
    else {

        // Get Stop_Words to exclude them.
        Set<String> stopWords = (Set<String>) Stop_Word_Listener.getStopWords();      
        analyzer = new StandardAnalyzer(Version.LUCENE_30, stopWords);
    }

    try {

        // Index text.
        Directory index = new RAMDirectory();
        IndexWriter w = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);            
        this.addTextToIndex(w, this.getTextToIndex());
        w.close();

        // Read index.
        IndexReader ir = IndexReader.open(index);
        Text_TermVectorMapper ttvm = new Text_TermVectorMapper();

        int docId = 0;

        ir.getTermFreqVector(docId, PropertiesFile.getProperty(text), ttvm);

        // Set output.
        this.setWordFrequencies(ttvm.getWordFrequencies());
        // The writer was already closed above; close the reader instead.
        ir.close();
    }
    catch(Exception ex) {
        logger.error("Error message
", ex);
    }
}

private void addTextToIndex(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    doc.add(new Field(text, value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
    w.addDocument(doc);
}

This works well, but I would now like to combine it with stemming using the SnowballAnalyzer as well.

The class also has the two instance variables shown in the constructor below:

public Text_Indexer(String textToIndex) {
    this.textToIndex = textToIndex;
    this.wordFrequencies = new HashMap<String, Integer>();
}

Can anyone advise how I can achieve this with the code above?

Thanks in advance,

Mr Morgan.

Answers

Lucene provides the org.apache.lucene.analysis.Analyzer base class, which you can extend to write your own Analyzer.
As an example, have a look at the org.apache.lucene.analysis.standard.StandardAnalyzer class, which extends Analyzer.

Then, inside your Analyzer, chain the token filters that StandardAnalyzer and SnowballAnalyzer use, for example:

TokenStream result = new StandardFilter(tokenStream);
result = new StopFilter(true, result, stopSet);   // remove stop words
result = new SnowballFilter(result, "English");   // then apply Snowball stemming

Then, in your existing code, you'll be able to construct the IndexWriter with your own Analyzer implementation that chains the Standard and Snowball filters, as in the sketch below.
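For example, a minimal sketch of such an Analyzer for Lucene 3.0 might look like the following. The class name StandardSnowballAnalyzer, the hard-coded "English" stemmer and the optional stop-word handling are illustrative assumptions, not part of the original code:

import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class StandardSnowballAnalyzer extends Analyzer {

    private final Set<?> stopWords;   // may be null if no stop-word removal is wanted

    public StandardSnowballAnalyzer(Set<?> stopWords) {
        this.stopWords = stopWords;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // The same chain StandardAnalyzer builds, with Snowball stemming appended.
        TokenStream result = new StandardTokenizer(Version.LUCENE_30, reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        if (stopWords != null) {
            result = new StopFilter(true, result, stopWords);
        }
        return new SnowballFilter(result, "English");
    }
}

An instance of this class can then be passed to the IndexWriter constructor in indexText() in place of the StandardAnalyzer.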

Totally off-topic:
I suppose you'll eventually need to set up your own custom way of handling requests. That is already implemented inside Solr.

First, define your search component in SolrConfig.xml, for example:

<searchComponent name="yourQueryComponent" class="org.apache.solr.handler.component.YourQueryComponent"/>

Then write your request handler by extending SearchHandler, and register it in SolrConfig.xml:

  <requestHandler name="YourRequestHandlerName" class="org.apache.solr.handler.component.YourRequestHandler" default="true">
    <!-- default values for query parameters -->
        <lst name="defaults">
            <str name="echoParams">explicit</str>       
            <int name="rows">1000</int>
            <str name="fl">*</str>
            <str name="version">2.1</str>
        </lst>

        <arr name="components">
            <str>yourQueryComponent</str>
            <str>facet</str>
            <str>mlt</str>
            <str>highlight</str>            
            <str>stats</str>
            <str>debug</str>

        </arr>

  </requestHandler>

Then, when you send a request URL to Solr, simply include the additional parameter qt=YourRequestHandlerName, and your request handler will be used for that request.
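For example, a request such as the following (host, port and query are placeholders) would be routed through the custom handler:

  http://localhost:8983/solr/select?q=*:*&qt=YourRequestHandlerName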

More about SearchComponents.
More about RequestHandlers.

The SnowballAnalyzer provided by Lucene already uses the StandardTokenizer, StandardFilter, LowerCaseFilter, StopFilter, and SnowballFilter. So it sounds like it does exactly what you want (everything StandardAnalyzer does, plus the snowball stemming).

If it does not, you can very easily build your own analyzer by combining whatever tokenizer and TokenStreams (token filters) you wish.
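If the stock SnowballAnalyzer is enough, a minimal sketch of the change to the question's indexText() method could look like this, assuming the Lucene 3.0 contrib snowball jar is on the classpath; the "English" stemmer name and the Set-based stop-word overload are assumptions:

import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;

// Inside indexText(), replace the StandardAnalyzer construction with:
Analyzer analyzer = null;

if (includeStopWords) {
    analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
}
else {
    // Exclude the custom stop words, mirroring the question's else-branch.
    Set<String> stopWords = (Set<String>) Stop_Word_Listener.getStopWords();
    analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
}

Everything else in indexText() (RAMDirectory, IndexWriter, term-vector reading) stays exactly as in the question.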

In the end I rearranged the program code to call the SnowBallAnalyzer as an option. The output is then indexed via the StandardAnalyzer.

It runs quickly, but if I can do everything with just one analyzer I will revisit my code.

Thanks to mbonaci and Avi.




