Question

www.un.org/Depts/DGACM/index_spanish.htm 右边一米的工程现在是从我的问询和我的指数田中除去信、数字和白色空间以外的东西。这种做法产生了理想的行为,但在很大程度上是一个工作问题,而不是真正的解决办法。我仍然想理解,如果任何人有答案,索勒为什么会做什么。 <>ENDIT 3

我有一份名为“TT-14B”的文件,索引为Solr 1.4(通过Django/Haystack)。当我询问“t-1”或“tt 14”或“tt 14b”的“coment_autoEdit): Oh, 并将违约的操作者从 help到OR。

谁能帮助我理解为什么不这样做? 感谢。

......

成果

QUERY  | WORKS
=======|======
tt     | yes
tt-    | yes
tt-1   | yes
tt-14  | no
tt-14b | no
tt 14  | yes
tt 14b | yes

<EDIT 2

Got some more comparably weird 成果, might help debug the problem. In this case the test document was "abc def".

QUERY   | WORKS
========|======
abc     | yes
abc d   | yes
abc de  | no
abc def | no

显然,萨米模式,但我不理解造成这种情况的原因。

<>ENDIT 2

缩略语

<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="0" preserveOriginal="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="0" preserveOriginal="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>

schema.xml (full)

<?xml version="1.0" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<schema name="default" version="1.1">
  <types>
    <fieldtype name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

    <!-- Numeric field types that manipulate the value into
         a string value that isn t human-readable in its internal form,
         but with a lexicographic ordering the same as the numeric ordering,
         so that range queries work correctly. -->
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>

    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="0" preserveOriginal="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="0" preserveOriginal="1" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="ngram" class="solr.TextField" >
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="0" preserveOriginal="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="0" preserveOriginal="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      </analyzer>
    </fieldType>
  </types>

  <fields>   
    <!-- general -->
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="django_ct" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="django_id" type="string" indexed="true" stored="true" multiValued="false" />

    <dynamicField name="*_i"  type="sint"    indexed="true"  stored="true"/>
    <dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
    <dynamicField name="*_l"  type="slong"   indexed="true"  stored="true"/>
    <dynamicField name="*_t"  type="text"    indexed="true"  stored="true"/>
    <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
    <dynamicField name="*_f"  type="sfloat"  indexed="true"  stored="true"/>
    <dynamicField name="*_d"  type="sdouble" indexed="true"  stored="true"/>
    <dynamicField name="*_dt" type="date"    indexed="true"  stored="true"/>


    <field name="modelname_exact" type="string" indexed="true" stored="true" multiValued="false" />

    <field name="modelname" type="text" indexed="true" stored="true" multiValued="false" />

    <field name="name" type="text" indexed="true" stored="true" multiValued="false" />

    <field name="text" type="text" indexed="true" stored="true" multiValued="false" />

    <field name="name_exact" type="string" indexed="true" stored="true" multiValued="false" />

    <field name="content_auto" type="edge_ngram" indexed="true" stored="true" multiValued="true" />

  </fields>

  <!-- field to use to determine and enforce document uniqueness. -->
  <uniqueKey>id</uniqueKey>

  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>text</defaultSearchField>

  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="AND" />
</schema>

Answer 1

对所有案件都进行 /admin。

Is there a reason, why positionIncrementGap="1"is set to 1?

tt-14b and tt 14b are handled different, because of the whitspace tokenizer. That means: tt-14b is one term, as long as the WordDelimiterFilterFactorydoesn t fired, while tt 14b are 2 terms from the beginning. The positionIncrementGap gives you the possibility to see different terms as one phrase, even if there are not neighbors, but on the next "n" position. So try to rise the positionIncrementGap.

Btw: The first i notice on your schema.xml are the missing "EdgeNGramFilterFactory" at query time. Which should be okay. But there are also understandable reasons, why "same filters on query- and index-time" are handled as best practices. This depends on every special situation, but activating this filter on query-time would be a try.

Answer 2

稍晚才显示这一点——但如上所述,如果白天花板通过标准Analyzer的话,白天花花花旗的耳光。我也看到了这种困难。

Solr NGramTokenizerFactory and PatternReplaceCharFilterFactory - Analyzer results inconsistent with Query Results

解决办法很可能是使用关键词Analyzer——它不应分裂任何东西。

我在谈到上述联系时,与您的“EDIT 3”相类似。

The particularly frustrating thing about the Solr analyzer is that it shows everything is fine and behaving as expected -- which really confused me.

Good luck!

友情链接