English 中文(简体)
Different lucene search results using different search space size
原标题:

I have an application that uses lucene for searching. The search space are in the thousands. Searching against these thousands, I get only a few results, around 20 (which is ok and expected).

However, when I reduce my search space to just those 20 entries (i.e. I indexed only those 20 entries and disregard everything else...so that development would be easier), I get the same 20 results but in different order (and scoring).

I tried disabling the norm factors via Field#setOmitNorms(true), but I still get different results?

What could be causing the difference in the scoring?

Thanks

最佳回答

Please see the scoring documentation in Lucene s Similarity API. My bet is on the difference in idf between the two cases (both numDocs and docFreq are different). In order to know for sure, use the explain() function to debug the scores.

Edit: A code fragment for getting explanations:

TopDocs hits = searcher.search(query, searchFilter, max);
ScoreDoc[] scoreDocs = hits.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
  String explanation = searcher.explain(query, scoreDoc.doc).toString();
  Log.debug(explanation);
}
问题回答

Scoring depends on all the documents in the index:

In general, the idea behind the Vector Space Model (VSM) is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query.

Source: Apache Lucene - Scoring





相关问题
Spring Properties File

Hi have this j2ee web application developed using spring framework. I have a problem with rendering mnessages in nihongo characters from the properties file. I tried converting the file to ascii using ...

Logging a global ID in multiple components

I have a system which contains multiple applications connected together using JMS and Spring Integration. Messages get sent along a chain of applications. [App A] -> [App B] -> [App C] We set a ...

Java Library Size

If I m given two Java Libraries in Jar format, 1 having no bells and whistles, and the other having lots of them that will mostly go unused.... my question is: How will the larger, mostly unused ...

How to get the Array Class for a given Class in Java?

I have a Class variable that holds a certain type and I need to get a variable that holds the corresponding array class. The best I could come up with is this: Class arrayOfFooClass = java.lang....

SQLite , Derby vs file system

I m working on a Java desktop application that reads and writes from/to different files. I think a better solution would be to replace the file system by a SQLite database. How hard is it to migrate ...

热门标签