English 中文(简体)
How would I search for blank facets in a multi valued facet field and at the same time in Solr?
原标题:

I have an application where users can pick car parts. They pick their vehicle and then pick vehicle attributes as facets. After they select their vehicle, they can pick facets like engine size, for example, to narrow down the list of results. The problem was, not all documents have an engine size (it s an empty value in Solr), as it doesn t matter for all parts. For example, an engine size rarely matters for an air filter. So even if a user picked 3.5L for their engine size, I still wanted to show the air filters on the screen as a possible part the user could pick. I did some searching and the following facet query works perfectly:

enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *]) 

This query would match either 3.5 or would match records where there was no value for the engine size field (no value meant it didn t matter, and it fit the car). Perfect...

THE PROBLEM: I recently made the vehicle attribute fields multivalued fields, so I could store attributes for each part as a list. I then applied faceting to it, and it worked fine. However, the problem came up when I applied the query previously mentioned above. While selecting the enginesize facet narrowed down the number of documents displayed to only documents that have that engine size, records (I also use the word record to mean document) that had empty values (i.e. "") for enginesize were not appearing. The same query above does not work for multivalued facets the same way it did when enginesize was a single valued field.

Example:

 <doc> 
  <str name="part">engine mount</str>
  <arr name="enginesize">
   <str/>
   <str/>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
  </arr>
 <doc>

<doc> 
  <str name="part">engine bolt</str>
  <arr name="enginesize">
   <str>6</str>
   <str>6</str>
   <str>6</str>
   <str>6</str>
   <str>6</str>
  </arr>
 <doc>

 <doc> 
  <str name="part">air filter</str>
  <arr name="enginesize">
   <str/>
   <str/>
   <str></str>
   <str></str>
   <str></str>
   <str></str>
   <str></str>
  </arr>
 <doc>

What I am looking for is a query that will pull back documents 1 and 3 above when I do a facet search for the engine size for 3.5. The first document (the engine mount) matches, because it contains the value in one of the multivalued fields "enginesize" that I am looking for (contains 3.5 in one of the fields). However, the third document for the air filter doesn t get returned because of the empty <str> values. I do not want to return the second document at all because it doesn t match the facet value

I basically want a query that will match empty string values for a given facet and also match the actual value, so I get both documents returned.

Does someone have a query that would return document 1 and document 3 (the engine bracket and the air filter), but not the engine bolt document?

I tried the following without success (including the one at the very top of this question):

// returns everything
enginesize:"3.5"    OR  (enginesize:[* TO *] )
// only returns document 1
enginesize:"3.5"    OR  (enginesize:["" TO ""] AND -enginesize:"3.5")
// only returns document 1
enginesize:"3.5" OR (enginesize:"")

I imported the data above using a CSV file, I set the field keepEmpty=true. I tried instead manually inserting a space into the field when I generated the CSV file (which would give you <str> </str>, instead of the previous , and then retried the queries. Doing that, I got the following results:

// returns document 1
enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])
// returns all documents
enginesize:"3.5"    OR  (enginesize:["" TO ""] AND -enginesize:"3.5")
// returns all documents
enginesize:"3.5" OR (enginesize:"")

Does anyone have a query that would work for either situation, whether I have a space as the blank value or simply no value at all?

最佳回答

How about changing how you index, instead of how you query?

Instead of trying to index "engine size doesn t matter" as an empty record, index it as "ANY".

Then your query simply becomes enginesize:"3.5" OR (enginesize:ANY)

问题回答

i ve just been playing with this and found a hint that seems to do the trick for me. translated to your query it should be:

enginesize:"3.5" OR (-enginesize:["" TO *])

hth,

andi


update: after some more testing i don t think this works reliably — for some indexes it had to be the other way round and without the minus sign, i.e. enginesize:[* TO ""]. this might depend on the index type, if it s multi-valued or even on the actual values.

in any case it seems too much of a hack. i ll probably resolve to substituting the empty value with a special marker...

I had the same problem, but solved it in https://stackoverflow.com/a/35633038/13365:

enginesize:"3.5" OR (*:* NOT enginesize:["" TO *])

The -enginesize solution didn t work for me.





相关问题
solr problem to get the field names

Ive got a problem. In each document I ve got fields: threads.id and posts.id. I want to get the field name value for them so i can get data from the database. Between the lines beneath i have marked ...

Which is the better client for Solr + PHP?

I have two options http://www.php.net/manual/en/book.solr.php http://code.google.com/p/solr-php-client/ I read it somewhere that that 2) use JSON as output types whereas 1) use XML doc. Isn t ...

Geronimo vs Glassfish

For a production environment, is Apache Geronimo better for applications that uses ActiveMQ, Derby, Solr?

Sort by date in Solr/Lucene performance problems

We have set up an Solr index containing 36 million documents (~1K-2K each) and we try to query a maximum of 100 documents matching a single simple keyword. This works pretty fast as we had hoped for. ...

SOLR - delta import not with last_modified

I saw only ways using delta import with last_modified. Is there some other ways to do delta_imports withut using timestamps? For example, if i have unique key(integer), can i tell SOLR to index only ...

SOLR How to return only limited matched content

ok guys, say in my Schema I have 4 fields: <field name="SiteIdentifier" type="string" indexed="true" stored="true" required="true"/> <field name="Title" type="text" indexed="true" stored="...

Solr - character substitution

I have Solr with indexed database. In my database all data is in Latvian. The problem is, I need to be able to search word Riga as if it is word Rīga. Of course, i can define synonym - Rīga = Riga, ...

热门标签