Solr - character substitution
  • Time: 2009-11-19 08:07:06
  • Tags: solr, synonym

I have Solr running over an indexed database in which all the data is in Latvian. The problem is that I need a search for the word Riga to match the word Rīga. Of course, I can define a synonym (Rīga = Riga), but can I simply declare that the letter ī is equivalent to the letter i? I read something about solr.ISOLatin1AccentFilterFactory, but as far as I understood, it is not meant for UTF-8 encoded text, right? Any advice?

Best answer

I used PatternReplaceFilterFactory in both the index and the query analyzers. It seems to be working correctly.
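
The idea behind this answer can be illustrated outside Solr. The following is a minimal Python sketch, not Solr configuration: it applies one regular-expression replacement to both the indexed text and the query text so that they produce the same token. The mapping table, the `normalize` function, and the lowercasing step are assumptions made for illustration and are not taken from the answer.

```python
import re

# Hypothetical mapping for illustration: each accented Latvian letter is
# replaced by its unaccented base letter.
LATVIAN_FOLDS = {
    "ā": "a", "č": "c", "ē": "e", "ģ": "g", "ī": "i", "ķ": "k",
    "ļ": "l", "ņ": "n", "š": "s", "ū": "u", "ž": "z",
}

# One character class that matches any letter we want to replace.
_PATTERN = re.compile("[" + "".join(LATVIAN_FOLDS) + "]")


def normalize(token: str) -> str:
    """Lowercase a token (an assumption), then replace accented letters."""
    return _PATTERN.sub(lambda m: LATVIAN_FOLDS[m.group(0)], token.lower())


# The same function runs on the indexed text and on the query text,
# so both sides end up with the same token.
indexed = normalize("Rīga")
query = normalize("Riga")
print(indexed, query, indexed == query)
```

Applying the replacement on only one side would not help: the index would hold one form and the query would look for the other, which is why the answer mentions both index and query.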

Answers

ISOLatin1AccentFilterFactory is exactly what you are looking for, as long as the accented character EXISTS in the Latin-1 character set (the first 256 Unicode code points are identical to Latin-1, so UTF-8 input is not a problem in itself). The ī that you mentioned (U+012B) does not exist in ISO-8859-1, so ISOLatin1AccentFilterFactory won't work in this SPECIFIC case. I would still recommend using ISOLatin1AccentFilterFactory in addition to any exceptions that you handle with PatternReplaceFilterFactory, as there are probably some Latvian characters that it will help with (this is an assumption; I don't have experience with Latvian).

FYI, I actually tried this against my Solr setup with ISOLatin1AccentFilterFactory, and it didn't help in this case.
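
Whether a given letter exists in ISO-8859-1 can be checked directly. The sketch below is plain Python and does not involve Solr: it tries to encode each character as Latin-1 and reports the result. The helper name `in_latin1` and the letter list are illustrative choices, not anything from the answer.

```python
def in_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in ISO-8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# The accented lowercase letters of the Latvian alphabet.
LATVIAN_LETTERS = "āčēģīķļņšūž"

# Two Western European letters are appended for comparison.
for ch in LATVIAN_LETTERS + "éü":
    print(f"{ch} U+{ord(ch):04X} in Latin-1: {in_latin1(ch)}")
```

All eleven Latvian letters listed here have code points above U+00FF, so none of them is in ISO-8859-1, while é and ü are. For Latvian text, the pattern-based replacement would therefore need to cover every one of these letters.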

Look at ICUTokenizerFactory and the ICU analysis components that come with it; the ICU integration provides Unicode character normalization. Extremely useful and very easy to use.

http://lucene.apache.org/solr/api/org/apache/solr/analysis/ICUTokenizerFactory.html

http://site.icu-project.org/
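
To show what normalization-based accent folding does in general, here is a minimal sketch using Python's standard `unicodedata` module. It is neither ICU nor Solr, and it is not claimed to match what the ICU components do internally; the function name `fold_accents` is hypothetical. The technique is canonical decomposition (NFD) followed by removal of combining marks.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for marks such as the macron.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Rīga"))
```

Unlike a hand-written replacement table, this approach does not need a list of letters: any character with a canonical decomposition into a base letter plus combining marks is folded, which includes ī (i followed by a combining macron).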




