Lucene standard analyzer split on period

How do I make Lucene's Standard Analyzer tokenize on the . character? For example, when querying for "B", I need it to return the B in "A.B.C" as a result. I need numbers to be treated the way the standard analyzer treats them, so the Simple analyzer is not sufficient. It would be perfect if I could just tell the standard analyzer to tokenize on the . character too. If I had to write my own tokenizer with just this small extension, how would I go about it? Thanks, Nacha

Best answer

I believe the easiest approach is to create your own Analyzer, which takes the tokens from StandardAnalyzer as input and further splits them on dots, keeping dotless tokens intact.

The package summary gives some advice on how to do this. This blog post seems very relevant, but uses an old version of Lucene, so you will probably need to tweak it. Also, see the Lucene FAQ.
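To make the splitting step concrete, here is a minimal sketch in plain Java. It does not use any Lucene types: it only shows the post-processing that a custom filter wrapped around the standard tokenizer would apply to each incoming token. The class name `DotSplitter` and the method `splitOnDots` are hypothetical, and leaving purely numeric tokens such as "3.14" untouched is my assumption, based on the requirement that numbers keep their standard treatment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates the
// "split tokens on dots, keep dotless tokens intact" step only.
final class DotSplitter {

    // Takes tokens as already produced by some upstream tokenizer
    // and returns a new list with dotted tokens broken apart.
    static List<String> splitOnDots(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            // Dotless tokens pass through unchanged.
            if (token.indexOf('.') < 0) {
                out.add(token);
                continue;
            }
            // Assumption: dotted numbers (e.g. "3.14") stay as one token.
            if (token.matches("[0-9]+(\\.[0-9]+)*")) {
                out.add(token);
                continue;
            }
            // Split on literal dots and drop empty pieces.
            for (String part : token.split("\\.")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [A, B, C, hello, 3.14]
        System.out.println(splitOnDots(List.of("A.B.C", "hello", "3.14")));
    }
}
```

In a real Lucene analyzer the same logic would have to be expressed as a token filter that emits the pieces one at a time; the exact API depends on the Lucene version, so check the package summary for the release you use.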
