How to detect a date in a Lucene free text search query?
  • Time: 2011-06-09 14:14:51
  • Tags: lucene

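We're using Lucene to develop a free text search box for data delivered to a user, as in an email inbox. We'd like the box to handle dates, e.g. 5/1/2011. To make things easier, we're limiting the feature in the current version to just two date formats: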

mm/dd/yy
mm/dd/yyyy

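For our prototype, we hacked the query parsing process to pre-process the query and look for these two date patterns. That was about two years ago, when we were on Lucene 2.4. I'm curious whether Lucene has any tools that take a "date" and return a TokenStream with any identified dates. Looking through the javadocs for Lucene 2.9, I found: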

org.apache.lucene.analysis.sinks.DateRecognizerSinkFilter

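It seems to do what I need, but it implements a SinkFilter, a concept I could not find clearly documented in the Lucene wiki. Has anyone used this filter before, and if so, what is the most effective way to use it?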

Best answer

There is a bit of sample code (admittedly over-complicated) in the documentation for TeeSinkTokenFilter. Note that, as designed, DateRecognizerSinkFilter does not store the actual date; it only detects that a token is a date conforming to the specified format. What I would try is to re-implement the DateRecognizerSinkFilter class to take an array of DateFormat instances, create a new Attribute class called DateAttribute (or something similar), and have the date recognizer subclass set the parsed date into the DateAttribute when one of its formats matches. That way, you can always test whether you have a valid date by interrogating the DateAttribute, and the date formats stay localized in one class. Another advantage is that you won't have to handle multiple sinks, which simplifies the code from the linked example.
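To make the "array of DateFormat instances" idea concrete, here is a minimal plain-Java sketch of just the matching step. It is deliberately not wired into Lucene: `MultiFormatDateRecognizer` and `recognize` are hypothetical names, and in a real filter the returned `Date` would be stored on the custom attribute described above rather than returned directly.

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Hypothetical helper (not part of Lucene): tries several DateFormats
 * against a single token and returns the parsed Date, or null.
 */
final class MultiFormatDateRecognizer {
    private final DateFormat[] formats;

    MultiFormatDateRecognizer(DateFormat... formats) {
        this.formats = formats;
        for (DateFormat format : formats) {
            // Strict parsing: reject out-of-range values such as month 13.
            // Note that this mutates the formats passed in by the caller.
            format.setLenient(false);
        }
    }

    /** Returns the date if any format consumes the whole token, else null. */
    Date recognize(String token) {
        for (DateFormat format : formats) {
            ParsePosition pos = new ParsePosition(0);
            Date parsed = format.parse(token, pos); // null on failure, no exception
            if (parsed != null && pos.getIndex() == token.length()) {
                return parsed;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "MM/dd/yy" is listed first: SimpleDateFormat maps a two-digit year
        // into the current century window and reads longer years literally,
        // whereas "MM/dd/yyyy" would read "11" as the year 11.
        MultiFormatDateRecognizer recognizer = new MultiFormatDateRecognizer(
                new SimpleDateFormat("MM/dd/yy"),
                new SimpleDateFormat("MM/dd/yyyy"));
        for (String token : new String[] {"5/1/2011", "5/1/11", "lucene"}) {
            System.out.println(token + " is a date: " + (recognizer.recognize(token) != null));
        }
    }
}
```

The order of the formats matters in this sketch, and `SimpleDateFormat` is not thread-safe, so an instance of this helper should not be shared across threads without synchronization.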
