Question

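I want to use Lucene.net to index data from a variety of sources (for example, the local file system and databases). However, I want to link the data from two sources (based on a common field, such as an ID) and present the consolidated information to the user. As far as I can see, I have three options. After indexing each source: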

Use Lucene.net to combine the indexes in a search query into a single result set
Create some custom code to correlate results retrospectively; or
Store separate result sets in a database (in my case, it won't be the same database as the source), then create a new index based on a query that joins the data

Option 1 is what I'd like to do, but I'm not sure how feasible it is with Lucene, for a couple of reasons:

Lucene isn't a relational database; is this attempting something that Lucene is not really designed to do?
Can combining indexes result in a noticeable performance hit?

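The only reason to choose option 2 would be if I believed I could create a more efficient algorithm than option 1. By that logic, I have to ask whether I should be using Lucene to correlate the data at all.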

That leads me to option 3. I'm happy that it will work, but it seems like a compromise:

Data will be stored in a database as well as Lucene (as well as the original source)
By introducing an extra step, it'll take longer to complete the process. I'm not sure how this will affect the user experience

Any suggestions?

Answer 1

Yes, you can, but you need to stop thinking relationally and start thinking in terms of documents rather than rows. Or, option 3 is the right approach. What you want to do is to create a single document holding:

a) whatever you want to search on -- analyzed fields in Lucene terms
b) pointers to the full, extant records -- basically the ID number or file location
c) if possible, enough data to show search results without having to reach out to the file system or the database -- stored fields in Lucene parlance.
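
To make the "one document per ID" idea concrete, here is a minimal, language-neutral sketch in Python rather than Lucene.net code. It merges records from two sources on a shared key before anything is indexed. Everything in it is an assumption for illustration: the function name build_documents, the record keys (id, text, location, display) and the output keys (searchable, pointers, stored) are hypothetical and are not part of any Lucene API. In a real index, "searchable" would map to analyzed fields and "stored" to stored fields.

```python
from collections import defaultdict


def build_documents(file_records, db_records, key="id"):
    """Merge records from two sources into one document per shared key.

    Hypothetical record shape (an assumption, not a Lucene structure):
    {"id": ..., "text": ..., "location": ..., "display": {...}}
    """
    merged = defaultdict(lambda: {"searchable": [], "pointers": {}, "stored": {}})
    for source_name, records in (("file", file_records), ("db", db_records)):
        for record in records:
            doc = merged[record[key]]
            # a) text to search on (would become analyzed fields)
            doc["searchable"].append(record["text"])
            # b) pointer back to the full record in its original source
            doc["pointers"][source_name] = record["location"]
            # c) small values needed to render a result (would become stored fields)
            doc["stored"].update(record.get("display", {}))
    return {
        doc_id: {
            "id": doc_id,
            "searchable": " ".join(parts["searchable"]),
            "pointers": parts["pointers"],
            "stored": parts["stored"],
        }
        for doc_id, parts in merged.items()
    }


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    files = [{"id": 1, "text": "annual report", "location": "/files/1.pdf",
              "display": {"title": "Report"}}]
    rows = [{"id": 1, "text": "acme corp", "location": "customers:1",
             "display": {"owner": "Acme"}}]
    for document in build_documents(files, rows).values():
        print(document)
```

Each merged dictionary would then be written to the index as a single document, so a search returns the consolidated view without a join at query time.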

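As for performance, there isn't much overhead. Adding items to the index isn't a massive performance hit, and Lucene itself is very fast. I would build it out in a sensible, focused way first, and then turn to performance if needed.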
