Frequent Updates to Solr Documents - Efficiency/Scalability concerns

I have a Solr index with document fields something like:

id, body_text, date, num_upvotes, num_downvotes

In my application, a document is created with some integer id and some body_text (500 chars max). The date is set to the time of input, and num_upvotes and num_downvotes begin at 0.
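
For illustration, this is roughly what indexing such a document looks like over Solr's JSON update handler (a sketch only; the core name, URL and example values below are made up):

    import requests

    SOLR_UPDATE_URL = "http://localhost:8983/solr/posts/update"  # hypothetical core name

    doc = {
        "id": "12345",
        "body_text": "Some user-written content, 500 characters at most...",
        "date": "2012-05-01T12:00:00Z",   # Solr date fields expect ISO-8601
        "num_upvotes": 0,
        "num_downvotes": 0,
    }

    # Post a list of documents to the JSON update handler; commit=true makes the
    # change visible immediately (fine here, too expensive for bulk loads).
    requests.post(SOLR_UPDATE_URL, json=[doc], params={"commit": "true"}).raise_for_status()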

My application gives users the ability to upvote and downvote this content, and the reason I want to track the votes in Solr, not just in the database, is that I want to be able to factor the upvotes and downvotes into my search score.
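
For example, one way to fold the votes into the score (a sketch only; the core name and weighting are hypothetical and would need tuning) is an additive boost function with the edismax parser:

    import requests

    SOLR_SELECT_URL = "http://localhost:8983/solr/posts/select"  # hypothetical core name

    # Search body_text and add (num_upvotes - num_downvotes) to each relevance score.
    params = {
        "q": "some search terms",
        "defType": "edismax",
        "qf": "body_text",
        "bf": "sub(num_upvotes,num_downvotes)",  # additive boost function over the vote fields
        "wt": "json",
    }
    docs = requests.get(SOLR_SELECT_URL, params=params).json()["response"]["docs"]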

This is a problem because you cannot simply update a document in Solr (e.g. increment num_upvotes); you have to replace the entire document, which would likely be inefficient considering it would need to hit my database again to gather all the relevant data.

I realize the solution may require a different layout of data, or possibly multiple indexes (although I don't know if you can query/score across Solr cores).

Is anyone able to offer any recommendations on how to tackle this?

Best Answer

A solution I have used for a similar problem is to update the index from the database every ten minutes, taking only the documents modified since the last update and doing Solr updates/inserts for them.

Also, every night when I don't have much traffic I do an index optimize. After each import I have some warm-up queries set up in the Solr config.

In my Solr index I have around 1.5 million documents; each document has 24 fields and around 2000 characters in total. I update the index every 10 minutes with around 500 documents (without optimizing the index), and I run around 50 warm-up queries comprised of the most common facets, the most used filter queries, and free-text searches.

I don't see a negative impact on performance (at least it is not visible); my queries run in 0.1 seconds on average. (Before doing the update every 10 minutes, average queries took 0.09 seconds.)
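
As a rough sketch of that kind of incremental update loop (the table and column names, and the use of sqlite, are only stand-ins for whatever database actually holds the records):

    import sqlite3   # stand-in for the real source database
    import time
    import requests

    SOLR_UPDATE_URL = "http://localhost:8983/solr/posts/update"  # hypothetical core name

    def push_changes_since(conn, last_run):
        # Pull only the rows modified since the previous run (column names are assumed).
        rows = conn.execute(
            "SELECT id, body_text, date, num_upvotes, num_downvotes "
            "FROM posts WHERE modified_at > ?", (last_run,)
        ).fetchall()
        docs = [
            {"id": r[0], "body_text": r[1], "date": r[2],
             "num_upvotes": r[3], "num_downvotes": r[4]}
            for r in rows
        ]
        if docs:
            # Re-adding a document whose id already exists replaces it (Solr's "update").
            requests.post(SOLR_UPDATE_URL, json=docs,
                          params={"commit": "true"}).raise_for_status()
        return len(docs)

    conn = sqlite3.connect("app.db")
    last_run = "1970-01-01T00:00:00Z"
    while True:
        cycle_start = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        print("pushed", push_changes_since(conn, last_run), "documents")
        last_run = cycle_start
        time.sleep(600)  # wait ten minutes before the next cycle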

LATER EDIT:

I have not had any problems during these updates. I take the documents from the database and insert them into Solr keyed by their id. If a document with that id already exists in Solr, it is replaced (that is what I mean by "update").

The Solr update itself never takes more than about 3 minutes. In practice I leave a 10-minute pause after each update: I start the index update, wait for it to finish, and then wait another 10 minutes before starting the next one.

I have not looked at the performance during the nightly optimize, but for me it does not matter, because what I want is fresh data during peak user traffic.

Other Answers

The Join feature would help you here. You could then store the up and down votes in a separate document.

The bad news is that you will need to wait until Solr 4, unless you are comfortable running a trunk build.
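
If you do go that route, a query using the join parser might look roughly like the sketch below (assuming the vote documents carry a post_id field pointing back at the post's id, plus their own vote-count fields; whether join results can contribute to scoring depends on the Solr version):

    import requests

    SOLR_SELECT_URL = "http://localhost:8983/solr/posts/select"  # hypothetical core name

    # Posts whose separate vote document shows at least 10 upvotes.
    # {!join from=post_id to=id} maps matches on vote documents back onto posts.
    params = {
        "q": "{!join from=post_id to=id}num_upvotes:[10 TO *]",
        "wt": "json",
    }
    found = requests.get(SOLR_SELECT_URL, params=params).json()["response"]["numFound"]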

If you are only going to be updating the up/down votes, then instead of going back to the database, just use the appropriate Solr client for your application to pull the document from the index, set the up/down values as needed, and then reinsert the document into the index.
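
A rough sketch of that pull-modify-reinsert cycle over plain HTTP rather than a specific client library (this only works if all fields are stored; the core name is hypothetical):

    import requests

    SOLR_BASE = "http://localhost:8983/solr/posts"  # hypothetical core name

    def add_upvote(doc_id):
        # Fetch the current document from the index by id.
        resp = requests.get(f"{SOLR_BASE}/select",
                            params={"q": f"id:{doc_id}", "wt": "json"}).json()
        docs = resp["response"]["docs"]
        if not docs:
            return
        doc = docs[0]
        doc.pop("_version_", None)           # drop Solr-internal fields before re-adding
        doc["num_upvotes"] = doc.get("num_upvotes", 0) + 1
        # Re-adding the document under the same id replaces the stored copy.
        requests.post(f"{SOLR_BASE}/update", json=[doc],
                      params={"commit": "true"}).raise_for_status()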

There is no solution to your problem within Solr. You have a database problem and you are trying to solve it with a search engine.

The best way to handle this is to keep a redis database alongside Solr that records the document id and the up/down vote counts. You can then merge the data from the two sources before displaying it.
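
A minimal sketch of that split, using redis hashes keyed by the Solr document id (the key names and core name are made up for illustration):

    import redis
    import requests

    r = redis.Redis()  # assumes a local redis instance
    SOLR_SELECT_URL = "http://localhost:8983/solr/posts/select"  # hypothetical core name

    def upvote(doc_id):
        # Vote counts live only in redis, in a hash keyed by the Solr document id.
        r.hincrby(f"votes:{doc_id}", "up", 1)

    def search_with_votes(query):
        docs = requests.get(SOLR_SELECT_URL,
                            params={"q": query, "wt": "json"}).json()["response"]["docs"]
        # Merge the two sources before display: text and relevance from Solr,
        # vote counts from redis.
        for doc in docs:
            counts = r.hgetall(f"votes:{doc['id']}")
            doc["num_upvotes"] = int(counts.get(b"up", 0))
            doc["num_downvotes"] = int(counts.get(b"down", 0))
        return docs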




