Search engine script - regex, multiple files, line numbers

I'm looking for a search engine script, or a search engine, that can:

  1. Search lots of large text files, specifically hundreds of full text novels.
  2. Use regex to return words and possible variations.
  3. Give the location in the file of all the matches, such as line number, or word count.
  4. Ideally work with JavaScript or PHP, as they're the only languages I'm adept in, and I'll probably have to manipulate the results. But I'm sure I can bite the bullet and learn the syntax of whatever language is needed.
  5. Filter a search result array of words against a dictionary to find proper nouns. (This may not involve the search engine.)
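Requirements 1 to 3 can be sketched in a few lines. The example below is in Python rather than JavaScript or PHP, and it is only a minimal sketch under stated assumptions: the novels are plain `.txt` files in one folder, the regex for the variations of "marry" is a hypothetical starting point, and the function names `search_text` and `search_folder` are made up for illustration.

```python
import re
from pathlib import Path

# Hypothetical pattern: marry, marries, married, marrying, marriage(s).
PATTERN = re.compile(r"\bmarr(?:y|ies|ied|ying|iages?)\b", re.IGNORECASE)

def search_text(text, pattern=PATTERN):
    """Return (line_number, word_index, matched_word) for every match in text."""
    hits = []
    words_before = 0  # whitespace-delimited words seen on earlier lines
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in pattern.finditer(line):
            prefix = line[:m.start()]
            n = len(prefix.split())
            # If the match starts inside a token (e.g. after an opening
            # quote), that token is the matched word itself, not an earlier one.
            if prefix and not prefix[-1].isspace():
                n -= 1
            hits.append((line_no, words_before + n + 1, m.group(0)))
        words_before += len(line.split())
    return hits

def search_folder(folder, pattern=PATTERN):
    """Map each .txt file name in folder to its list of hits."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = search_text(text, pattern)
    return results
```

Both the line number and the word index are 1-based. Reading each file whole is fine for novel-sized texts; a dedicated index would only matter if the same corpus is searched many times.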

Background and specifics (long, and only somewhat important):

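I have a friend who is writing her PhD thesis on the theme of marriage in 19th-century novels. Going through them by hand would take forever, and although an algorithm won't be perfect, it should narrow things down considerably. I'm searching for the word "marriage" and every variation, the word "spouse" and every variation, and checking their relative proximity. Of course, I'll be searching through hundreds of full-text novels.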
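The proximity check between "marriage" and "spouse" variations could look roughly like the following Python sketch. It assumes proximity is measured in words, that 20 words is a reasonable default window, and that the two patterns are hypothetical starting points to be extended; `find_close_pairs` is a made-up name.

```python
import re

WORD = re.compile(r"[A-Za-z]+")
# Hypothetical patterns; add alternatives to cover the variations needed.
MARRIAGE = re.compile(r"marr(?:y|ies|ied|ying|iages?)", re.IGNORECASE)
SPOUSE = re.compile(r"spouses?", re.IGNORECASE)

def find_close_pairs(text, first=MARRIAGE, second=SPOUSE, max_distance=20):
    """Return (first_word, second_word, distance_in_words) for nearby pairs."""
    words = WORD.findall(text)
    # Positions (word offsets) of every word that fully matches each pattern.
    first_pos = [i for i, w in enumerate(words) if first.fullmatch(w)]
    second_pos = [i for i, w in enumerate(words) if second.fullmatch(w)]
    pairs = []
    for i in first_pos:
        for j in second_pos:
            if abs(i - j) <= max_distance:
                pairs.append((words[i], words[j], abs(i - j)))
    return pairs
```

The nested loop compares every pair of positions, which is acceptable because only matching words are compared, not every word in the novel.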

Finding their relative proximity is the feature I'm having a hard time finding. Beyond that, I may need to search for all names to ensure that a main character, if not the protagonist, is involved. That means I'm trying to determine:
A. Names in general.
B. The protagonist, who should be among the most frequently used names.

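As for names in general, I don't have a comprehensive database of 19th-century names, so I can't simply match against a list of proper nouns. I then thought about capitalized words in general and the proper nouns among them. I think my best bet is to filter all of those words through a comprehensive dictionary, leaving the proper nouns. Names will probably be the most frequent, but I may be able to filter out any other proper nouns, such as places. Granted, it's far from perfect, but it cuts things down a lot.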

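So this means comparing two huge sets of words. There are tons of ways to do this, but if it's easy to do in a language I know, that would be ideal. My best guess is to compare a large array of capitalized words against an array of dictionary words and find the differences. As for any other language, if it makes this simpler, I'm sure I can pick up the syntax.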
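Comparing capitalized words against a dictionary, and then ranking what is left by frequency, might be sketched in Python as follows. The assumptions here are that the dictionary is available as a list of ordinary words and that a word counts as "capitalized" if it is one uppercase letter followed by lowercase letters; `proper_noun_candidates` is a hypothetical name.

```python
import re
from collections import Counter

def proper_noun_candidates(text, dictionary_words):
    """Count capitalized words whose lowercase form is not in the dictionary."""
    known = {w.lower() for w in dictionary_words}  # set gives fast lookups
    counts = Counter()
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        # Sentence-initial words such as "The" are dropped here because
        # their lowercase form is an ordinary dictionary word.
        if word.lower() not in known:
            counts[word] += 1
    return counts
```

Calling `proper_noun_candidates(text, words).most_common(10)` would then list the ten most frequent candidates, which is where the protagonist's name is likely to appear. Names that are also ordinary words (Rose, Grace) get filtered out by this approach, so the result is a shortlist rather than a complete list.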

Perhaps this is too much background, but any suggestions on the overall algorithm and program are appreciated as well.

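Thank you very much for your time and help! You'll be saving countless hours and contributing to an enormous PhD thesis, so my friend will be very grateful too.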

Cheers!

Answers

Sphider is an open-source search engine that you can download; it meets most of your requirements: http://www.sphider.eu/demo.php




