Question

I'm looking for a search engine script, or a search engine, that can:

Search lots of large text files, specifically hundreds of full text novels.
Use regex to return words and possible variations.
Give the location in the file of all the matches, such as line number, or word count.
Ideally work with JavaScript or PHP, as they're the only languages I'm adept in, and I'll probably have to manipulate the results. But I'm sure I can bite the bullet and learn the syntax of whatever language is needed.
Filter a search result array of words against a dictionary to find proper nouns (this part may not involve the search engine).
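
To make the regex and location requirements concrete, here is a rough sketch in JavaScript of the kind of result I have in mind. The function name `findMatches` and the result fields are made up for illustration, and it only handles one text held in memory as a string, not hundreds of files.

```javascript
// Hypothetical helper: list every word that matches a regex, together with
// its line number and its running word count from the start of the text.
function findMatches(text, pattern) {
  // Drop the g/y flags so repeated exec() calls do not depend on lastIndex.
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
  const results = [];
  let wordsBefore = 0; // words seen on earlier lines

  text.split(/\r?\n/).forEach((line, i) => {
    const words = line.split(/\s+/).filter(Boolean);
    words.forEach((word, j) => {
      const m = re.exec(word);
      if (m) {
        results.push({
          match: m[0],                    // matched text, without trailing punctuation
          line: i + 1,                    // 1-based line number
          wordIndex: wordsBefore + j + 1, // 1-based word count in the whole text
        });
      }
    });
    wordsBefore += words.length;
  });
  return results;
}
```

Calling it with something like `findMatches(novelText, /\bmarr(?:y|ies|ied|ying|iages?)\b/i)` would give one entry per matching word. Words are split on whitespace only, so this is a starting point rather than a proper tokenizer.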

Background and specifics (long, and only of moderate importance):

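I have a friend who is writing a PhD thesis on a marriage-related theme in 19th-century literature. The legwork will always have to be done by hand; an algorithm won't do the job on its own, but it should narrow the field considerably. I'm searching for the word "marriage" and every variation of it, the word "spouse" and every variation of it, and checking their relative proximity. Of course, I'll be searching hundreds of full texts.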
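
As a rough illustration of the proximity check, here is a minimal JavaScript sketch. The name `findNearbyPairs`, the `maxGap` parameter, and the idea of measuring distance in words are my own assumptions, not features of any existing search engine.

```javascript
// Hypothetical helper: find pairs of words, one matching patternA and one
// matching patternB, that lie within maxGap words of each other.
// Both patterns are assumed to be passed WITHOUT the g flag, because
// RegExp.prototype.test() is stateful on global regexes.
function findNearbyPairs(text, patternA, patternB, maxGap) {
  const words = text.split(/\s+/).filter(Boolean);

  // 0-based indexes of the words that match a given pattern
  const positions = (re) =>
    words.map((w, i) => (re.test(w) ? i : -1)).filter((i) => i >= 0);

  const posA = positions(patternA);
  const posB = positions(patternB);
  const pairs = [];

  // Simple nested loop: fine for a sketch, slow if both lists are very long.
  for (const a of posA) {
    for (const b of posB) {
      const gap = Math.abs(a - b);
      if (gap <= maxGap) {
        pairs.push({ indexA: a, indexB: b, gap });
      }
    }
  }
  return pairs;
}
```

A call might look like `findNearbyPairs(novelText, /\bmarr(?:y|ied|iages?)\b/i, /\b(?:spouses?|husbands?|wi(?:fe|ves))\b/i, 50)`, where the word lists inside the patterns and the gap of 50 are just placeholders to tune.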

Finding their relative proximity is the feature I'm having a hard time finding. Beyond that, I may need to search for all names to ensure that a main character, if not the protagonist, is involved. That means I'm trying to determine:
A. Names in general.
B. The protagonist, who should be among the most frequently used names.

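As for names in general, I don't have a comprehensive database of 19th-century names, so I can't tag the proper nouns directly. Since then I have read about separating proper nouns from ordinary words after some cleanup. I think my best bet is to filter all of the words through a comprehensive dictionary, leaving the proper nouns. Names will probably be the most frequent, but I could then filter out any other proper nouns, such as places. Granted, it's far from perfect, but it cuts things down a lot.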

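So this means comparing two huge word lists. There are tons of ways to do that, but if it works easily in a language I know, that would be ideal. My best guess is to compare a large array of capitalized words against a list of dictionary words and find the differences. I don't know how the lists should best be stored. As for any other language, if it makes this simpler, I'm sure I can pick up the syntax.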
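
Here is roughly what I mean by the comparison, as a minimal JavaScript sketch. The function name `rankProperNounCandidates` is made up, the dictionary is assumed to be a plain array of words already loaded into memory, and the capitalization test only covers unaccented ASCII letters.

```javascript
// Hypothetical helper: collect capitalized words that are NOT in the
// dictionary and rank them by how often they occur, most frequent first.
function rankProperNounCandidates(text, dictionaryWords) {
  // A Set gives fast lookups even for a very large dictionary.
  const dictionary = new Set(dictionaryWords.map((w) => w.toLowerCase()));
  const counts = new Map();

  // Capitalized words: one uppercase letter followed by lowercase letters.
  const capitalized = text.match(/\b[A-Z][a-z]+\b/g) || [];

  for (const word of capitalized) {
    // Sentence-initial ordinary words ("The", "She") drop out here.
    if (!dictionary.has(word.toLowerCase())) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }

  // Array of [word, count] pairs, highest count first.
  return [...counts.entries()].sort((x, y) => y[1] - x[1]);
}
```

The top of the returned list would be my candidates for the protagonist. One known weakness: names that are also ordinary words (Grace, Rose, Frank) get filtered out along with everything else in the dictionary.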

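Perhaps this is too much background, but any advice on the overall algorithm and program is also appreciated.

Thank you very much for your time and help! You'll be saving countless hours and contributing to a massive PhD thesis, so my friend will be very grateful as well.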

Cheers!

Answer 1

Sphider is an open-source search engine that you can download. It meets most of your requirements: http://www.sphider.eu/demo.php
