List of proper names?

I'm trying to filter names out of text blobs. Currently I'm just generating a word list and filtering it by hand, but I've got ~8k words to go, so I'm looking for a better way. I could grab a dictionary and filter the dictionary words out, but that would cull names like smith and cliff.

What I need is either of the following:

  • a list of common names (I'd need the >5k most common names)
  • a list of names that also happen to be words

I figure between them, I can do a combined blacklist/whitelist to get what I need.

Best answer

US Census name list: http://www.census.gov/genealogy/www/

That should get you one angle on the problem, anyway.
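One way to combine a name list with a dictionary, as the question proposes, is sketched below. It is only an illustration under assumptions: it assumes the name file has one whitespace-separated record per line with the name in the first column, the frequencies in the sample are made up, and `load_names`, `tokenize`, and `classify` are hypothetical helper names.

```python
import re

def load_names(lines, limit=5000):
    """Collect up to `limit` lowercase names from lines whose first
    whitespace-separated field is the name (assumed file layout)."""
    names = []
    for line in lines:
        parts = line.split()
        if parts:
            names.append(parts[0].lower())
        if len(names) >= limit:
            break
    return set(names)

def tokenize(text):
    """Split a text blob into alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[A-Za-z']+", text)

def classify(tokens, names, dictionary):
    """Sort tokens into three buckets: names only, names that are also
    dictionary words (need manual review), and everything else."""
    buckets = {"name": set(), "ambiguous": set(), "other": set()}
    for token in tokens:
        word = token.lower()
        if word in names and word in dictionary:
            buckets["ambiguous"].add(word)
        elif word in names:
            buckets["name"].add(word)
        else:
            buckets["other"].add(word)
    return buckets

# Made-up sample data standing in for the real name list and dictionary.
census_sample = [
    "SMITH    1.006  1.006  1",
    "JOHNSON  0.810  1.816  2",
    "CLIFF    0.002  50.00  9999",
]
dictionary_sample = {"smith", "cliff", "and", "walked", "off", "a"}

names = load_names(census_sample)
buckets = classify(
    tokenize("Johnson and Smith walked off a cliff"), names, dictionary_sample
)
for label in ("name", "ambiguous", "other"):
    print(label, sorted(buckets[label]))
```

The "ambiguous" bucket is the list of names that also happen to be words, so only that bucket still needs filtering by hand.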

Edit: changed the URL, per the comment below about the page moving. Nobody believes in HTTP 302 anymore?

Other answers

From a post I found at Quora:

CMU's NELL project has collected a huge list of proper nouns from the web and categorized them by type. You can browse online at: NELL KnowledgeBase Browser and download the data at: Resources & Data.

Web scraping the results for, say, personUS seems more efficient than what I did, which was extracting a list of names from phrases tagged as "person" in their big tab-delimited file. Either way you'll be using regex.
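A rough sketch of the tab-delimited approach follows. The column layout is an assumption made for illustration (phrase in one column, category tag in another) and is not the actual layout of the NELL download; `person_names`, `name_col`, and `tag_col` are hypothetical names, and the sample rows are made up.

```python
import csv
import io
import re

# Matches category tags that start with "person", e.g. "person" or "personUS".
PERSON_TAG = re.compile(r"^person", re.IGNORECASE)

def person_names(tsv_file, name_col=0, tag_col=1):
    """Return the lowercase words of every phrase whose tag column
    starts with 'person'. Column positions are assumptions."""
    names = set()
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) <= max(name_col, tag_col):
            continue  # skip short or malformed rows
        if PERSON_TAG.match(row[tag_col]):
            for part in re.findall(r"[A-Za-z]+", row[name_col]):
                names.add(part.lower())
    return names

# Made-up sample standing in for the real file.
sample = io.StringIO(
    "John Smith\tperson\n"
    "Pittsburgh\tcity\n"
    "Mary Cliff\tpersonUS\n"
)
extracted = person_names(sample)
print(sorted(extracted))
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and set `name_col` and `tag_col` after inspecting the header row.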




