I'm trying to filter names out of text blobs. Currently I'm just generating a word list and filtering it by hand, but I've got ~8k words to go, so I'm looking for a better way. I could grab a dictionary and filter out every word that appears in it, but that would also cull names like Smith and Cliff.
What I need is either of the following:
- a list of common names (I'd need at least the 5k most common)
- a list of names that also happen to be words
I figure between them, I can do a combined blacklist/whitelist to get what I need.
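To make the combined approach concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the function name `split_names`, the three input sets (`dictionary_words`, `common_names`, `name_words`), and the simple regex tokenizer are all hypothetical, and the real lists would be loaded from whatever files I end up finding.

```python
import re


def split_names(text, dictionary_words, common_names, name_words):
    """Sort the tokens of `text` into likely names and leftovers.

    All three sets are assumed to hold lowercase strings:
      dictionary_words - ordinary words (the blacklist)
      common_names     - most common names (whitelist)
      name_words       - names that are also words, e.g. "smith" (whitelist)
    """
    # Crude tokenizer: runs of letters and apostrophes, lowercased, deduplicated.
    tokens = {t.lower() for t in re.findall(r"[A-Za-z']+", text)}

    names, review = set(), set()
    for tok in tokens:
        if tok in common_names or tok in name_words:
            names.add(tok)      # whitelisted, keep even if it is a dictionary word
        elif tok not in dictionary_words:
            review.add(tok)     # unknown token, still needs a manual look
        # anything else is a plain dictionary word and is dropped

    return sorted(names), sorted(review)


# Toy data standing in for the real lists.
dictionary_words = {"cliff", "smith", "walked", "to", "the", "store", "with"}
common_names = {"smith"}
name_words = {"cliff", "smith"}

names, review = split_names(
    "Cliff Smith walked to the store with Zorblat",
    dictionary_words, common_names, name_words,
)
print(names)   # likely names
print(review)  # leftovers to check by hand
```

The whitelist check runs before the dictionary check, so a name that is also a word is kept rather than culled. Whatever lands in `review` would still need filtering by hand, but that should be far fewer than 8k words.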