term clustering library?

Does anybody know an open-source/free library that does term clustering?

Thanks, yaniv

Answers

Apache Mahout provides algorithms for clustering.

Check out NLTK. There are a number of clustering modules that might work for you.
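
A minimal sketch of what term clustering can look like with NLTK's `KMeansClusterer`, assuming each term is represented by its counts across a small, made-up corpus. The documents, the tiny stop-word set and the choice of two clusters are illustrative assumptions, not part of the original answer. Terms that occur in the same documents should tend to land in the same cluster.

```python
import random
from collections import defaultdict

import numpy as np
from nltk.cluster import KMeansClusterer, cosine_distance

# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat chased the mouse",
    "a dog and a cat played",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
    "bond markets rallied",
]
stop_words = {"the", "a", "and", "as"}  # tiny hand-picked list (assumption)

# Build a term-by-document count matrix: row i describes term vocab[i]
# by how often it occurs in each document.
tokenized = [[w for w in d.split() if w not in stop_words] for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
row = {w: i for i, w in enumerate(vocab)}
term_vectors = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(tokenized):
    for w in doc:
        term_vectors[row[w], j] += 1

# k-means over the term vectors with cosine distance; the RNG is seeded
# so the random initial means are repeatable.
clusterer = KMeansClusterer(
    2,
    distance=cosine_distance,
    repeats=10,
    rng=random.Random(0),
    avoid_empty_clusters=True,
)
labels = clusterer.cluster(list(term_vectors), assign_clusters=True)

# Group the terms by their assigned cluster id and print them.
clusters = defaultdict(list)
for term, label in zip(vocab, labels):
    clusters[label].append(term)
for label in sorted(clusters):
    print(label, clusters[label])
```

On real data you would normally use a much larger corpus, a proper tokenizer and a fuller stop-word list; NLTK also ships other clusterers (e.g. `GAAClusterer`) that take the same kind of vectors.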

WEKA has a whole suite of tools for text processing along with clustering.

If you're into Python, there is NLTK, as already mentioned by its author, but there is also scikit-learn (sklearn), which provides much more than just clustering. (The link takes you to examples that apply to text.)

scikit-learn for Python has dedicated modules for text analysis. It also has a complete suite of clustering algorithms, including k-means, affinity propagation (AP), mean shift, spectral clustering, hierarchical clustering and DBSCAN (with appropriate evaluation metrics). This may be helpful for your term clustering task.

Link to the latest scikit-learn video tutorial

Link to the scikit-learn book
