Is there a library for Ruby or PHP that can parse HTML pages and extract the unique data on each page by comparing it with other, similar pages? It should use some sort of text mining to identify which text is likely noise that repeats across pages, and which text is unique and useful.
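The sketch below is not an existing library. It is a minimal illustration of the comparison idea in the question, written in Java, and the same counting logic carries over directly to Ruby or PHP. It assumes each page has already been reduced to a list of text blocks (for example, the text of each paragraph or `div`, extracted with whatever HTML parser you use). It counts how many pages each normalized block appears on and treats blocks that appear on more than a chosen fraction of the pages as repeated template noise. The class `TemplateFilter`, the method `uniqueBlocks` and the `threshold` parameter are hypothetical names chosen for this example.

```java
import java.util.*;

/**
 * Hypothetical sketch: separate repeated template text from page-specific
 * text by counting how many pages each text block appears on.
 * Assumes pages are already split into text blocks; HTML parsing is not shown.
 */
public class TemplateFilter {

    /**
     * For each page, keeps only the blocks that appear on at most
     * threshold * pages.size() pages. Blocks that are more widespread
     * are treated as noise (navigation, footers, and similar).
     */
    public static List<List<String>> uniqueBlocks(List<List<String>> pages, double threshold) {
        // Document frequency: number of pages containing each normalized block.
        Map<String, Integer> pageCount = new HashMap<>();
        for (List<String> page : pages) {
            Set<String> seen = new HashSet<>();
            for (String block : page) {
                seen.add(normalize(block));
            }
            for (String key : seen) {
                pageCount.merge(key, 1, Integer::sum);
            }
        }

        double cutoff = threshold * pages.size();
        List<List<String>> result = new ArrayList<>();
        for (List<String> page : pages) {
            List<String> kept = new ArrayList<>();
            for (String block : page) {
                String key = normalize(block);
                if (key.isEmpty()) {
                    continue; // skip whitespace-only blocks
                }
                if (pageCount.get(key) <= cutoff) {
                    kept.add(block); // rare enough to count as page-specific
                }
            }
            result.add(kept);
        }
        return result;
    }

    // Trim, collapse whitespace and lowercase so trivial formatting differences don't matter.
    private static String normalize(String block) {
        return block.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Made-up sample pages for illustration only.
        List<List<String>> pages = List.of(
            List.of("Home | About | Contact", "Article about Ruby parsers", "Copyright Example Site"),
            List.of("Home | About | Contact", "Article about PHP scraping", "Copyright Example Site"),
            List.of("Home | About | Contact", "Article about text mining", "Copyright Example Site"));
        // Blocks present on more than half of the pages are treated as template noise.
        System.out.println(uniqueBlocks(pages, 0.5));
    }
}
```

Running `main` prints `[[Article about Ruby parsers], [Article about PHP scraping], [Article about text mining]]`: the navigation and copyright lines occur on all three pages and are dropped. Exact matching on whole blocks is deliberately crude; template text that varies slightly from page to page (a date in a footer, say) would need fuzzier comparison, such as word shingles with a similarity threshold.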
The project I'm working on is written in Java and parses source code files (so far only Java source). Now I'd like to support parsing Ruby code as well, so I am looking for a parser written in Java that parses ...