How to identify ideas and concepts in a given text

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:

Maybe if you tell me a little more about who Mr Jones is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?

It'd be great to be able to detect that the person has asked for a photograph of Mr Jones. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like:

Please, never send me a photo of Mr Jones.

Does anyone know where to start with this? Is it even possible?

I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great.

Thanks!

Answers

The best thing out there that might be useful to you is automatic sentiment analysis. This is used, for example, to judge whether, say, a customer review is positive or negative. I cannot give you direct pointers to available tools, but this is what you are looking for.

I must say, though, that this is a current hot topic in natural language processing and I’ve seen a number of papers at conferences. It’s definitely quite a complex matter and if you’re starting from scratch, it might take quite some time before you get the results that you want.

NLTK is not a bad framework for parsing natural language but beware that this is not a simple matter. Doing stuff like this is really research level programming.

One thing that makes this much easier is a very limited domain. If, say, your application focuses on information about famous writers, you can avoid some of the complexities of natural language, such as certain types of ambiguity.

Where to start? Good question. I don't know of any tutorials on the topic (and I presume you tried the Google option), but I'd imagine that iTunes U would have a course on the topic. If not, I can post a link to a course I've done that mentions the subject and wasn't completely horrible: http://www.inf.ed.ac.uk/teaching/courses/inf2a/lecturematerials/index.html#lecture01

The problem you are tackling is very challenging.

I would start by identifying the entities in the text (a problem known as Named Entity Recognition; google it), and then I would try to identify concepts.
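To give a feel for the entity step, here is a minimal sketch using only a regular expression. It is a crude rule-based stand-in, not real Named Entity Recognition: the honorific list and the function name find_people are my own assumptions for illustration, and a trained NER model would catch far more than "title plus capitalised name".

```python
import re

# Hypothetical, hand-picked honorifics; a trained NER model would learn
# such cues (and many others) from annotated data instead.
PERSON_PATTERN = re.compile(
    r"\b(?:Mr|Mrs|Ms|Miss|Dr|Prof)\.?\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"
)

def find_people(text):
    """Return every 'honorific + capitalised name(s)' span found in text."""
    return PERSON_PATTERN.findall(text)

if __name__ == "__main__":
    print(find_people("Please, never send me a photo of Mr Jones."))
```

On the two sentences from the question this only picks out "Mr Jones". It will miss any name that has no title in front of it, which is exactly the gap a proper NER tool is meant to fill.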

If you want to roughly identify what the text is about, I suggest that you start with WordNet and use the words and their places in its hierarchy to identify the concepts involved. If you want to produce a system that shows real intelligence, then you should start researching resources such as CYC (OpenCYC), which will allow you to convert the sentences into first-order logic (FOL) sentences.
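The hierarchy idea can be sketched without WordNet itself. The dictionary below is a tiny hand-written is-a table that I made up for illustration (WordNet's real hierarchy is much larger and is organised by word senses, not plain strings), and the helper names are hypothetical too.

```python
# Hypothetical, hand-written slice of an is-a hierarchy (child -> parent).
# These entries are illustrative only and are not taken from WordNet.
HYPERNYMS = {
    "photo": "photograph",
    "snapshot": "photograph",
    "photograph": "picture",
    "portrait": "picture",
    "picture": "representation",
    "description": "statement",
    "statement": "message",
}

def ancestors(word):
    """Return the chain of ever more general concepts above word."""
    chain = []
    current = word.lower()
    while current in HYPERNYMS:
        current = HYPERNYMS[current]
        chain.append(current)
    return chain

def mentions_concept(text, concept):
    """True if any word in text is concept or sits below it in the table."""
    for token in text.lower().split():
        word = token.strip(".,?!\"'")
        if word == concept or concept in ancestors(word):
            return True
    return False

if __name__ == "__main__":
    print(mentions_concept("or even better a photograph?", "picture"))
```

This maps "photo", "photograph" and "snapshot" onto the same concept, so you no longer need a keyword for each variant. Note that it says nothing about negation: "never send me a photo" still counts as mentioning a picture.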

This is the hardcore AI approach to solving your problem. For a simple chat bot, it would be easier to rely on simple statistical methods.
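As one example of a simple statistical method, here is a sketch of a bag-of-words Naive Bayes classifier with add-one smoothing, written in plain Python. The six training sentences and the labels "request" and "refusal" are invented for illustration; a usable model would need far more labelled text, and words it has never seen are simply ignored.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training data, made up for this sketch.
TRAINING = [
    ("please send me a photograph", "request"),
    ("could i have a photo", "request"),
    ("a picture would help", "request"),
    ("never send me a photo", "refusal"),
    ("do not send a picture", "refusal"),
    ("i never want a photograph", "refusal"),
]

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior for the label.
        score = math.log(label_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

MODEL = train(TRAINING)

if __name__ == "__main__":
    print(classify("Please, never send me a photo of Mr Jones.", MODEL))
```

With this toy data the sentence from the question comes out as "refusal", mostly because "never" only appears in the refusal examples. That is the whole trick: the model learns which words go with which label, without understanding the sentence at all.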

Good luck.




