How to use DBPedia to extract Tags/Keywords from content?
  • Date: 2011-01-20 13:58:17
  • Tags: dbpedia

I am exploring how I can use Wikipedia's taxonomy information to extract Tags/Keywords from my content.

I found articles about DBPedia. DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

Has anyone used their web services? Do you know how they work and how reliable they are?

Answers

DBpedia is a fantastic, high-quality resource. To turn your content into a set of relevant DBpedia concepts, however, you will need to identify those concepts accurately in your text, which involves at least two steps:

  1. Identify DBpedia concepts in your content: This includes recognizing concept names (and alternate names) in text, and disambiguating among all possible meanings of each phrase. The term "Sun" may refer to dozens of possible concepts according to its disambiguation page, including a star, newspapers, person names, etc. This involves entity identification, classification, and linking.

  2. Identify which of those concepts are interesting: For example, do you want the concept "Definite article" to show up whenever your text includes the word "the" (which the Wikipedia page The redirects to)? A small lookup sketch covering both cases follows this list.
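
To see how noisy both steps are in practice, you can look up disambiguation candidates and redirect targets directly on DBpedia's public SPARQL endpoint. The sketch below (Python, using the requests library) is only illustrative: the dbo:wikiPageDisambiguates and dbo:wikiPageRedirects properties are the ones exposed by current DBpedia releases, and the candidates and redirect targets you get back depend on the DBpedia version you query.

    import requests

    # Public DBpedia SPARQL endpoint (Virtuoso). The property names below are
    # assumptions based on current DBpedia releases.
    SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

    DISAMBIGUATION_QUERY = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?candidate WHERE {
      <http://dbpedia.org/resource/Sun_(disambiguation)> dbo:wikiPageDisambiguates ?candidate .
    }
    LIMIT 20
    """

    REDIRECT_QUERY = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?target WHERE {
      <http://dbpedia.org/resource/The> dbo:wikiPageRedirects ?target .
    }
    """

    def run_query(query):
        """Run a SELECT query against DBpedia and return the result bindings."""
        response = requests.get(
            SPARQL_ENDPOINT,
            params={"query": query, "format": "application/sparql-results+json"},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["results"]["bindings"]

    if __name__ == "__main__":
        print("Possible meanings of 'Sun':")
        for row in run_query(DISAMBIGUATION_QUERY):
            print("  ", row["candidate"]["value"])

        print("Redirect target of 'The':")
        for row in run_query(REDIRECT_QUERY):
            print("  ", row["target"]["value"])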

You may want to consider a pre-existing text-analytics library or service that supports entity linking to DBpedia. One great tool for topic indexing is Maui, which was developed by Alyona Medelyan during her PhD. Another great open-source solution is Wikipedia Miner by David Milne, from the same university (Waikato).

Two commercial services which provide linking to DBpedia concepts are Zemanta and Extractiv (both allow some level of free use); DBpedia Spotlight is another option. Others which may provide these capabilities are listed at: https://stackoverflow.com/questions/2119279/is-there-a-better-tool-than-opencalais
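
For quick experiments, DBpedia Spotlight also exposes its annotation service over plain HTTP. A minimal sketch, assuming the public demo endpoint below (the hostname has moved over the years, and the JSON field names are taken from the current API, so treat both as assumptions rather than a stable contract):

    import requests

    # Public DBpedia Spotlight demo endpoint (an assumption; self-hosted
    # instances use their own URL).
    SPOTLIGHT_ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"

    def extract_tags(text, confidence=0.5):
        """Return (surface form, DBpedia URI) pairs that Spotlight finds in the text."""
        response = requests.get(
            SPOTLIGHT_ENDPOINT,
            params={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
            timeout=30,
        )
        response.raise_for_status()
        # "Resources" may be missing when nothing was annotated.
        resources = response.json().get("Resources", [])
        return [(r.get("@surfaceForm"), r.get("@URI")) for r in resources]

    if __name__ == "__main__":
        sample = "The Sun newspaper reported on solar flares observed on the Sun."
        for surface, uri in extract_tags(sample):
            print(surface, "->", uri)

Raising the confidence parameter trades recall for precision, which is one simple way to deal with step 2 above (dropping uninteresting or doubtful concepts).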

Disclosure: I [used to] work at Extractiv (now defunct), which is powered by Language Computer Corporation's NLP technology.

You can use Apache Stanbol for this process. The Entityhub component of Apache Stanbol lets you build custom DBpedia indexes based on your needs. You can then use the Enhancer component to extract Person, Place, and other entities from your text.
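
The Enhancer is exposed as a REST service that you POST plain text to. A minimal sketch, assuming a locally running Stanbol full launcher on port 8080; the exact path and the response serialization may differ between Stanbol versions and deployments:

    import requests

    # Assumed default of the Stanbol full launcher; adjust host/path to match
    # your deployment (the dev.iks-project.eu demo exposes a similar endpoint).
    ENHANCER_URL = "http://localhost:8080/enhancer"

    def enhance(text):
        """POST plain text to the Enhancer and return its enhancement results."""
        response = requests.post(
            ENHANCER_URL,
            data=text.encode("utf-8"),
            headers={
                "Content-Type": "text/plain; charset=utf-8",
                # Ask for a JSON serialization of the RDF enhancement graph;
                # RDF/XML and Turtle are alternatives if JSON is not supported.
                "Accept": "application/json",
            },
            timeout=60,
        )
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        print(enhance("Barack Obama visited Berlin in 2008."))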

The following mail thread may be helpful:
http://markmail.org/message/52266yl5ohijxiof

You can access running demos of Apache Stanbol from the following link:
http://dev.iks-project.eu/

You can also send further questions to stanbol-dev AT incubator.apache.org.




