Extract relationships using NLTK
Best answer

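It looks like a "Parsed Doc" object needs to have a headline member and a text member, both of which are lists of tokens, where some of the tokens are marked up as trees. For example, this (hacky) example works: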

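import nltk
import re

# Filler pattern: the word "in", not followed later by a word ending in "ing"
IN = re.compile(r'.*\bin\b(?!\b.+ing)')

# Bare class used as a stand-in for a parsed document
class doc():
  pass

doc.headline = ['foo']
doc.text = [nltk.Tree('ORGANIZATION', ['WHYY']), 'in', nltk.Tree('LOCATION', ['Philadelphia']), '.', 'Ms.', nltk.Tree('PERSON', ['Gross']), ',']

for rel in nltk.sem.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
   # show_raw_rtuple is the NLTK 2 name; NLTK 3 renamed it to rtuple
   print(nltk.sem.relextract.show_raw_rtuple(rel))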

Output:

[ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']

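Obviously you wouldn't actually do it like this, but it provides a working example of the data format expected by extract_rels. You just need to work out how to do your preprocessing steps to massage your data into that format.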
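To see what the filler pattern accepts, here is a small sketch that needs only the standard library. It assumes the pattern is the word-boundary form r'.*\bin\b(?!\b.+ing)'; the helper name matches_in and the sample fillers are made up for illustration.

```python
import re

# Assumed form of the filler pattern: the standalone word "in",
# not followed later in the filler by text ending in "ing".
IN = re.compile(r'.*\bin\b(?!\b.+ing)')


def matches_in(filler):
    """Return True if the filler text would pass the IN pattern."""
    return IN.match(filler) is not None


# Hypothetical fillers, i.e. the text found between two named entities.
fillers = [
    "in",                        # plain "in"
    ", based in",                # "in" at the end of a longer filler
    "is investing in building",  # "in" followed by a gerund
    "within",                    # "in" only as part of another word
]

for filler in fillers:
    print(repr(filler), matches_in(filler))
```

The first two fillers match; the last two do not, because of the negative lookahead and the word boundaries respectively.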

Answers

The source code of the nltk.sem.extract_rels function:

def extract_rels(subjclass, objclass, doc, corpus='ace', pattern=None, window=10):
    """
    Filter the output of ``semi_rel2reldict`` according to specified NE classes and a filler pattern.

    The parameters ``subjclass`` and ``objclass`` can be used to restrict the
    Named Entities to particular types (any of 'LOCATION', 'ORGANIZATION',
    'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE').

    :param subjclass: the class of the subject Named Entity.
    :type subjclass: str
    :param objclass: the class of the object Named Entity.
    :type objclass: str
    :param doc: input document
    :type doc: ieer document or a list of chunk trees
    :param corpus: name of the corpus to take as input; possible values are
        'ieer' and 'conll2002'
    :type corpus: str
    :param pattern: a regular expression for filtering the fillers of
        retrieved triples.
    :type pattern: SRE_Pattern
    :param window: filters out fillers which exceed this threshold
    :type window: int
    :return: see ``mk_reldicts``
    :rtype: list(defaultdict)
    """
    ....
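As a rough illustration of the filtering that docstring describes, here is a simplified stand-in written in plain Python. It is not NLTK's implementation: filter_rels, the dict keys subjtext and objtext, the sample candidates, and the choice to measure window in whitespace-separated tokens are all assumptions made for this sketch.

```python
import re


def filter_rels(reldicts, subjclass, objclass, pattern=None, window=10):
    """Keep candidate relations whose entity classes, filler length and
    filler text all pass the given constraints.

    Simplified sketch only, NOT NLTK's code. Each candidate is assumed to be
    a plain dict with 'subjclass', 'filler' and 'objclass' keys.
    """
    kept = []
    for rel in reldicts:
        filler = rel["filler"]
        if rel["subjclass"] != subjclass or rel["objclass"] != objclass:
            continue  # wrong entity types
        if len(filler.split()) > window:
            continue  # filler has more tokens than the window allows
        if pattern is not None and not pattern.match(filler):
            continue  # filler text does not match the pattern
        kept.append(rel)
    return kept


# Hypothetical candidates, modelled on the WHYY / Philadelphia tokens above.
candidates = [
    {"subjclass": "ORGANIZATION", "subjtext": "WHYY", "filler": "in",
     "objclass": "LOCATION", "objtext": "Philadelphia"},
    {"subjclass": "LOCATION", "subjtext": "Philadelphia", "filler": ". Ms.",
     "objclass": "PERSON", "objtext": "Gross"},
]

IN = re.compile(r".*\bin\b(?!\b.+ing)")

for rel in filter_rels(candidates, "ORGANIZATION", "LOCATION", pattern=IN):
    print("[%s: %r] %r [%s: %r]" % (rel["subjclass"], rel["subjtext"],
                                    rel["filler"], rel["objclass"],
                                    rel["objtext"]))
```

Only the first candidate survives: the second has the wrong entity classes, and its filler would not match the pattern either.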

So if you pass the corpus parameter as 'ieer', the nltk.sem.extract_rels function expects the doc parameter to be an IEERDocument object. You should pass corpus as 'ace', or just not pass it at all (the default is 'ace'). In that case it expects a list of chunk trees (that's what you wanted). I modified the code as below:

import io
import re

import nltk
from nltk.sem import extract_rels, rtuple

# billgatesbio from http://www.reuters.com/finance/stocks/officerProfile?symbol=MSFT.O&officerId=28066
# io.open decodes the file as UTF-8 on both Python 2 and Python 3
with io.open('billgatesbio.txt', 'r', encoding='utf-8') as f:
    sample = f.read()

# Split into sentences, tokenize each one, then POS-tag the tokens
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]

# Here I changed the regex, and below I swapped the subject and object classes
OF = re.compile(r'.*of.*')

for i, sent in enumerate(tagged_sentences):
    sent = nltk.ne_chunk(sent)  # ne_chunk expects one tagged sentence
    rels = extract_rels('PER', 'ORG', sent, corpus='ace', pattern=OF, window=7)  # extract_rels expects one chunked sentence
    for rel in rels:
        print('{0:<5}{1}'.format(i, rtuple(rel)))

And it gives the result:

[PER: u'Chairman/NNP'] u'and/CC Chief/NNP Executive/NNP Officer/NNP of/IN the/DT' [ORG: u'Company/NNP']



