The source code of the nltk.sem.extract_rels function:
def extract_rels(subjclass, objclass, doc, corpus='ace', pattern=None, window=10):
    """
    Filter the output of ``semi_rel2reldict`` according to specified NE classes and a filler pattern.

    The parameters ``subjclass`` and ``objclass`` can be used to restrict the
    Named Entities to particular types (any of 'LOCATION', 'ORGANIZATION',
    'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE').

    :param subjclass: the class of the subject Named Entity.
    :type subjclass: str
    :param objclass: the class of the object Named Entity.
    :type objclass: str
    :param doc: input document
    :type doc: ieer document or a list of chunk trees
    :param corpus: name of the corpus to take as input; possible values are
        'ieer' and 'conll2002'
    :type corpus: str
    :param pattern: a regular expression for filtering the fillers of
        retrieved triples.
    :type pattern: SRE_Pattern
    :param window: filters out fillers which exceed this threshold
    :type window: int
    :return: see ``mk_reldicts``
    :rtype: list(defaultdict)
    """
....
So if you pass the corpus parameter as 'ieer', nltk.sem.extract_rels expects the doc parameter to be an IEERDocument object. You should pass corpus as 'ace', or just don't pass it (the default is 'ace'). In that case it expects a list of chunk trees (that's what you wanted). I modified the code as below:
import re

import nltk
from nltk.sem import extract_rels, rtuple

# billgatesbio from http://www.reuters.com/finance/stocks/officerProfile?symbol=MSFT.O&officerId=28066
with open('billgatesbio.txt', 'r') as f:
    # Python 2 idiom; on Python 3 use open(..., encoding='utf-8') and drop .decode()
    sample = f.read().decode('utf-8')

sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]

# here I changed the regex, and below I swapped the subject and object classes
OF = re.compile(r'.*of.*')

for i, sent in enumerate(tagged_sentences):
    sent = nltk.ne_chunk(sent)  # ne_chunk expects one tagged sentence
    # extract_rels expects one chunked sentence
    rels = extract_rels('PER', 'ORG', sent, corpus='ace', pattern=OF, window=7)
    for rel in rels:
        print('{0:<5}{1}'.format(i, rtuple(rel)))
And it gives the result:
[PER: u'Chairman/NNP'] u'and/CC Chief/NNP Executive/NNP Officer/NNP of/IN the/DT' [ORG: u'Company/NNP']
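To see why this triple gets through, it helps to look at the filler, that is, the text between the two entities. The sketch below is not NLTK's code. The helper filler_passes is hypothetical, and it assumes the filter counts whitespace-separated tokens against window and applies pattern.match to the filler string, which is how I read the docstring above:

```python
import re

# Hypothetical helper, not part of NLTK: mimics the two filler checks
# described in the docstring (token window and regex pattern).
def filler_passes(filler, pattern, window):
    """Return True if the filler is short enough and matches the pattern."""
    n_tokens = len(filler.split())
    return n_tokens <= window and pattern.match(filler) is not None

OF = re.compile(r'.*of.*')

# Filler taken from the result shown above (text between PER and ORG).
filler = 'and/CC Chief/NNP Executive/NNP Officer/NNP of/IN the/DT'

print(len(filler.split()))                  # 6
print(filler_passes(filler, OF, window=7))  # True
print(filler_passes(filler, OF, window=5))  # False
```

The filler has 6 tokens, so it fits within window=7 but would be dropped with a smaller window. Note that the pattern is matched against the tagged text (word/TAG), so a regex can also refer to the POS tags.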