I would like to analyze text data in an Excel file. I know how to read an Excel file with Python, but each piece of data ends up as one element of a list. What I want instead is to analyze the text inside each cell.
Here is an example of the Excel file:
NAME  INDUSTRY     INFO
A     FINANCIAL    THIS COMPANY IS BLA BLA BLA
B     MANUFACTURE  IT IS LALALALALALALALALA
C     FINANCIAL    THAT IS SOSOSOSOSOSOSOSO
D     AGRICULTURE  WHYWHYWHYWHYWHY
I would like to analyze, say, the company info for the financial industry using NLTK, for example the frequency of "IT".
This is what I have done so far (i.e., it does not work!):
import nltk
import xlrd

aa = 'c:/book3.xls'
wb = xlrd.open_workbook(aa)
wb.sheet_names()
sh = wb.sheet_by_index(0)
for rownum in range(sh.nrows):
    # row_values() returns the whole row as a list of cell values
    print(nltk.word_tokenize(sh.row_values(rownum)))
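
For what it's worth, here is a minimal sketch of the per-cell approach I have in mind, not a tested solution against the real file. It assumes the first row is a header with NAME, INDUSTRY and INFO columns, and it hard-codes the rows from the example table in the same shape that sh.row_values(rownum) returns, so it runs without the .xls file. The helper names tokenize and word_counts are made up for illustration, and the regex tokenizer is only a stand-in for nltk.word_tokenize, which takes a single string.

```python
import re
from collections import Counter

# Rows shaped like the lists returned by sh.row_values(rownum);
# hard-coded from the example table (assumption: first row is the header).
rows = [
    ["NAME", "INDUSTRY", "INFO"],
    ["A", "FINANCIAL", "THIS COMPANY IS BLA BLA BLA"],
    ["B", "MANUFACTURE", "IT IS LALALALALALALALALA"],
    ["C", "FINANCIAL", "THAT IS SOSOSOSOSOSOSOSO"],
    ["D", "AGRICULTURE", "WHYWHYWHYWHYWHY"],
]


def tokenize(text):
    # Stand-in tokenizer (assumption): splits on anything that is not a
    # letter, digit or apostrophe. nltk.word_tokenize(text) could be
    # swapped in here, since it also takes one string at a time.
    return re.findall(r"[A-Za-z0-9']+", text)


def word_counts(rows, industry):
    # Count words in the INFO cell of every row whose INDUSTRY matches.
    header, data = rows[0], rows[1:]
    industry_col = header.index("INDUSTRY")
    info_col = header.index("INFO")
    counts = Counter()
    for row in data:
        if row[industry_col] == industry:
            counts.update(tokenize(row[info_col]))
    return counts


financial_counts = word_counts(rows, "FINANCIAL")
print(financial_counts["IS"])  # frequency of "IS" in the financial rows
print(financial_counts["IT"])  # a Counter gives 0 for words it never saw
```

The point is to pass one cell (a string) to the tokenizer at a time rather than the whole row list, and to filter rows by the INDUSTRY cell before counting.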