Question

I would like to analyze text data in an Excel file. I know how I could read an Excel file via Python, but each piece of data becomes one value of a list. However, I would like to analyze text in each cell.

Here is an example of the Excel file:

NAME    INDUSTRY        INFO    
A       FINANCIAL       THIS COMPANY IS BLA BLA BLA 
B       MANUFACTURE     IT IS LALALALALALALALALA    
C       FINANCIAL       THAT IS SOSOSOSOSOSOSOSO    
D       AGRICULTURE     WHYWHYWHYWHYWHY

I would like to analyze, say, the financial industry's company info using NLTK, such as the frequency of "IT".

Here is what I have done so far (i.e., it doesn't work!):

import xlrd
aa = 'c:/book3.xls'
wb = xlrd.open_workbook(aa)
wb.sheet_names()
sh = wb.sheet_by_index(0)

for rownum in range(sh.nrows):
     print nltk.word_tokenize(sh.row_values(rownum))

Answer 1

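You are passing all the values in a row to word_tokenize, but you are only interested in the contents of the third column. You are also processing the heading row. Try this: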

import nltk
import xlrd

book = xlrd.open_workbook("your_input_file.xls")
sheet = book.sheet_by_index(0)
for row_index in range(1, sheet.nrows):  # start at 1 to skip the heading row
    name, industry, info = sheet.row_values(row_index, end_colx=3)
    print("Row %d: name=%r industry=%r info=%r" %
          (row_index + 1, name, industry, info))
    print(nltk.word_tokenize(info))  # or whatever else you want to do
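
To get from there to the frequency of a word within one industry, one possible approach is to filter on the INDUSTRY column and accumulate token counts. The following is only a sketch: the helper name word_counts_for_industry and the ROWS list are hypothetical, the rows are hard-coded from the example table rather than read with xlrd, and str.split stands in for the tokenizer so the snippet runs without NLTK. Passing nltk.word_tokenize as the tokenize argument should work the same way.

```python
from collections import Counter

# Sample rows shaped like the result of sheet.row_values(row_index, end_colx=3).
# Hard-coded from the example table in the question (an assumption for the demo).
ROWS = [
    ("A", "FINANCIAL", "THIS COMPANY IS BLA BLA BLA"),
    ("B", "MANUFACTURE", "IT IS LALALALALALALALALA"),
    ("C", "FINANCIAL", "THAT IS SOSOSOSOSOSOSOSO"),
    ("D", "AGRICULTURE", "WHYWHYWHYWHYWHY"),
]


def word_counts_for_industry(rows, industry, tokenize=str.split):
    """Count tokens in the INFO column of rows whose INDUSTRY matches.

    Hypothetical helper; `tokenize` is any callable mapping a string to a
    list of tokens (str.split here, nltk.word_tokenize would also fit).
    """
    counts = Counter()
    for name, row_industry, info in rows:
        # Compare case-insensitively and ignore stray padding in the cell.
        if row_industry.strip().upper() == industry.upper():
            counts.update(tokenize(info))
    return counts


financial = word_counts_for_industry(ROWS, "FINANCIAL")
print(financial["IS"])  # 2
print(financial["IT"])  # 0 (a Counter returns 0 for unseen tokens)
```

With the real spreadsheet, the rows would come from the loop above instead of the hard-coded list. nltk.FreqDist is a Counter subclass, so it could replace Counter here if its extra methods are wanted.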
