Question

(I am working interactively with a WordprocessingDocument object in IronPython using the OpenXML SDK, but this is really a general Python question that should be applicable across all implementations)

I am trying to scrape tables out of a number of Word documents. For each table, I have an iterator that yields table row objects. I then use the following list comprehension to get a tuple of the cells' text from each row:

for row in rows:
    t = tuple([c.InnerText for c in row.Descendants[TableCell]()])

Each tuple contains 4 elements. Now, for element t[1] of each tuple, I need to apply a regex to the data. I know that tuples are immutable, so I'm happy to either create a new tuple or build the tuple in a different way. Given that row.Descendants[TableCell]() returns an iterator, what's the most Pythonic (or at least simplest) way to construct a tuple from an iterator when I want to modify the nth element returned?

My brute-force method right now is to build a tuple from the left slice (t[:n]), the modified data from t[n], and the right slice (t[n+1:]), but I feel like the itertools module should have something to help me out here.
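For reference, here is a corrected version of the slice approach, plus one way to do it with itertools directly on the iterator. This is a sketch under assumptions: clean_cell is a hypothetical stand-in for the real regex, replace_nth_slicing and replace_nth_itertools are illustrative names, and plain strings stand in for the cells' InnerText values.

```python
import re
from itertools import chain, islice

# Hypothetical stand-in for the real regex: collapse runs of whitespace.
def clean_cell(text):
    return re.sub(r"\s+", " ", text).strip()

def replace_nth_slicing(t, n, func):
    # The left slice is t[:n], so that t[n] itself is the element replaced.
    return t[:n] + (func(t[n]),) + t[n + 1:]

def replace_nth_itertools(iterable, n, func):
    it = iter(iterable)
    # chain() pulls from its arguments in order: the first n items unchanged,
    # then the next item passed through func, then everything that is left.
    # The middle argument is a generator expression, so nothing is consumed
    # from `it` until chain() actually reaches it.
    return tuple(chain(islice(it, n), (func(x) for x in islice(it, 1)), it))

cells = ["A1", "  some   messy\ttext ", "C1", "D1"]
print(replace_nth_slicing(tuple(cells), 1, clean_cell))   # ('A1', 'some messy text', 'C1', 'D1')
print(replace_nth_itertools(iter(cells), 1, clean_cell))  # ('A1', 'some messy text', 'C1', 'D1')
```

With the question's objects, the generator (c.InnerText for c in row.Descendants[TableCell]()) can be passed straight to replace_nth_itertools, so no intermediate list is built. A generator expression is used in the middle rather than map() because on Python 2, which IronPython 2.7 is based on, map() builds its list eagerly and would consume the item too early. One behavioral difference: if a row has n items or fewer, the slicing version raises IndexError on t[n], while the itertools version returns the row unchanged.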

Answer 1

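def item(i, v):
    # Transform only the cell at index 1; pass every other cell through unchanged.
    if i != 1:
        return v
    return strangestuff(v)  # strangestuff: your regex-applying function

for row in rows:
    t = tuple(item(i, c.InnerText)
              for i, c in enumerate(row.Descendants[TableCell]()))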
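The same idea generalizes to a small helper that takes the index and the transformation as parameters. This is only a sketch: replace_at and digits_only are hypothetical names, the regex is a placeholder for whatever the real column needs, and a list of strings stands in for the cells' InnerText values.

```python
import re

def replace_at(iterable, n, func):
    """Return the items of iterable as a tuple, with func applied to item n only."""
    return tuple(func(v) if i == n else v for i, v in enumerate(iterable))

# Hypothetical regex step: keep only the digits in the cell text.
def digits_only(text):
    return re.sub(r"\D", "", text)

row_text = ["Widget", "Part #12-345", "blue", "4"]
print(replace_at(row_text, 1, digits_only))  # ('Widget', '12345', 'blue', '4')
```

In the question's loop this would read t = replace_at((c.InnerText for c in row.Descendants[TableCell]()), 1, digits_only). Since it walks the iterable once with enumerate, it works on any iterator without building an intermediate list.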

Answer 2

I would do this:

temp_list = [c.InnerText for c in row.Descendants[TableCell]()]
temp_list[2] = "Something different"
t = tuple(temp_list)

It would work like this:

>>> temp_list = [i for i in range(4)]
>>> temp_list[2] = "Something different"
>>> t = tuple(temp_list)
>>> t
(0, 1, 'Something different', 3)

Answer 3

If every tuple contains 4 elements, then, frankly, I think you'd be better off assigning them to individual variables, manipulating those, and then building your tuple:

for row in rows:
    t1, t2, t3, t4 = [c.InnerText for c in row.Descendants[TableCell]()]
    t2 = ...  # t2 is the second column, i.e. t[1] in the question
    t = (t1, t2, t3, t4)
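A runnable version of the unpacking approach, with sample strings in place of the cell text and a hypothetical regex applied to the second column (t[1] in the question). A useful side effect: the unpacking raises ValueError if a row does not have exactly four cells, which doubles as a sanity check when scraping many documents.

```python
import re

# Sample rows standing in for the InnerText of each row's four TableCell objects.
rows_text = [
    ["1", "Price: $1,200", "x", "y"],
    ["2", "Price: $35", "x", "y"],
]

table = []
for cells in rows_text:
    # Unpacking raises ValueError unless there are exactly four cells.
    a, b, c, d = cells
    # Hypothetical regex step: pull the number out of the second column.
    m = re.search(r"[\d,]+", b)
    b = m.group(0).replace(",", "") if m else b
    table.append((a, b, c, d))

print(table)  # [('1', '1200', 'x', 'y'), ('2', '35', 'x', 'y')]
```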

Answer 4

What I've generally done, though I'm not a fan of it:

l = list(oldtuple)
l[2] = foo
t = tuple(l)

I kind of want something like update() for dicts:

newtuple = update(oldtuple, (None, None, val, None))

Or perhaps the right structure is a zip of (index, value) pairs:

newtuple = update(oldtuple, ((2, val), (3, val)))
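Neither update function exists in the standard library, but both signatures above are easy to write. A sketch under hypothetical names, update_mask for the None-mask form and update_pairs for the (index, value) form, plus collections.namedtuple's _replace, which is the closest built-in equivalent (the field names here are illustrative).

```python
from collections import namedtuple

def update_mask(old, mask):
    """Return a new tuple: old[i] where mask[i] is None, otherwise mask[i]."""
    if len(mask) != len(old):
        raise ValueError("mask must be the same length as the tuple")
    return tuple(o if m is None else m for o, m in zip(old, mask))

def update_pairs(old, pairs):
    """Return a new tuple with each (index, value) pair applied to a copy of old."""
    items = list(old)
    for i, v in pairs:
        items[i] = v
    return tuple(items)

oldtuple = ("a", "b", "c", "d")
print(update_mask(oldtuple, (None, None, "X", None)))  # ('a', 'b', 'X', 'd')
print(update_pairs(oldtuple, ((2, "X"), (3, "Y"))))    # ('a', 'b', 'X', 'Y')

# namedtuple._replace returns a new instance with the named fields changed.
Row = namedtuple("Row", ["a", "b", "c", "d"])  # illustrative field names
row = Row(*oldtuple)
print(row._replace(c="X"))  # Row(a='a', b='b', c='X', d='d')
```

The mask form has one limitation: because None means "keep the old value", it cannot set an element to None. The pairs form does not have that problem.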
