lxml iterparse missing child nodes
  • Time: 2011-11-14 22:49:01
  • Tags:
  • python
  • lxml

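I'm using lxml's iterparse to read a huge XML document. For a certain main element, I check the content of its children and process each child. However, I noticed that when I check the children of an element, the parser sometimes misses some child nodes. I even printed the length of each element: it should always be a certain number, but sometimes it is less than it should be. Surprisingly, this usually happens in the 5th chunk (a chunk being one occurrence of the main element). Why is the parser missing children? Any ideas?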

Sample code:

from lxml import etree

def parseXml(context, attribList, elemList, mainElement):
    for event, element in context:
        if element.tag == mainElement and event == 'start':
            for child in element:
                if child.tag in elemList:
                    print(len(child))  # for a given child, the len should be constant
                    # do things
        elif event == 'end':
            element.clear()

Thanks!

Best answer

When defining the context, make sure to set the events parameter to ('end',), not ('start',). Otherwise you can get exactly the behavior you describe.

context = etree.iterparse(filehandle, events=('end',), tag=mainElement)

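I think the problem is that lxml is still parsing the XML while parseXml is running, so you can reach an element's start event before its corresponding end event has been parsed. As a result, when you loop over that element's children, you only get some of them.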
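The effect can be reproduced deterministically. This is a minimal sketch that uses the standard library's xml.etree.ElementTree.XMLPullParser rather than lxml, an assumption made so the chunk boundaries can be chosen explicitly (lxml's iterparse decides on its own how much input to parse before handing out events). The helper name children_seen_at_start and the two input chunks are hypothetical.

```python
import xml.etree.ElementTree as ET

def children_seen_at_start(chunks, tag):
    """Feed XML piece by piece; record len(elem) at the moment each `tag`
    'start' event is read, and again after the whole document is parsed."""
    parser = ET.XMLPullParser(events=("start",))
    seen = []      # child counts observed when the 'start' event is handled
    elements = []  # the same (live) element objects, inspected at the end
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if elem.tag == tag:
                seen.append(len(elem))
                elements.append(elem)
    parser.close()
    return seen, [len(e) for e in elements]

# The first chunk ends in the middle of <record>, like a read-buffer boundary.
chunks = ["<root><record><field>1</field>", "<field>2</field></record></root>"]
seen, final = children_seen_at_start(chunks, "record")
print(seen, final)  # [1] [2]
```

At 'start' time the record has only the one child parsed so far; once parsing finishes it has both. Which count you observe depends purely on where the input happened to be split, which matches the "sometimes fewer children" symptom in the question.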


Liza Daly's article at http://www.ibm.com/developerworks/xml/library/x-hiperfparse/ gives an excellent way to structure this kind of processing for large XML files:

from lxml import etree

def fast_iter(context, func, *args, **kwargs):
    # http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    # Author: Liza Daly
    for event, elem in context:
        func(elem, *args, **kwargs)
        # Free the processed element's own subtree...
        elem.clear()
        # ...and drop already-processed siblings the parent still references.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context

def parseXml(element, attribList, elemList):
    for child in element:
        if child.tag in elemList:
            print(len(child))  # for a given child, the len should be constant
            # do things

# filehandle, attribList, elemList and mainElement are as in the question.
context = etree.iterparse(filehandle, events=('end',), tag=mainElement)
fast_iter(context, parseXml, attribList, elemList)
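fast_iter relies on lxml-only methods (getprevious, getparent). For comparison, here is a sketch of a different, commonly used memory-saving pattern for the standard library's ElementTree, which has no parent pointers: take the root from the first 'start' event and clear it after each finished main element. The generator name iter_complete and the sample data are hypothetical, and this is an analogue, not Liza Daly's technique.

```python
import io
import xml.etree.ElementTree as ET

def iter_complete(source, main_tag):
    """Yield each main_tag element once its 'end' event arrives (so it is
    complete), then detach everything from the root to bound memory use."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root's 'start'
    for event, elem in context:
        if event == "end" and elem.tag == main_tag:
            yield elem
            # Removes the root's children (and its attributes); elements still
            # queued as pending events keep their own subtrees intact.
            root.clear()

data = b"<root>" + b"<record><item><f/><f/></item></record>" * 500 + b"</root>"
lengths = []
for record in iter_complete(io.BytesIO(data), "record"):
    lengths.extend(len(item) for item in record if item.tag == "item")
print(len(lengths), set(lengths))  # 500 {2}
```

As with events=('end',) in lxml, elements are handed out only after they are fully parsed, so every count is the same.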