<USBUREAU>Packers and Stockyards Administration</USBUREAU>
Amendment to Certification of Central Filing System_Oklahoma
The Statewide central filing system of Oklahoma has been previously certified, pursuant to section 1324 of the Food
Security Act of 1985, on the basis of information submitted by Hannah D. Atkins, Secretary of State, for farm products
produced in that State (52 FR 49056, December 29, 1987).
The certification is hereby amended on the basis of information submitted by John Kennedy, Secretary of State, for
additional farm products produced in that State as follows: Cattle semen, cattle embryos, milo.
This is issued pursuant to authority delegated by the Secretary of Agriculture.
<!-- PJG ITAG l=21 g=1 f=1 -->
 Sec. 1324(c)(2), Pub. L. 99-198, 99 Stat. 1535, 7 U.S.C. 1631(c)(2); 7 CFR 2.18(e)(3), 2.56(a)(3), 55 FR 22795.
Dated: January 21, 1994
<!-- PJG ITAG l=06 g=1 f=1 -->
Calvin W. Watkins, Acting Administrator,
<!-- PJG ITAG l=04 g=1 f=1 -->
Packers and Stockyards Administration.
<!-- PJG ITAG l=40 g=1 f=1 -->
[FR Doc. 94-1847 Filed 1-27-94; 8:45 am]
My task is to extract the text from snippets like this one. Here is what I did:

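from xml.dom.minidom import Node

def getTextFromXML():
    global Text, xmlDoc
    TextNodes = xmlDoc.getElementsByTagName("TEXT")
    #Text = [TextFromNode(textNode) for textNode in TextNodes]
    for textNode in TextNodes:
        docstr = ""  # start a fresh string for each TEXT element
        for cNode in textNode.childNodes:
            if cNode.nodeType == Node.TEXT_NODE:
                # text directly inside the TEXT element
                docstr += cNode.data
            else:
                # text one level down, inside a child element
                for ccNode in cNode.childNodes:
                    if ccNode.nodeType == Node.TEXT_NODE:
                        docstr += ccNode.data
        Text.append(docstr)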

The problem is that it takes a lot of time. I guess my function is not very efficient. Could anyone tell me how to improve it?

The file I'm processing contains about 6000+ <TEXT> elements.


lxml is easier to use than the XML libraries included in the Python standard library. It is a binding to the C library libxml2, so I'm assuming it is also faster.


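from lxml import etree

# binary mode lets the parser handle the file's encoding itself
with open("some-file.xml", "rb") as f:
    xmlDoc = etree.parse(f)
    root = xmlDoc.getroot()

    Text = []
    for textNode in root.xpath("TEXT"):
        # join the non-empty text fragments of TEXT and of its direct children;
        # "\n" as the separator is an assumption
        docstr = "\n".join(text.strip() for text in textNode.xpath("*/text() | text()") if text.strip())
        Text.append(docstr)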


s = "".join(elem.itertext())


s = elem.xpath("string()")
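
For anyone who wants to try the itertext() approach without installing lxml, here is a minimal sketch using the standard library's xml.etree.ElementTree, which provides the same itertext() method (the xpath("string()") call is lxml-only, so it is not shown). The sample input, its DOCS/DOC/P layout, and the helper name texts_from_xml are assumptions made up for illustration; they are not from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: the DOCS/DOC/P layout is assumed for illustration only.
SAMPLE = """<DOCS>
  <DOC><TEXT>Amendment to Certification <P>of Central Filing System</P></TEXT></DOC>
  <DOC><TEXT>Dated: <P>January 21, 1994</P></TEXT></DOC>
</DOCS>"""


def texts_from_xml(xml_string):
    """Return one whitespace-normalized string per TEXT element."""
    root = ET.fromstring(xml_string)
    result = []
    # iter("TEXT") visits every descendant named TEXT, at any depth
    for text_elem in root.iter("TEXT"):
        # itertext() yields all text fragments inside the element, in document order
        raw = "".join(text_elem.itertext())
        # collapse runs of whitespace and newlines into single spaces
        result.append(" ".join(raw.split()))
    return result


if __name__ == "__main__":
    for line in texts_from_xml(SAMPLE):
        print(line)
```

Unlike the two-level loop in the question, itertext() descends to any depth, so nested child elements need no special handling. Whether it is fast enough for 6000+ TEXT elements would have to be measured on the real file.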

