Question

Say I have the following reST input:

Some text ...

:foo: bar

Some text ...

I would like to end up with a dictionary like this:

{"foo": "bar"}

I tried using this:

tree = docutils.core.publish_parts(text)

It does parse the field list, but I end up with some pseudo-XML in tree["whole"]:

<document source="<string>">
    <docinfo>
        <field>
            <field_name>
                foo
            <field_body>
                <paragraph>
                    bar

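Since the tree dict does not contain any other useful information, and the value above is just a string, I am not sure how to parse the field list out of the reST document. How would I do that?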

Answer 1

You can try the following code. Rather than using the publish_parts method, I use publish_doctree (http://docutils.sourceforge.net/docs/api/publisher.html) to get the doctree, the structure behind the pseudo-XML representation of your document. I then convert it to an XML DOM in order to extract all the field elements. From each field element I take the first field_name and the first field_body element.

from docutils.core import publish_doctree

source = """Some text ...

:foo: bar

Some text ...
"""

# Parse reStructuredText input, returning the Docutils doctree as
# an `xml.dom.minidom.Document` instance.
doctree = publish_doctree(source).asdom()

# Get all field elements in the document.
fields = doctree.getElementsByTagName("field")

d = {}

for field in fields:
    # I am assuming that `getElementsByTagName` only returns one element.
    field_name = field.getElementsByTagName("field_name")[0]
    field_body = field.getElementsByTagName("field_body")[0]

    d[field_name.firstChild.nodeValue] = " ".join(
        c.firstChild.nodeValue for c in field_body.childNodes
    )

print(d)  # Prints {'foo': 'bar'}

The xml.dom module (http://docs.python.org/library/xml.dom.html) is not the easiest to work with (why do I need to use .firstChild.nodeValue rather than just .nodeValue?), so you may prefer the xml.etree.ElementTree module, which I find much easier to use. If you use lxml, you can also use XPath notation to find all of the field, field_name and field_body elements.
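
As a rough illustration of the XPath idea, the sketch below uses the limited XPath subset built into the standard library's xml.etree.ElementTree rather than lxml, so it is a stand-in for the lxml approach and not the same thing. The helper name fields_via_xpath is made up for this example and is not part of docutils.

```python
from xml.etree import ElementTree

from docutils.core import publish_doctree


def fields_via_xpath(source):
    """Return a dict mapping field names to field bodies in reST source.

    Hypothetical helper for illustration only; not part of docutils.
    """
    # Serialize the doctree to XML and re-parse it with ElementTree.
    dom = publish_doctree(source).asdom()
    root = ElementTree.fromstring(dom.toxml())

    result = {}
    # ".//field" matches <field> elements at any depth below the root.
    for field in root.findall(".//field"):
        name = field.find("field_name")
        body = field.find("field_body")
        # Join the text of each child of the body (usually paragraphs).
        result["".join(name.itertext())] = " ".join(
            "".join(child.itertext()) for child in body
        )
    return result


if __name__ == "__main__":
    print(fields_via_xpath("Some text ...\n\n:foo: bar\n\nSome text ...\n"))
```

With lxml, the same paths could presumably be passed to its xpath() method, but that variant is not shown or tested here.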

Answer 2

I have an alternative solution that I find less of a burden, though it may be more brittle. If you review the implementation of the node class at https://sourceforge.net/p/docutils/code/HEAD/tree/trunk/docutils/docutils/nodes.py, you will see that it supports a walk method that can be used to pull out the wanted data without having to create two different XML representations of your data. Here is what I am using in my prototype code:

https://github.com/h4ck3rm1k3/gcc-introspector/blob/master/peewee_adptor.py#L33

from docutils.core import publish_doctree
import docutils.nodes
import pprint

and then:

def walk_docstring(prop):
    doc = prop.__doc__
    doctree = publish_doctree(doc)
    class Walker:
        def __init__(self, doc):
            self.document = doc
            self.fields = {}
        def dispatch_visit(self,x):
            if isinstance(x, docutils.nodes.field):
                field_name = x.children[0].rawsource
                field_value = x.children[1].rawsource
                self.fields[field_name]=field_value
    w = Walker(doctree)
    doctree.walk(w)
    # the collected fields I wanted
    pprint.pprint(w.fields)
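
A variation on the same walk idea, offered only as a sketch: instead of a hand-written dispatch_visit, subclass docutils.nodes.SparseNodeVisitor and define visit_field, then read the text with astext() rather than rawsource. The names FieldCollector and collect_fields are invented for this example, and it assumes each field node has a field_name followed by a field_body as its first two children.

```python
from docutils import nodes
from docutils.core import publish_doctree


class FieldCollector(nodes.SparseNodeVisitor):
    """Collect field name/body pairs while walking a doctree.

    Hypothetical class for illustration; not part of docutils.
    """

    def __init__(self, document):
        super().__init__(document)
        self.fields = {}

    def visit_field(self, node):
        # Assumption: children[0] is the field_name, children[1] the field_body.
        name, body = node.children[0], node.children[1]
        self.fields[name.astext()] = body.astext()


def collect_fields(source):
    """Parse reST source and return its fields as a plain dict."""
    doctree = publish_doctree(source)
    visitor = FieldCollector(doctree)
    # walk() calls visit_<node type>() for every node in the tree;
    # SparseNodeVisitor supplies no-op handlers for the other node types.
    doctree.walk(visitor)
    return visitor.fields


if __name__ == "__main__":
    print(collect_fields("Some text ...\n\n:foo: bar\n\nSome text ...\n"))
```

Returning the dict instead of printing it makes the result easier to reuse, for example to merge fields collected from several docstrings.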

Answer 3

Here is an ElementTree (http://docs.python.org/library/xml.etree.elementtree.html) implementation:

from docutils.core import publish_doctree
from xml.etree.ElementTree import fromstring

source = """Some text ...

:foo: bar

Some text ...
"""


def gen_fields(source):
    dom = publish_doctree(source).asdom()
    tree = fromstring(dom.toxml())

    for field in tree.iter(tag="field"):
        name = next(field.iter(tag="field_name"))
        body = next(field.iter(tag="field_body"))
        yield {name.text: "".join(body.itertext())}

Usage:

>>> next(gen_fields(source))
{'foo': 'bar'}
