Extract field list from reStructuredText

Say I have the following reST input:

Some text ...

:foo: bar

Some text ...

I would like to end up with a dict like this:

{"foo": "bar"}

I tried using this:

tree = docutils.core.publish_parts(text)

It does parse the field list, but I end up with some pseudo-XML in tree["whole"]:

<document source="<string>">
    <docinfo>
        <field>
            <field_name>
                foo
            <field_body>
                <paragraph>
                    bar

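Since the tree dict does not contain any other useful information, and that value is just a string, I am not sure how to parse the field list out of the reST document. How would I do this?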

Best answer

You can try something like the following code. Rather than using the publish_parts method, I have used publish_doctree (http://docutils.sourceforge.net/docs/api/publisher.html) to get the Docutils document tree itself (the structure your pseudo-XML shows). I then convert it to an XML DOM in order to extract all of the field elements, and finally take the first field_name and field_body element of each field.

from docutils.core import publish_doctree

source = """Some text ...

:foo: bar

Some text ...
"""

# Parse the reStructuredText input and convert the Docutils doctree
# to an xml.dom.minidom DOM representation.
doctree = publish_doctree(source).asdom()

# Get all field elements in the document.
fields = doctree.getElementsByTagName('field')

d = {}

for field in fields:
    # I am assuming that each field contains exactly one field_name
    # and one field_body element.
    field_name = field.getElementsByTagName('field_name')[0]
    field_body = field.getElementsByTagName('field_body')[0]

    d[field_name.firstChild.nodeValue] = " ".join(
        c.firstChild.nodeValue for c in field_body.childNodes)

print(d)  # Prints {'foo': 'bar'}
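
One limitation of the loop above: .firstChild.nodeValue reads only the first text node of each element, so a field body with inline markup (for example :foo: bar *baz*) would come back as just "bar ". Below is a minimal sketch of a recursive text helper for the same minidom approach. The helper names dom_text and fields_from_dom are hypothetical, and a hand-written XML string stands in for publish_doctree(source).asdom() so the snippet runs without docutils.

```python
from xml.dom import minidom


def dom_text(node):
    """Concatenate every text node below `node`, in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.nodeValue
    return "".join(dom_text(child) for child in node.childNodes)


def fields_from_dom(root):
    """Build {field_name: field_body_text} from a DOM tree of Docutils XML."""
    d = {}
    for field in root.getElementsByTagName("field"):
        name = field.getElementsByTagName("field_name")[0]
        body = field.getElementsByTagName("field_body")[0]
        # Join the body's child elements (paragraphs) with spaces, as the
        # answer above does; whitespace-only text nodes are skipped.
        d[dom_text(name)] = " ".join(
            dom_text(p) for p in body.childNodes
            if p.nodeType == p.ELEMENT_NODE
        )
    return d


# Hand-written stand-in for the output of publish_doctree(source).asdom().
xml_text = (
    "<document><field_list><field>"
    "<field_name>foo</field_name>"
    "<field_body><paragraph>bar <emphasis>baz</emphasis></paragraph></field_body>"
    "</field></field_list></document>"
)
print(fields_from_dom(minidom.parseString(xml_text)))  # {'foo': 'bar baz'}
```

With docutils installed, fields_from_dom(publish_doctree(source).asdom()) should work the same way, since that DOM node also supports getElementsByTagName.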

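The xml.dom module (http://docs.python.org/library/xml.dom.html) is not the easiest to work with (why do I need to use .firstChild.nodeValue rather than just .nodeValue? Because an element's own nodeValue is None; its text lives in a child Text node), so you may want to use the xml.etree.ElementTree module instead, which I find a lot easier to work with. If you use lxml, you can also use XPath notation to find all of the field, field_name and field_body elements.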
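
The standard library's ElementTree also supports a limited XPath subset in find() and findall(), which covers this case without lxml. A minimal sketch, assuming the input is the XML string produced by publish_doctree(source).asdom().toxml(); the function name fields_from_xml is hypothetical, and a hand-written stand-in string is used so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET


def fields_from_xml(xml_text):
    """Return {field_name: field_body_text} for every <field> in Docutils XML."""
    root = ET.fromstring(xml_text)
    d = {}
    # './/field' is ElementTree's XPath-subset syntax for "any descendant <field>".
    for field in root.findall(".//field"):
        name = field.find("field_name")
        body = field.find("field_body")
        # itertext() walks all nested text, so inline markup is kept as plain
        # text; the body's child elements (paragraphs) are joined with spaces.
        d["".join(name.itertext())] = " ".join(
            "".join(p.itertext()) for p in body
        )
    return d


xml_text = (
    "<document><field_list><field>"
    "<field_name>foo</field_name>"
    "<field_body><paragraph>bar</paragraph></field_body>"
    "</field></field_list></document>"
)
print(fields_from_xml(xml_text))  # {'foo': 'bar'}
```

With docutils installed, the same function can be fed publish_doctree(source).asdom().toxml() directly.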

Other answers

I have an alternative solution that I find less of a burden, though perhaps more brittle. After reviewing the implementation of the node classes at https://sourceforge.net/p/docutils/code/HEAD/tree/trunk/docutils/docutils/nodes.py, you will see that nodes support a walk method that can be used to pull out the wanted data without having to create two different XML representations of your data. Here is what I am using in my program code:

https://github.com/h4ck3rm1k3/gcc-introspector/blob/master/peewee_adptor.py#L33

import pprint

from docutils.core import publish_doctree
import docutils.nodes


def walk_docstring(prop):
    doc = prop.__doc__
    doctree = publish_doctree(doc)

    class Walker:
        """Minimal visitor: Node.walk() calls dispatch_visit() on every node."""

        def __init__(self, doc):
            # walk() uses visitor.document.reporter for its debug messages.
            self.document = doc
            self.fields = {}

        def dispatch_visit(self, x):
            if isinstance(x, docutils.nodes.field):
                # children[0] is the field_name node, children[1] the field_body.
                field_name = x.children[0].rawsource
                field_value = x.children[1].rawsource
                self.fields[field_name] = field_value

    w = Walker(doctree)
    doctree.walk(w)
    # The collected fields I wanted
    pprint.pprint(w.fields)
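
The same walk-based idea can also be expressed with the visitor base class that docutils ships, docutils.nodes.SparseNodeVisitor, which provides no-op visit methods for every node type so you only override the ones you care about. This may be a little less brittle than a hand-rolled dispatch_visit, since dispatch goes through docutils' own visitor machinery. A minimal sketch; the names FieldCollector and collect_fields are hypothetical, and it uses astext() rather than rawsource so values come from the parsed text.

```python
import docutils.nodes
from docutils.core import publish_doctree


class FieldCollector(docutils.nodes.SparseNodeVisitor):
    """Collect {field_name: field_body_text} from every field in a doctree."""

    def __init__(self, document):
        super().__init__(document)
        self.fields = {}

    def visit_field(self, node):
        # A field node holds a field_name child followed by a field_body child.
        name_node, body_node = node.children[0], node.children[1]
        self.fields[name_node.astext()] = body_node.astext()


def collect_fields(source):
    """Parse a reST string and return its fields as a dict."""
    doctree = publish_doctree(source)
    collector = FieldCollector(doctree)
    # walk() calls dispatch_visit(), which routes field nodes to visit_field().
    doctree.walk(collector)
    return collector.fields


source = """Some text ...

:foo: bar

Some text ...
"""
print(collect_fields(source))  # {'foo': 'bar'}
```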

An ElementTree (http://docs.python.org/library/xml.etree.elementtree.html) implementation:

from docutils.core import publish_doctree
from xml.etree.ElementTree import fromstring

source = """Some text ...

:foo: bar

Some text ...
"""


def gen_fields(source):
    dom = publish_doctree(source).asdom()
    tree = fromstring(dom.toxml())

    for field in tree.iter(tag='field'):
        name = next(field.iter(tag='field_name'))
        body = next(field.iter(tag='field_body'))
        yield {name.text: ''.join(body.itertext())}

Usage:

>>> next(gen_fields(source))
{'foo': 'bar'}
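
gen_fields yields one single-key dict per field, while the question asks for one dict holding all fields. A minimal variant, assuming the same publish_doctree / ElementTree approach, yields (name, value) pairs instead so that dict() can collect them directly; the name gen_field_pairs is hypothetical.

```python
from xml.etree.ElementTree import fromstring

from docutils.core import publish_doctree


def gen_field_pairs(source):
    """Yield (field_name, field_body_text) for every field in a reST string."""
    tree = fromstring(publish_doctree(source).asdom().toxml())
    for field in tree.iter(tag="field"):
        name = next(field.iter(tag="field_name"))
        body = next(field.iter(tag="field_body"))
        # itertext() also picks up text inside inline markup such as *emphasis*.
        yield "".join(name.itertext()), "".join(body.itertext())


source = """Some text ...

:foo: bar

Some text ...
"""
print(dict(gen_field_pairs(source)))  # {'foo': 'bar'}
```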