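Short Answer: You can get the text as SVG from ABCpdf via Doc.GetText("SVG"), parse the XML for the TEXT and TSPAN elements, and determine whether there is layout that should be treated as an actual space. The behavior you see from PDFBox is probably it trying to make that assumption. Also, even Adobe Acrobat can return spaced text via the clipboard, as PDFBox does.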
Long Answer: This may cause more problems, as this may not be the original intent of the text in the PDF.
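ABCpdf is doing the right thing, because the PDF format only describes where things should be placed in the output. It is possible to construct a PDF file that ABCpdf interprets in both styles, even though the original sentences look nearly identical.
To demonstrate this, here is a screenshot of an Adobe InDesign document showing a text layout that reproduces both cases with your sample sentence.
Note that the first line is not built with actual spaces; instead, each word was placed by hand in its own text region so that the result looks roughly like a properly spaced sentence. The second line is one sentence in a single text region, with actual space characters between the words.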
When this is exported to PDF and then read by ABCpdf, Doc.GetText("TEXT")
will return:
ThisSentenceDoesn’tHaveAnySpacesBetweenWords.
This Sentence Doesn’t Have Any Spaces Between Words.
Thus, if you wish to detect layout spaces, you must use the SVG output and step through the text tokens manually. Doc.GetText("SVG")
returns text and other drawing entities as ABCpdf sees them on the page, and you can decide how you want to handle layout-based spacing.
You'll receive output similar to this:
<?xml version="1.0" standalone="no"?>
<svg width="612" height="792" x="0" y="0" version="1.1" baseProfile="full" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<text xml:space="preserve" x="36" y="46.1924" font-size="14" font-family="ArialMT" textLength="26.446" transform="translate(36, 46.1924) translate(-36, -46.1924)">This</text>
<text xml:space="preserve" x="66.002" y="46.1924" font-size="14" font-family="ArialMT" textLength="59.15" transform="translate(66.002, 46.1924) translate(-66.002, -46.1924)">Sentence</text>
<text xml:space="preserve" x="129.604" y="46.1924" font-size="14" font-family="ArialMT" textLength="47.46" transform="translate(129.604, 46.1924) translate(-129.604, -46.1924)">Doesn’t</text>
<text xml:space="preserve" x="181.208" y="46.1924" font-size="14" font-family="ArialMT" textLength="32.676" transform="translate(181.208, 46.1924) translate(-181.208, -46.1924)">Have</text>
<text xml:space="preserve" x="219.61" y="46.1924" font-size="14" font-family="ArialMT" textLength="24.122" transform="translate(219.61, 46.1924) translate(-219.61, -46.1924)">Any</text>
<text xml:space="preserve" x="249.612" y="46.1924" font-size="14" font-family="ArialMT" textLength="46.69" transform="translate(249.612, 46.1924) translate(-249.612, -46.1924)">Spaces</text>
<text xml:space="preserve" x="301.216" y="46.1924" font-size="14" font-family="ArialMT" textLength="54.474" transform="translate(301.216, 46.1924) translate(-301.216, -46.1924)">Between</text>
<text xml:space="preserve" x="360.016" y="46.1924" font-size="14" font-family="ArialMT" transform="translate(360.016, 46.1924) translate(-360.016, -46.1924)"><tspan textLength="13.216">W</tspan><tspan dx="-0.252" textLength="31.122">ords.</tspan></text>
<text xml:space="preserve" x="36.014" y="141.9944" font-size="14" font-family="ArialMT" transform="translate(36.014, 141.9944) translate(-36.014, -141.9944)">
<tspan textLength="181.3">This Sentence Doesn’t Have </tspan><tspan dx="-0.756" textLength="150.178">Any Spaces Between W</tspan><tspan dx="-0.252" textLength="31.122">ords.</tspan></text>
</svg>
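Also note that the underlying structure reveals the original intent that is causing your problem. (xml:space and the other attributes removed, and whitespace changed, for the sake of the example.)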
<?xml version="1.0" standalone="no"?>
<svg>
<text>This</text>
<text>Sentence</text>
<text>Doesn’t</text>
<text>Have</text>
<text>Any</text>
<text>Spaces</text>
<text>Between</text>
<text><tspan>W</tspan><tspan>ords.</tspan></text>
<text>
<tspan>This Sentence Doesn’t Have </tspan>
<tspan>Any Spaces Between W</tspan>
<tspan>ords.</tspan>
</text>
</svg>
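One way to "step through the tokens" is to compare where each text run ends (x plus textLength) with where the next run on the same baseline starts, and treat a large enough gap as a space. The following is only a rough sketch in Python using the standard library, not ABCpdf code: it assumes the SVG string has already been obtained from Doc.GetText("SVG"), and the function names and the GAP_RATIO threshold (15% of the font size) are my own assumptions that you would need to tune for your documents.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG_NS = "{http://www.w3.org/2000/svg}"
GAP_RATIO = 0.15  # assumption: a gap wider than 15% of the font size is a space

# Trimmed copy of the output above (font-family, transform, xml:space removed).
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
<text x="36" y="46.1924" font-size="14" textLength="26.446">This</text>
<text x="66.002" y="46.1924" font-size="14" textLength="59.15">Sentence</text>
<text x="129.604" y="46.1924" font-size="14" textLength="47.46">Doesn’t</text>
<text x="181.208" y="46.1924" font-size="14" textLength="32.676">Have</text>
<text x="219.61" y="46.1924" font-size="14" textLength="24.122">Any</text>
<text x="249.612" y="46.1924" font-size="14" textLength="46.69">Spaces</text>
<text x="301.216" y="46.1924" font-size="14" textLength="54.474">Between</text>
<text x="360.016" y="46.1924" font-size="14"><tspan textLength="13.216">W</tspan><tspan dx="-0.252" textLength="31.122">ords.</tspan></text>
<text x="36.014" y="141.9944" font-size="14">
<tspan textLength="181.3">This Sentence Doesn’t Have </tspan><tspan dx="-0.756" textLength="150.178">Any Spaces Between W</tspan><tspan dx="-0.252" textLength="31.122">ords.</tspan></text>
</svg>"""


def run_width(text_el):
    """Width of a <text> run: its own textLength, else the sum over its <tspan>s."""
    if text_el.get("textLength") is not None:
        return float(text_el.get("textLength"))
    return sum(
        float(t.get("textLength", 0)) + float(t.get("dx", 0))
        for t in text_el.iter(SVG_NS + "tspan")
    )


def lines_with_layout_spaces(svg_text):
    """Rebuild each line, adding a space where two runs are visibly apart."""
    root = ET.fromstring(svg_text)
    by_baseline = defaultdict(list)
    for el in root.iter(SVG_NS + "text"):
        # Join the text of the element and its tspans; drop formatting newlines.
        content = "".join(el.itertext()).replace("\n", "")
        run = (float(el.get("x", 0)), run_width(el),
               float(el.get("font-size", 0)), content)
        # Runs whose y values round to the same number share a baseline.
        by_baseline[round(float(el.get("y", 0)), 1)].append(run)

    lines = []
    for y in sorted(by_baseline):  # SVG y grows downward: top line first
        pieces, prev_end = [], None
        for x, width, size, content in sorted(by_baseline[y]):
            if prev_end is not None and x - prev_end > GAP_RATIO * size:
                pieces.append(" ")  # layout gap treated as a space
            pieces.append(content)
            prev_end = x + width
        lines.append("".join(pieces))
    return lines


for line in lines_with_layout_spaces(SAMPLE):
    print(line)
```

With the sample above, the gaps on the first line range from about 3.6 to 5.9 units at font size 14, so both lines come out as the spaced sentence. Keep in mind that this is exactly the kind of guess PDFBox appears to make: tight kerning, justified text, or rotated runs can make the threshold produce wrong results.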