Question

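I'm building a desktop application right now that presents its human-readable output as XHTML displayed in a WebBrowser control. Eventually, this output is going to have to be converted from an XHTML file to a document image in an imaging system. Unlike XHTML documents, the document images need to be broken into physical pages; additionally (and this is the part that is really bothering me) there need to be headers and footers on these pages.

Much as I would like to, I can't simply make the WebBrowser print to a file - the header/footer options it supports aren't anywhere near sophisticated enough. So I'm looking around, trying to figure out the right technology for producing these images.

It seems likely to me (though it's not mandatory) that what I'll end up doing is producing PDF versions of the HTML documents (so that I can add headers and footers) and then rendering the PDFs as TIFFs, which is the format the imaging system ultimately wants. So what I'm considering is: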

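Use some kind of XHTML-to-PDF conversion software. The problem is that without a lot of evaluation and testing, I can't tell whether the products I'm looking at even have the features I need, namely decorating an existing XHTML document with headers and footers and paginating it.
Use XSL-FO to generate the PDFs. Being a ninja-level XSLT geek helps here (it's how I'm producing the XHTML in the first place), but this still seems like a clumsy and slow solution with a lot of moving parts. It also means plugging a big clunky Java program into my nice clean .NET system, though if that's the right answer I'm certainly grown-up enough to accept it.
Use some other technology that I haven't even thought of yet, like LaTeX. Maybe there's some magical page-imaging tool that converts XHTML directly to TIFFs with headers and footers. That would be ideal.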

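My primary concerns are:

I'm developing a commercial product, so the technology I use has to be affordable and maintainable. It doesn't need to be free.
I don't want to disappear down a rabbit hole for three months, banging on these things just to get them to work. Intuitively, this feels like the kind of problem space where I could waste a lot of time evaluating and rejecting tools.
Whatever solution I adopt needs to be relatively immune to changes in the XHTML's format. The whole reason I'm using XSLT and producing XHTML in the first place is that the documents I generate are assembled dynamically under constantly changing business rules.

I've spent a lot of time looking for alternatives without finding an obvious answer. But maybe one of you has already solved this problem; if so, I'd like to stand on your shoulders.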

Answer 1

Edit (2010-11-28 12:30 PM PST) Please +1 this answer if you download my code. I notice my CodePlex sample has been downloaded hundreds of times. The code isn't spectacular, but it works as a great starting point, with lots of links to source help included. Thanks! +tom
Edit (2009-03-29 9:00 AM PST) Posted sample conversion.
Edit (2009-03-23 12:30 PM PST, published to CodePlex) I developed a solution for this and posted it to CodePlex. The published version 2.0 is written using the WPF MVVM pattern. TIFF files (one per page) are output to c:\Temp\XhtmlToTiff. XAML and XPS formats are created as well. A compiled, installable version is available at CricketSoft.com.

Have you tried the "Microsoft XPS Document Writer"? This is a software-only printer that generates paged output from a variety of sources, including web pages.

There is an SDK for working with XPS documents and Open XML documents. Here is a how-to article by Beth Massi: "Accessing Open XML Document Parts with the Open XML SDK".

+tom

Answer 2

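My two cents: if you are an XSLT ninja, I'd suggest you stick with it. You can avoid the nasty Java program by looking at nFop, which is a C# port of the Apache FOP project. The nice thing is that you can just take the assembly and use it directly, passing it your XML and XSLT to get the PDF output you need.

http://sourceforge.net/projects/nfop/

Hope that helps.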

Answer 3

If your target is TIFF, this could be a free and low-risk approach:

Use a component to create an image for a given URL. I'm not sure which tool we used for it, but GIYF: I just stumbled upon SmallSharpTool's WebPreview, which seems to do the job.
Make sure it can create an image of the entire page, i.e. the entire scrollable area.
Use ImageMagick to do all the image manipulation, such as cutting it into multiple pages, adding your own headers, footers and page numbering and conversion to tiff.

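I have personally used the techniques above, separately, in C# projects (console apps and websites) with success, so I can pretty much guarantee this will work.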
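The "cutting it into multiple pages" step mostly comes down to arithmetic: each output page has a fixed height, part of which is reserved for the header and footer, and the rest is filled with a slice of the tall capture. Here is a rough Python sketch of that planning step only. The names `PageSlice` and `plan_pages` are made up for illustration, all sizes are in pixels, and the actual cropping and compositing would still be done by ImageMagick or a similar tool.

```python
from dataclasses import dataclass


@dataclass
class PageSlice:
    number: int      # 1-based page number, e.g. for a "Page N of M" footer
    src_top: int     # first pixel row taken from the tall capture
    src_bottom: int  # one past the last pixel row taken


def plan_pages(capture_height, page_height, header_height, footer_height):
    """Split a tall capture into body-sized slices, one per output page."""
    body_height = page_height - header_height - footer_height
    if body_height <= 0:
        raise ValueError("header and footer leave no room for the body")

    pages = []
    top = 0
    while top < capture_height:
        # The last slice may be shorter than a full body area.
        bottom = min(top + body_height, capture_height)
        pages.append(PageSlice(len(pages) + 1, top, bottom))
        top = bottom
    return pages


if __name__ == "__main__":
    for page in plan_pages(2500, 1100, 100, 100):
        print(page)
```

Each slice would then be cropped from the capture, pasted below the header area of a blank page, and written out as one TIFF page. Note that a naive split like this can cut through a line of text; avoiding that needs some knowledge of the content layout.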

Answer 4

Use some other technology that I haven't even thought of yet, like LaTeX.

TexML is LaTeX semantics with XML syntax. To use it, you would create an XSLT that decorates your XHTML with TexML commands (see the examples).

Answer 5

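Have you thought about using PostScript?

P.S. What kind of headers/footers do you need: custom ones of your own, to be placed between pages? If so, PostScript or PDF is probably the best choice. However, writing an XHTML+CSS to PDF converter would be very hard. Basically, you would need a library that can parse XHTML and CSS (plus any embedded objects, such as images, Flash, etc.).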

Answer 6

PrinceXML is an XHTML/CSS to PDF converter. It seems to have the features you need:

Page headers/footers, page numbering and duplex printing.

I realize you'll probably want more extensive answers than this one (I'm sorry, but I haven't evaluated the product), but nevertheless, I hope it helps!

Answer 7

It all depends on how important quality is for the generated documents. It also matters what other operations you need to do with the document.

I'm building a desktop application right now that presents its human-readable output as XHTML displayed in a WebBrowser control. Eventually, this output is going to have to be converted from an XHTML file to a document image in an imaging system.

Looks like your application is a soft-form of sorts. You generate filled-in forms and save them.

[...]there need to be headers and footers on these pages.

This is the easy part. You can use templates and merge the data with the static header/footer template. You sound as if you are doing VDP. Hm. Let's move on.

I can't simply make the WebBrowser print to a file - the header/footer options it supports aren't anywhere near sophisticated enough.

Why so? All you need is a capable driver.

It seems likely to me (though it's not mandatory) that what I'll end up doing is producing PDF versions of the HTML documents

Again, it is not clear why you would want PDF right away. PDF is a document interchange format. Not a PDL per se. PostScript is a much better choice. Yes, I know there are things like XPS, PCL and what not. However, the amount of rendering control and quality you get with PS is far too much to risk a cheaper solution. I say cheaper, because, you also need to keep in mind the sort of printing you can avail of. PostScript printers (not the ones with the clone RIPs) are costlier in general.

Now, back to your PDF thing. Yes, of course you can generate PDF. It has certain advantages like:

Better support for transparency (and in general quality)
Archival
Interchange
Share it across for review
Preview/Preflight/Correct
Security
Stream encryption (for both security and the amount of data you transfer to the printer)
Use templates

But remember: do you have any printers that do native PDF ripping? Because otherwise you are doing a lossy PDF to PS/PCL conversion, and you've just lost the game. Which brings me back to PostScript ;)

Answer 8

You can use PISA for Python. It uses the ReportLab toolkit to generate a PDF from HTML (parsed with html5lib).

Answer 9

You could also try using PDFCreator and simply printing the document to PDF. PDFCreator acts like any normal printer and uses Ghostscript to convert printer output to PDF, TIFF, JPEG, or whatever you want. I think you can change header and footer items through IE's COM interface and print directly from IE. PDFCreator has examples for different languages in the COM folder of the install directory. I have used it and can vouch for it. Windows only, though.
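If you end up with a PDF and want to drive Ghostscript yourself for the TIFF step, the command line is fairly short. Below is a minimal Python sketch that only builds the argument list. The function name, the default output pattern, and the choice of the `tiffg4` (black-and-white Group 4) device are assumptions for illustration; the executable is `gs` on most Unix systems and typically `gswin64c` or `gswin32c` on Windows.

```python
def ghostscript_tiff_command(pdf_path,
                             output_pattern="page-%03d.tif",
                             dpi=300,
                             device="tiffg4",
                             gs_executable="gs"):
    """Build a Ghostscript command that renders each PDF page to its own TIFF."""
    return [
        gs_executable,
        "-dNOPAUSE",                        # do not pause between pages
        "-dBATCH",                          # exit when the input is processed
        "-dSAFER",                          # restrict file access from the input
        f"-sDEVICE={device}",               # output device, e.g. tiffg4 or tiff24nc
        f"-r{dpi}",                         # resolution in dots per inch
        f"-sOutputFile={output_pattern}",   # %03d is replaced by the page number
        pdf_path,
    ]


if __name__ == "__main__":
    # Printing only; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(ghostscript_tiff_command("report.pdf")))
```

Keeping the command as a list, not a single string, avoids quoting problems when paths contain spaces.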

Answer 10

Do you really need to use XHTML/Web browser?

I have been in this exact dilemma, trying to generate good-looking HTML reports, and the solution I found is... to drop HTML and use a "real" report generator. There are a lot of them out there; they all support every pagination and header/footer option you can think of, and they can usually print to PDF and sometimes directly to images.

HTML is just not the right technology for reports.

Answer 11

ExpertPDF HtmlToPdf Converter (www.html-to-pdf.net) should be able to do exactly what you need. It's really simple to use: just reference the assembly in your project and start using it. I've used this product with great success in a couple of work projects.

Answer 12

You mentioned your current desktop app exports results in XHTML. Since XHTML is well-formed XML, you should get away with using XSL-FO to export it to PDF.

XML -> XSL-FO = PDF

Here's a beginner's guide: http://www.devx.com/xml/Article/16430

My company has used this technique in a Java + Cocoon web application for the Dutch government.
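To give an idea of what the XSL-FO side looks like, here is a small Python sketch that builds a one-sequence FO document with a running header and a page-numbered footer. In a real pipeline the XSLT would emit this markup; the function name, page size, and margins here are illustrative assumptions, and the result still has to be fed to an FO processor such as FOP to get a PDF.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def build_fo_document(header_text, footer_prefix, body_paragraphs):
    root = ET.Element(fo("root"))

    # Page geometry: a body region plus header (before) and footer (after) regions.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page",
        "page-height": "11in", "page-width": "8.5in",
        "margin-top": "0.5in", "margin-bottom": "0.5in",
        "margin-left": "0.75in", "margin-right": "0.75in",
    })
    ET.SubElement(page, fo("region-body"),
                  {"margin-top": "0.75in", "margin-bottom": "0.75in"})
    ET.SubElement(page, fo("region-before"), {"extent": "0.5in"})
    ET.SubElement(page, fo("region-after"), {"extent": "0.5in"})

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})

    # Static content is repeated on every page the sequence generates.
    header = ET.SubElement(seq, fo("static-content"),
                           {"flow-name": "xsl-region-before"})
    ET.SubElement(header, fo("block")).text = header_text

    footer = ET.SubElement(seq, fo("static-content"),
                           {"flow-name": "xsl-region-after"})
    footer_block = ET.SubElement(footer, fo("block"), {"text-align": "center"})
    footer_block.text = footer_prefix
    ET.SubElement(footer_block, fo("page-number"))

    # The flow holds the document content; the processor paginates it.
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    for paragraph in body_paragraphs:
        ET.SubElement(flow, fo("block")).text = paragraph

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_fo_document("Quarterly report", "Page ", ["First paragraph."]))
```

The point of the sketch is the split between `fo:static-content` (headers and footers) and `fo:flow` (the body): pagination is the processor's job, so the XHTML-derived content never needs to know where page breaks fall.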

Answer 13

http://iecapt.sourceforge.net/

Quoting from the website above:

IECapt is a small command-line utility to capture Internet Explorer's rendering of a web page into a BMP, JPEG or PNG image file. The C++ version also has experimental support for Enhanced Metafile vector graphic output. IECapt is available in a C++ and a C# version.
