PDF compression: How does Adobe do it?

This is a bit more of a fun question than a serious one, but how does the Adobe PDF format make documents so... portable?

I just created a small Word document, 235 KB in size, containing multiple color photos and a few textual phrases. A PDF created using CutePDF (which I understand isn't the most efficient method of PDF creation) is only 176 KB. That's a 25% reduction in size. When those files are placed into a compressed folder, the PDF shrinks by a further 3%, whereas the .docx shrinks by only 2%. I'm sure that larger files would have even greater differences in size.

My question is, how does Adobe manage to make their files so much smaller? I understand that they are drawn from raster graphics, but my three bitmap files really can't benefit from raster that much, can they?

Best answer

If you have Acrobat 9, there is a nice built-in tool that lets you see how the PDF was put together (and which compression was used). There is a blog post explaining how to use it at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects

Answers

There are a few ways it can be compressing this:

  1. PDF files can use LZW and Flate (zip-style Deflate) compression for their contents.

  2. If the image is scaled in the document, or has a higher DPI on disk than you allow for in CutePDF (for example, if CutePDF is set to 300 DPI and the image is 600 DPI), it can be downsampled in the PDF.

  3. Microsoft stores TONS of info in the docx format, as XML. WAY more than is really needed to just export the info (for an example, try copying and pasting your text into a textbox cell, and look at the HTML that comes out: I had a limit on a textbox size for a CMS, and a seven-word sentence ballooned to 950 characters). This is so the document can be edited later, and it includes a lot of esoteric info to make sure everything displays right in every possible permutation. The PDF doesn't need that info, so it can record just the font and size and strip out everything unnecessary, saving a ton of space.
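As a rough illustration of points 1 and 3, the sketch below runs Python's `zlib` (the same Deflate algorithm behind Flate and zip) over two made-up inputs: a repetitive XML-like markup string and a block of random bytes standing in for already-compressed photo data. Both inputs and the `flate_ratio` helper are assumptions for demonstration, not actual .docx or PDF contents.

```python
import os
import zlib

def flate_ratio(data: bytes) -> float:
    """Return compressed size divided by original size, using Deflate via zlib."""
    return len(zlib.compress(data, 9)) / len(data)

# Hypothetical repetitive markup, standing in for verbose formatting data.
markup = b'<w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>word</w:t></w:r>' * 200

# Random bytes, standing in for image data that is already compressed (e.g. JPEG).
image_like = os.urandom(len(markup))

markup_ratio = flate_ratio(markup)
image_ratio = flate_ratio(image_like)
print(f"markup: {markup_ratio:.3f}  image-like: {image_ratio:.3f}")
```

The markup shrinks to a small fraction of its size while the image-like data barely changes, which would be consistent with a photo-heavy file gaining only a few percent when placed in a compressed folder.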

When you use such small files, any overhead in the document format has a disproportionate effect, which is why you are seeing such large percentage differences.

I took a 2683 KB JPEG and inserted it into a new Word 2003 document. The resulting .doc file was 2725 KB (or 2697 KB as .docx). Turning this into a PDF gives me a 2701 KB PDF. So I am seeing a difference of about 25 KB, but that is only about 1% because of the size of the image data. The absolute difference is about half what you got, but maybe the version of Word you have is more verbose when making .docx files?

For the PDF, Acrobat shows space usage as 2691 KB for images, 8.27 KB of overhead and 1 KB for fonts. PDF syntax is quite sparse, which limits overhead, and much of it consists of repeating strings, so it compresses easily.
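The arithmetic behind the "disproportionate effect" can be sketched as below, using the file sizes quoted above. The `overhead_percent` helper and the 150 KB small-document case are hypothetical, added only to show how the same fixed overhead looks against a much smaller payload.

```python
def overhead_percent(total_kb: float, payload_kb: float) -> float:
    """Share of the file, in percent, that is not payload (image) data."""
    return (total_kb - payload_kb) / total_kb * 100

JPEG_KB = 2683  # size of the inserted image, from the answer above

# Container sizes reported in the answer above.
sizes_kb = {"doc": 2725, "docx": 2697, "pdf": 2701}
overheads = {name: overhead_percent(kb, JPEG_KB) for name, kb in sizes_kb.items()}

# Hypothetical small document: the same 42 KB of .doc overhead
# wrapped around only 150 KB of image data.
small_case = overhead_percent(150 + 42, 150)

for name, pct in overheads.items():
    print(f"{name}: {pct:.2f}% overhead")
print(f"hypothetical small doc: {small_case:.2f}% overhead")
```

With a large image every format stays under 2% overhead, whereas the same absolute overhead on the hypothetical small file exceeds 20%.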

If you want to see what the PDF contains in a tree-like view you can download the demo version of CosEdit.
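Without either tool, a very rough alternative is to scan the raw bytes of a PDF for `/Filter` entries, which name the compression applied to each stream (for example `FlateDecode`, `LZWDecode`, or `DCTDecode` for JPEG data). This is a minimal sketch, not a PDF parser: the `count_filters` helper and the sample bytes are made up for illustration, and it will miss dictionaries stored inside compressed object streams.

```python
import re
from collections import Counter

# Matches "/Filter /Name" as well as "/Filter [/Name1 /Name2]" in raw PDF bytes.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")

def count_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each stream filter name appears in the raw bytes."""
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts

# Hand-written fragment imitating PDF stream dictionaries (not a valid PDF).
sample = (
    b"1 0 obj << /Length 10 /Filter /FlateDecode >> stream ... endstream endobj\n"
    b"2 0 obj << /Subtype /Image /Filter /DCTDecode >> stream ... endstream endobj\n"
    b"3 0 obj << /Filter [/ASCII85Decode /LZWDecode] >> stream ... endstream endobj\n"
)
filters = count_filters(sample)
print(dict(filters))
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to `count_filters`, where `path` is whatever PDF you want to inspect.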




