Any way to create a PDF so the text can't be copied/extracted back out?
I'm trying to help create a neighborhood directory, and I want to discourage anyone from harvesting contact info (especially email addresses) from it.

Is there any easy way to prevent someone from copying and pasting that text from the PDF?

Update: The goal here is to make the PDF no easier to harvest email addresses from than the current paper directory, and to make the PDF directory as useful as the paper directory. The online PDF directory will have advantages such as always being up to date and saving some printing costs (or passing those costs on to folks who want to print the document).

Best answer

The other answers are a good start. However, I found out exactly how to lock the PDF to prevent copying.

You can use PrimoPDF's free PDF driver and change the Security settings per: http://www.primopdf.com/help/tip_secure_pdf.aspx

To add password security to your PDF, read on to learn how you can do it free with PrimoPDF.

  1. Download and install the free PDF driver: http://www.primopdf.com/download.aspx
  2. Open the file to convert to PDF
  3. Open the Print dialog (or press Ctrl+P)
  4. In the printer list, choose PrimoPDF
  5. Click Print
  6. On the PrimoPDF dialog, click the Change button next to the Security label to open the security dialog.
  7. Enter your Open password twice.
  8. Optionally, enter a Permissions password and choose the functionality you want to restrict.
  9. Click OK.
  10. Click Create PDF.

Final tip: If you want to apply security to all the PDF files you create, you can do it easily by configuring PrimoPDF accordingly. At the bottom of the dialog (see above), just make sure the "Always use these settings" option is turned on.

Answers

If the data is to be readable, which I'd assume is your goal, there is no way you can stop a dedicated person from taking it and using it. Converting to an image will make it difficult, but anyone with good OCR or a team of cheap foreign labor can get anything they want out of it. If the data is super sensitive and you are worried about it, you should really reconsider the value of publishing it.

Using an image instead of text makes it a lot more difficult to automatically grab data from a PDF.

Part of one of my previous jobs included reformatting data in PDFs to a (specific) more structured document format, and when we got PDFs whose text was images (let alone blurry or hard-to-read images) the OCR output would be riddled with wrong letters, and we'd have to go in by hand and fix most everything.

PDF allows for locking the document (the source text is encrypted, but still readable), so the document's properties won't allow the reader to print or copy from it.
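To illustrate what this kind of lock amounts to, here is a minimal sketch of the permissions integer (the /P entry) that the PDF standard security handler stores in an encrypted file. The bit positions follow the PDF specification for handler revision 3 and later; the constant and function names are my own and purely illustrative. A reader application that honors the flags will refuse to copy, but the restriction is only a flag in the file.

```python
# Sketch of the /P permissions value in a PDF encryption dictionary
# (standard security handler, revision 3+). Bit numbers in the PDF
# specification are 1-based, so "bit 3" is 1 << 2.
PRINT = 1 << 2            # bit 3: print the document
MODIFY = 1 << 3           # bit 4: modify contents
COPY = 1 << 4             # bit 5: copy or extract text and graphics
ANNOTATE = 1 << 5         # bit 6: add or modify annotations
FILL_FORMS = 1 << 8       # bit 9: fill in form fields
ACCESSIBILITY = 1 << 9    # bit 10: extract for accessibility
ASSEMBLE = 1 << 10        # bit 11: insert, rotate, delete pages
PRINT_HIGH_RES = 1 << 11  # bit 12: high-quality printing

# Reserved bits 7-8 and 13-32 are always set; bits 1-2 are always clear.
RESERVED = 0xFFFFF0C0


def permissions_value(*allowed):
    """Return the signed 32-bit /P value granting only the given permissions."""
    bits = RESERVED
    for flag in allowed:
        bits |= flag
    # /P is stored as a signed 32-bit integer.
    return bits - (1 << 32) if bits & (1 << 31) else bits


def can_copy(p):
    """True if the copy/extract bit is set in a /P value."""
    return bool(p & COPY)


locked = permissions_value()                      # nothing allowed: -3904
print_only = permissions_value(PRINT, PRINT_HIGH_RES)  # printing only: -1852
print(locked, print_only, can_copy(locked), can_copy(print_only))
```

Because the restriction is a number that the viewer chooses to respect, a tool that ignores it can still read the text.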

Anyway, I would discourage this, as such PDFs are a pain in the ass to use. Personally, I would recommend looking for methods other than actively making your document's readers angry.

PS: Harvesting emails from PDF is virtually unheard of.

Other possible solutions could be the following:

  1. Convert the text to vector outlines (some open source tools can do this), so the PDF file stays small compared to one with images inside it.
  2. Hack the PDF to damage the internal map from font glyph indexes to Unicode symbols, so that copied text comes out as rubbish (the PDF reader app will not be able to find the proper mapping from glyphs to their character values).
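The second idea can be pictured with a toy model. This sketch does not touch a real PDF; it only imitates a font whose glyph-to-Unicode table has been replaced with a wrong one, assuming a simple fixed rotation as the "damage". The names `damaged_unicode_map` and `copied_text` are hypothetical.

```python
import string

# Characters our toy "font" can display.
ALPHABET = string.ascii_letters + string.digits + "@._-"


def damaged_unicode_map(shift=13):
    """Map each displayed glyph to a wrong character (toy stand-in for a
    deliberately broken glyph-to-Unicode table)."""
    n = len(ALPHABET)
    return {ch: ALPHABET[(i + shift) % n] for i, ch in enumerate(ALPHABET)}


def copied_text(displayed, to_unicode):
    """What copy/paste yields: each glyph is looked up in the damaged map.
    Glyphs outside the map (e.g. spaces) pass through unchanged."""
    return "".join(to_unicode.get(ch, ch) for ch in displayed)


table = damaged_unicode_map()
shown = "jane@example.com"          # what the reader sees on the page
pasted = copied_text(shown, table)  # what lands in the clipboard

# The damage is a plain substitution, so inverting the table restores the text.
inverse = {wrong: right for right, wrong in table.items()}
restored = copied_text(pasted, inverse)
print(pasted, restored)
```

The page still looks right because the glyph shapes are untouched; only the lookup used for copying is wrong. As the last lines show, a consistent substitution can be reversed once the mapping is worked out.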

Disclaimer: I work for ByteScout, the vendor of the PDF Extractor SDK tool, which can be used to restore text from PDF files damaged in ways like these. So if someone really wants to restore the text from a PDF, it can be done anyway (with more or fewer errors, though).




