Question

Closed. This question is off-topic. It is not currently accepting answers.

Closed 13 years ago.

I have a PDF document with content in Arabic, and when I try to search inside the document for a specific word, Adobe Reader returns no results.

It seems to be a format problem. How can I fix that? Thanks.

Answer 1

There are at least four different ways to get text into a PDF document (in order of likelihood):

1. Place the text with standard text operators and standard fonts
2. Place the text with standard text operators and non-standard fonts
3. Draw one or more images that represent the text
4. Place the text by manually drawing the glyphs with various PDF graphics commands

Case 1 is typically searchable. Case 2 is searchable if the font and encoding are sane; if they're not (and this is likely the case for non-Latin fonts), there is probably no reliable way to map the encoded glyphs back to Unicode (and, by the way, PDF is fairly Unicode-hostile). Case 3 is totally unsearchable without knowing more about how the PDF was generated. Case 4 is totally unsearchable.
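A rough way to tell which situation you are in is to select some text in the viewer, copy it, and look at what code points actually come out. The sketch below is only a heuristic built on that idea, not anything Adobe Reader does internally; the function names are made up for illustration. It assumes that a sane encoding yields characters from the main Arabic blocks, that some PDFs instead yield Arabic presentation forms (positional glyph variants, which a search for the plain letters may not match), and that a broken encoding yields no Arabic at all.

```python
import unicodedata

# Inclusive code point ranges. The base ranges hold the ordinary Arabic
# letters; the presentation ranges hold positional glyph variants and ligatures.
ARABIC_BASE = [(0x0600, 0x06FF), (0x0750, 0x077F)]
ARABIC_PRESENTATION = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFC)]


def _in_ranges(ch, ranges):
    """Return True if the character's code point falls in any given range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def classify_copied_text(text):
    """Classify text copied out of a PDF viewer (hypothetical helper).

    Returns "presentation-forms" if any presentation-form glyphs are present,
    "arabic" if only ordinary Arabic letters are present, else "no-arabic".
    """
    chars = [ch for ch in text if not ch.isspace()]
    if any(_in_ranges(ch, ARABIC_PRESENTATION) for ch in chars):
        return "presentation-forms"
    if any(_in_ranges(ch, ARABIC_BASE) for ch in chars):
        return "arabic"
    return "no-arabic"


def normalize_for_search(text):
    """Fold presentation forms back to base letters using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)
```

If the copied text classifies as "no-arabic", the font encoding is probably not mapping back to Unicode and searching will not work. If it classifies as "presentation-forms", normalizing both the extracted text and the search term with NFKC may make matching possible outside the viewer, although it does nothing about text stored in visual rather than logical order.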

That said, all cases can be read with an OCR engine that understands Arabic. I understand that the Iris engine does Arabic.

Answer 2

It might not actually be text, or it might be in a container that Reader doesn't pay attention to. It's especially common to expand text objects into vector shapes when you're dealing with fonts that most people aren't going to have installed on their system. It looks the same on the screen, but it's not searchable.
