There are at least four different ways to get text into a PDF document (in order of likelihood):
- Place the text with standard text operators and standard fonts
- Place the text with standard text operators with non-standard fonts
- Draw one or more images that represent the text
- Place the text by manually drawing the glyphs with various PDF graphics commands
Case 1 is typically searchable.
Case 2 is searchable if the font and encoding are sane. If they're not (and this is likely the case for non-Latin fonts), then there is probably no reliable way to map the encoded glyphs back to Unicode (and by the way, PDF is fairly Unicode-hostile).
Case 3 is totally unsearchable without knowing more about how the PDF was generated.
Case 4 is totally unsearchable.
That said, all four cases can be read with an OCR engine that understands Arabic. I understand that the Iris engine does Arabic.
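
To make the Case 2 problem concrete, here is a minimal sketch of what "mapping the encoded glyphs back to Unicode" involves. It does not use any real PDF library: `decode_codes`, the glyph codes, and the mapping table are all hypothetical, and the table only stands in for whatever code-to-Unicode information a font may or may not carry (for example a ToUnicode CMap).

```python
# Hypothetical illustration only: these names are not from any PDF library.

def decode_codes(codes, to_unicode=None):
    """Map the glyph codes shown by a text operator back to Unicode.

    Returns None when the mapping is missing or incomplete, which is
    the situation where the text cannot be searched reliably.
    """
    if to_unicode is None:
        return None
    chars = []
    for code in codes:
        if code not in to_unicode:
            return None  # one unmapped glyph is enough to break the search
        chars.append(to_unicode[code])
    return "".join(chars)


# Assumed example: a font with a custom encoding that numbers its glyphs 1..4.
codes = [1, 2, 3, 4]

# "Sane" case: the font supplies a code-to-Unicode table (Arabic letters here).
sane_map = {1: "\u0633", 2: "\u0644", 3: "\u0627", 4: "\u0645"}
print(decode_codes(codes, sane_map))

# "Not sane" case: no table, so the codes are just arbitrary numbers.
print(decode_codes(codes))
```

With the table, the codes come back as text that a search can match. Without it, the same codes carry no meaning outside the font, which is why OCR on the rendered page is the fallback.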