Extract text from Word or PDF based on format (font name and size)

I need to parse a large document (about 1000 pages, in Word or PDF format) and place some of its text into database fields.

The only thing that distinguishes the text I want to extract is its format: it is always "Helvetica-Condensed", size 12.

Can I do that? I know how to use the string functions, but what should I use to test the format?

As I said, the text is stored inside a Word document or a PDF.

If a third-party component can do this, that is no problem; please point me to it.

Thanks

Best answer

There is QuickPDF. The price is $249,00.

Answers

The other option is to code it yourself. The file specification is available online, and if you're only trying to rip the text out of the document, it should guide you most of the way.
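To give an idea of what "coding it yourself" involves for PDF, here is a minimal sketch in Python. In a PDF page content stream, the `Tf` operator selects a font resource and size, and `Tj`, `TJ`, `'` and `"` show text, so one can track the current font and keep only the strings shown under the wanted one. The function name `text_in_font` and the sample stream are hypothetical. The sketch assumes the stream is already decompressed (real files usually need `zlib.decompress` first), that you have looked up which resource name (for example `/F1`) maps to Helvetica-Condensed in the page's font dictionary, and that strings are simple literal strings; hex strings, nested parentheses, octal escapes, font encodings and text-matrix scaling are not handled.

```python
import re

# Tokens in an already-decompressed content stream: a literal string "(...)",
# a name such as "/F1", a number, or an operator keyword such as "Tf" or "Tj".
TOKEN = re.compile(
    rb"(?P<str>\((?:\\.|[^\\()])*\))"
    rb"|(?P<name>/[^\s/\[\]()<>]+)"
    rb"|(?P<num>[-+]?\d*\.?\d+)"
    rb"|(?P<op>[A-Za-z'\"*]+)",
    re.S,
)

SHOW_TEXT = {b"Tj", b"TJ", b"'", b'"'}


def text_in_font(content, font_resource, size):
    """Return the strings shown while (font_resource, size) is the current font.

    font_resource is the resource name without the slash, e.g. "F1"
    (an assumption: you must map it to the real font name yourself).
    """
    hits = []
    operands = []            # operands collected since the previous operator
    current = (None, None)   # (resource name, size) set by the last Tf
    for match in TOKEN.finditer(content):
        token = match.group()
        if match.lastgroup != "op":
            operands.append(token)
            continue
        if token == b"Tf" and len(operands) >= 2:
            name = operands[-2][1:].decode("latin-1")
            current = (name, float(operands[-1].decode("ascii")))
        elif token in SHOW_TEXT and current == (font_resource, size):
            # Keep only the string operands; TJ kerning numbers are ignored.
            parts = [
                re.sub(rb"\\([()\\])", rb"\1", op[1:-1])
                for op in operands
                if op.startswith(b"(")
            ]
            hits.append(b"".join(parts).decode("latin-1"))
        operands.clear()
    return hits


if __name__ == "__main__":
    sample = (
        b"BT\n/F1 12 Tf\n(Field value) Tj\n"
        b"/F2 10 Tf\n(Body text) Tj\n"
        b"/F1 12 Tf\n[(Second) -250 ( field)] TJ\nET\n"
    )
    print(text_in_font(sample, "F1", 12))
```

A full solution also has to walk the page tree, read each page's resources and decode each font's encoding, which is where the specification (or a third-party component) earns its keep.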

The only thing to be careful of is documents that are built entirely from images. In that scenario (no matter what you use to read the file) you will also need an OCR application. To check whether this is the case, open a sample of the type of file you want to extract text from, select some text, copy it, and try to paste it into Notepad. If nothing can be selected or pasted, the pages are images.
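For the Word side of the question, the same idea is simpler if the files are (or can be saved as) .docx, which is an assumption here; older binary .doc files need a different approach. A .docx is a zip archive whose main text lives in `word/document.xml`, where each run (`w:r`) can carry its font in `w:rFonts` and its size in `w:sz`, measured in half-points (so 12 pt is `24`). The sketch below uses only the Python standard library; the function names are hypothetical, and it only sees formatting set directly on a run, not formatting inherited from a paragraph or character style.

```python
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_document_xml(path):
    """Read the main document part out of a .docx archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.read("word/document.xml")


def runs_in_font(document_xml, font, points):
    """Return the text of runs whose direct formatting matches font and size."""
    root = ET.fromstring(document_xml)
    half_points = str(int(points * 2))  # w:sz stores the size in half-points
    hits = []
    for run in root.iter(f"{{{W}}}r"):
        fonts = run.find("w:rPr/w:rFonts", NS)
        size = run.find("w:rPr/w:sz", NS)
        if fonts is None or size is None:
            continue  # formatting is inherited from a style; not handled here
        if (fonts.get(f"{{{W}}}ascii") == font
                and size.get(f"{{{W}}}val") == half_points):
            hits.append("".join(t.text or "" for t in run.findall("w:t", NS)))
    return hits
```

Typical use would be `runs_in_font(read_document_xml("input.docx"), "Helvetica-Condensed", 12)`, where `input.docx` is a placeholder file name; consecutive matching runs can then be joined before they go into the database fields.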




