English 中文(简体)
Python Tesseract can t recognize this font
原标题:

I have this image:

alt text

I want to read it to a string using python, which I didn t think would be that hard. I came upon tesseract, and then a wrapper for python scripts using tesseract.

So I started reading images, and it s done great until I tried to read this one. Am i going to have to train it to read that specific font? Any ideas on what that specific font is? Or is there a better ocr engine I could use with python to get this job done.

Edit: Perhaps I could make some sort of vector around the numbers, then redraw them in a larger size? The larger images are the better tesseract ocr seems to read them (no surprise lol).

最佳回答

Just train the engine for the 10 digits and a . . That should do it. And make sure you change your image to grayscale before OCRing it.

问题回答

Training is hard and is not what is really needed here. The distinction between O and 0 and l and 1 are going to be hard, no matter the script. Limiting the OCR to choose only between numerical digits greatly simplifies the problem, if the context permits it.

My interest in tesseract is in processing lots of numbers, from old government reports. In this case and in the case in question, the character set will be something like 0123456789. Following a comment in the old (sourceforge) newsgroup for tesseract, by eric_taj on 2007-03-21, you can modify Templates->IndexFor and Templates->ClassIdFor in classify/intproto.cpp to mask off characters which are not to be allowed. I modified that approach a bit to read in the allowed character set at runtime in an environment variable, so that I can adjust the permitted set on the fly.

There has been a lot of traffic on this topic in the tesseract OCR discussion group lately. You will need to use a "language" of just numbers. Many people have trained the engine that way before. It looks like you re trying to outwit a captcha data protection scheme... tsk, tsk.

That looks like Eurostile font. Yes, you will have to train with each different font that is being used in your source images.

Recognizing small screen font may be hard for the general-purpose OCR which is optimized for reading large smooth font scanned from paper.

You may better try special screenshot OCR like Textract SDK. It will collect all local fonts and provide 100% precise recognition by simply matching character to character.





相关问题
Can Django models use MySQL functions?

Is there a way to force Django models to pass a field to a MySQL function every time the model data is read or loaded? To clarify what I mean in SQL, I want the Django model to produce something like ...

An enterprise scheduler for python (like quartz)

I am looking for an enterprise tasks scheduler for python, like quartz is for Java. Requirements: Persistent: if the process restarts or the machine restarts, then all the jobs must stay there and ...

How to remove unique, then duplicate dictionaries in a list?

Given the following list that contains some duplicate and some unique dictionaries, what is the best method to remove unique dictionaries first, then reduce the duplicate dictionaries to single ...

What is suggested seed value to use with random.seed()?

Simple enough question: I m using python random module to generate random integers. I want to know what is the suggested value to use with the random.seed() function? Currently I am letting this ...

How can I make the PyDev editor selectively ignore errors?

I m using PyDev under Eclipse to write some Jython code. I ve got numerous instances where I need to do something like this: import com.work.project.component.client.Interface.ISubInterface as ...

How do I profile `paster serve` s startup time?

Python s paster serve app.ini is taking longer than I would like to be ready for the first request. I know how to profile requests with middleware, but how do I profile the initialization time? I ...

Pragmatically adding give-aways/freebies to an online store

Our business currently has an online store and recently we ve been offering free specials to our customers. Right now, we simply display the special and give the buyer a notice stating we will add the ...

Converting Dictionary to List? [duplicate]

I m trying to convert a Python dictionary into a Python list, in order to perform some calculations. #My dictionary dict = {} dict[ Capital ]="London" dict[ Food ]="Fish&Chips" dict[ 2012 ]="...

热门标签