Extracting image of words from a scanned paper

I want to get a small image of every word in a large number of scanned books, which are in Persian (Arabic script). I have no experience in image processing.
How can I do that in the most efficient way?

Answers

I suggest you write a script in MATLAB, something like this. Define:
a: half of the maximum distance between letters (in pixels)
b: half of the minimum distance between words (in pixels)
(Let's hope a < b, so that dilating by a joins the letters of a word without joining neighbouring words.)

Threshold the scanned image of the page.

I = I < Th;  % assumes dark text on a light page: 1 where a pixel is darker than Th, 0 elsewhere

Choose Th by experimenting. You should get a binary image I with 1s where the letters are.
Dilate the image.

I = imdilate(I, strel('disk', a));  % imdilate needs a structuring element; a disk of radius a is one choice

This will connect the letters together.
Remove noise.

I = bwareaopen(I,n); 

This removes all connected components with fewer than n pixels.
Do connected component analysis.

CC = bwconncomp(I);
Rect = regionprops(CC, 'BoundingBox');

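This returns a list of bounding rectangles, one per connected component, each given as [x y width height] and each ideally containing a single word. For each rectangle, extract the corresponding sub-matrix from the original (unthresholded) copy of the image and write it to a file using imwrite().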
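
Putting the steps above together, a minimal sketch of the whole script might look like this. It assumes the Image Processing Toolbox, an 8-bit scan with dark text on a light page, and a disk-shaped structuring element. The file name, output folder, and the values of Th, a, and n are placeholders I made up for illustration, not values from the answer, and they will need tuning on real pages.

```matlab
% Sketch: save one image per word from a scanned page.
% All values below are illustrative assumptions; tune them on your scans.
pageFile = 'page.png';   % hypothetical input file
outDir   = 'words';      % hypothetical output folder
Th = 128;                % threshold, assuming an 8-bit (uint8) image
a  = 3;                  % dilation radius in pixels
n  = 30;                 % minimum size of a component in pixels

G = imread(pageFile);
if size(G, 3) == 3
    G = rgb2gray(G);     % work on a single grayscale channel
end

BW = G < Th;                          % 1 where the (dark) ink is
BW = imdilate(BW, strel('disk', a));  % merge the letters of a word into one blob
BW = bwareaopen(BW, n);               % drop blobs with fewer than n pixels

CC    = bwconncomp(BW);
stats = regionprops(CC, 'BoundingBox');

if ~exist(outDir, 'dir')
    mkdir(outDir);
end

for k = 1:numel(stats)
    bb = stats(k).BoundingBox;        % [x y width height], corner at pixel edge
    c1 = ceil(bb(1));  c2 = c1 + bb(3) - 1;   % first and last column
    r1 = ceil(bb(2));  r2 = r1 + bb(4) - 1;   % first and last row
    word = G(r1:r2, c1:c2);           % crop from the original grayscale image
    imwrite(word, fullfile(outDir, sprintf('word_%05d.png', k)));
end
```

Because the bounding boxes come from the dilated mask, each saved image carries a margin of roughly a pixels around the word. If two words end up in one image, a is too large for that page; if a word is split across several images, a is too small.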




