Using Ruby And Ubuntu With Optical Character Recognition

I am a university student and it's time to buy textbooks again. This quarter I need over 20 books for my classes. Normally this wouldn't be a big deal, as I would just copy and paste the ISBNs into Amazon. However, my school's book site renders the ISBNs as images. All I want is to get the ISBNs into strings so I don't have to type each one by hand. I have used GOCR to convert the images into text, but I want to drive it from a Ruby script so I can automate the process and do the same for my classmates.

I can navigate to the site. How can I save each image to a file on my computer (running Ubuntu), convert it with GOCR, and save the resulting text to a file so I can read it back in my Ruby script?

Answers

GOCR seems like a good choice at first, but from what I can tell from my own "research", its quality isn't quite sufficient for daily use. That could be a problem, depending on the input images. If it doesn't work out for you, try the "new" feature of Google Docs, which allows you to upload images for OCR. You can then retrieve the results using one of the Google APIs (there are tons out there; I'm using gdata-ruby-util, which requires some hacking, though).

You could also use tesseract-ocr for the OCR part; it's also open source and in active development.
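Either engine can be called from Ruby as an external command. The following is only a rough sketch, assuming `gocr` and `tesseract` are installed and on the `PATH`, and that the image is in a format the chosen engine can read (GOCR natively reads PNM files, so other formats may need converting first). The helper names `ocr_command`, `ocr_text`, `extract_isbn` and `valid_isbn?` are made up for illustration. Because OCR quality varies, the sketch also checks the ISBN check digit so that misread digits can be flagged for manual review.

```ruby
require "open3"

# Build the argument list for the chosen OCR engine (hypothetical helper).
# Assumes "gocr -i FILE" and "tesseract FILE stdout" print the text to stdout.
def ocr_command(engine, image_path)
  case engine
  when :gocr      then ["gocr", "-i", image_path]
  when :tesseract then ["tesseract", image_path, "stdout"]
  else raise ArgumentError, "unknown OCR engine: #{engine}"
  end
end

# Run the OCR engine on one image and return its raw text output.
def ocr_text(engine, image_path)
  out, err, status = Open3.capture3(*ocr_command(engine, image_path))
  raise "#{engine} failed: #{err}" unless status.success?
  out
end

# Strip whitespace and hyphens, then return the first 13- or 10-character
# ISBN candidate found in the OCR text, or nil if there is none.
def extract_isbn(text)
  text.upcase.gsub(/[\s-]/, "")[/\d{13}|\d{9}[\dX]/]
end

# Verify the ISBN-10 or ISBN-13 check digit to catch OCR misreads.
def valid_isbn?(isbn)
  case isbn.length
  when 10
    sum = isbn.chars.each_with_index.sum { |c, i| (c == "X" ? 10 : c.to_i) * (10 - i) }
    (sum % 11).zero?
  when 13
    sum = isbn.chars.each_with_index.sum { |c, i| c.to_i * (i.even? ? 1 : 3) }
    (sum % 10).zero?
  else
    false
  end
end
```

A typical call would be `isbn = extract_isbn(ocr_text(:gocr, "isbn_001.pnm"))`, followed by `valid_isbn?(isbn)` to decide whether the result can be trusted.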

For the retrieval part, I would stick with hpricot as well: super powerful and flexible.

Sounds like a cool project, and it shouldn't be too hard if the ISBN images are stored in individual files.

All of this can be run in the background:

  • download web page (net/http)
  • save metadata + image file for each book (paperclip)
  • run GOCR on all the images

All you need is a list of URLs or a crawler (mechanize), and then you probably need to spend a few minutes writing a parser (see joe's post) for the university HTML pages.
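The download and parsing steps might look roughly like the sketch below. It uses only Ruby's standard library instead of mechanize, hpricot or paperclip, and it finds images with a simple regular expression, which is an assumption that only holds for straightforward `<img src="...">` markup; a real HTML parser would be more robust. The names `image_urls`, `fetch` and `save_images`, and the `isbn_NNN` file naming, are hypothetical.

```ruby
require "net/http"
require "uri"
require "fileutils"

# Collect absolute URLs for every <img src="..."> found in the page.
# The regex is a shortcut for this sketch, not a general HTML parser.
def image_urls(html, base_url)
  html.scan(/<img[^>]+src=["']([^"']+)["']/i).flatten.map do |src|
    URI.join(base_url, src).to_s
  end
end

# Download a URL and return the response body, raising on non-2xx replies.
def fetch(url)
  response = Net::HTTP.get_response(URI(url))
  raise "GET #{url} failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Save every image on the page into dir and return the local file paths.
def save_images(page_url, dir)
  FileUtils.mkdir_p(dir)
  image_urls(fetch(page_url), page_url).each_with_index.map do |url, i|
    path = File.join(dir, format("isbn_%03d%s", i, File.extname(URI(url).path)))
    File.binwrite(path, fetch(url))
    path
  end
end
```

Each path returned by `save_images` can then be handed to the OCR command, and the recognized text written to a file or collected in an array for later use. A page will usually contain images other than ISBNs, so the list from `image_urls` probably needs filtering by path or file name first.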




