Projects / tesseract-ocr


tesseract-ocr is an OCR engine originally developed by Hewlett-Packard and now sponsored by Google. It is highly accurate: it reads a binary, grayscale, or color image and outputs text.

Recent releases

  •  03 Nov 2012 15:57

    Release Notes: This release adds a C API, a new Visual Studio 2008 solution, right-to-left/Bidi capability in the output iterators for Hebrew and Arabic, paragraph detection in layout analysis/post OCR, fixes for inconsistent xheight during training and for over-chopping, simultaneous multi-language capability, a refactored top-level word recognition module, an experimental equation detector, improved handling of resolution from input images, and a blamer module for error analysis. It also cleans up the externally used namespace by removing includes from baseapi.h.

  •  31 Oct 2011 09:46

    Release Notes: This release adds thread safety, a recognizer for Arabic, PageIterator and ResultIterator, and more.

  •  02 Oct 2010 19:22

    Release Notes: Preparations were made for thread safety. A major new page layout analysis module was added. HOCR output was added. Many more languages were added. Most of the function header comments were documented with doxygen. Leptonica was added for main image I/O and handling.
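The HOCR output mentioned in the release notes is HTML that carries each recognized word together with its bounding box. As a rough sketch of how such output might be consumed, the Python below pulls words and boxes out of an hOCR fragment using only the standard library. The fragment is hand-written for illustration and is not real Tesseract output; the `ocrx_word` class and the `bbox x0 y0 x1 y1` title property follow the general hOCR convention, and the exact markup produced by a given release may differ. The names `HocrWordParser` and `extract_words` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hand-written sample for illustration only (an assumption, not Tesseract output).
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 640 480'>
 <span class='ocr_line' title='bbox 36 92 200 116'>
  <span class='ocrx_word' title='bbox 36 92 96 116'>Hello</span>
  <span class='ocrx_word' title='bbox 110 92 200 116'>world</span>
 </span>
</div>
"""

# hOCR stores coordinates in the title attribute as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every element of class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bounding box of the word currently being read
        self._text = []     # text pieces collected for that word
        self._depth = 0     # nesting depth of tags inside the current word

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            # A nested tag inside a word (for example <strong>).
            self._depth += 1
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        match = BBOX_RE.search(attr.get("title") or "")
        if "ocrx_word" in classes and match:
            self._bbox = tuple(int(g) for g in match.groups())
            self._text = []
            self._depth = 0

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Closing tag of the word element itself: store the result.
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def extract_words(hocr):
    """Return a list of (word, bbox) tuples found in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word, bbox in extract_words(SAMPLE_HOCR):
        print(word, bbox)
```

An event-based parser is used here so the sketch needs no third-party packages; for real documents, a full HTML library would likely be more robust.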

Recent comments

08 Mar 2011 10:57 Teiman

It's easy to use, and has a good quality of recognition. I recommend it over other similar engines.

