Ocrad – OCR (Optical Character Recognition) software based on a feature extraction method

Ocrad is an OCR (Optical Character Recognition) program based on a feature extraction method. It reads images in pbm (bitmap), pgm (greyscale) or ppm (color) formats and produces text in byte (8-bit) or UTF-8 formats.

The software also includes a layout analyser able to separate the columns or blocks of text normally found on printed pages.

Ocrad can be used as a standalone console application, or as a backend to other software. It’s mainly for research purposes.
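As a rough illustration of the backend use case, the sketch below wraps the `ocrad` command line from Python. The helper names `build_ocrad_command` and `run_ocrad` are hypothetical. The sketch assumes an `ocrad` executable is on the PATH, that recognized text goes to standard output when no output file is given, and that the `--format=utf8` and `--filter=NAME` options behave as described in the Ocrad manual.

```python
import shutil
import subprocess


def build_ocrad_command(image_path, filter_name=None):
    """Return the argument list for one ocrad run (hypothetical helper)."""
    # Ask for UTF-8 output so the result can be decoded unambiguously.
    command = ["ocrad", "--format=utf8"]
    if filter_name is not None:
        # Optional built-in filter, e.g. "numbers_only".
        command.append(f"--filter={filter_name}")
    command.append(str(image_path))
    return command


def run_ocrad(image_path, filter_name=None):
    """Run ocrad on a pbm/pgm/ppm file and return the recognized text."""
    if shutil.which("ocrad") is None:
        raise RuntimeError("ocrad executable not found on PATH")
    result = subprocess.run(
        build_ocrad_command(image_path, filter_name),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8")
```

A caller would then use something like `run_ocrad("page.pbm", filter_name="numbers_only")`, where `page.pbm` stands for any image in one of the supported formats.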

Ocrad recognizes characters by their shape. It is fast because it does not compare the shape of every character against a database of shapes and then choose the best match. Instead, Ocrad only checks the shape differences that are relevant for choosing between two character categories, much like a binary search.
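The idea of narrowing down candidates with a few targeted questions, rather than matching against every stored shape, can be sketched as a tiny decision procedure. This is a toy example only: the feature names (`holes`, `has_ascender`) and the rules are invented for illustration and are not taken from Ocrad's source code.

```python
def classify(features):
    """Choose among 'o', 'b', 'l' and 'c' using two yes/no questions.

    `features` is a dict of hypothetical shape measurements:
      holes        -- number of enclosed regions in the glyph
      has_ascender -- True if the glyph rises above the x-height
    """
    if features["holes"] >= 1:
        # A closed loop splits the candidates into {'o', 'b'}.
        return "b" if features["has_ascender"] else "o"
    # No closed loop: the remaining candidates are {'l', 'c'}.
    return "l" if features["has_ascender"] else "c"
```

Each glyph is decided after two comparisons, whereas a template-matching approach would score the glyph against all four stored shapes before picking the best one.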

Key Features

  • Uses the ISO 10646 character set internally, which can represent over 2 thousand million characters.
  • Passes text through filters. Ocrad provides both built-in filters and user-defined filters. The built-in filters are:
    • --filter=letters. This forces every character that resembles a letter to be recognized as a letter. Other characters will be output without change.
    • --filter=letters_only. This is the same as --filter=letters, but other characters will be discarded.
    • --filter=numbers. This forces every character that resembles a number to be recognized as a number. Other characters will be output without change.
    • --filter=numbers_only. This is the same as --filter=numbers, but other characters will be discarded.
    • --filter=same_height. This discards any character (or noise) whose height differs by more than 10 percent from the median height of the characters in the line.
    • --filter=text_block. This discards any character (or noise) outside of a rectangular block of text lines.
    • --filter=upper_num. This forces every character that resembles an uppercase letter or a number to be recognized as such. Other characters will be output without change.
    • --filter=upper_num_mark. This is the same as --filter=upper_num, but other characters will be marked as unrecognized.
    • --filter=upper_num_only. This is the same as --filter=upper_num, but other characters will be discarded.
  • Cross-platform support – runs under Linux, FreeBSD, NetBSD, OpenBSD, and Mac OS X.
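To make the same_height rule from the filter list concrete, here is a minimal sketch of the height test it describes. It is not Ocrad's implementation: the function name, the `(character, height)` tuple representation, and the pixel heights are assumptions chosen for the example.

```python
from statistics import median


def same_height(chars, tolerance=0.10):
    """Keep the (character, height) pairs whose height is within
    `tolerance` (a fraction) of the median height of the line."""
    if not chars:
        return []
    mid = median(height for _, height in chars)
    # Discard anything that deviates from the median by more than 10 percent.
    return [(c, h) for c, h in chars if abs(h - mid) <= tolerance * mid]


# One text line with two outliers: a speck of noise and a tall stroke.
line = [("a", 20), ("b", 21), (".", 4), ("c", 19), ("|", 40)]
kept = same_height(line)
```

In this example the median height is 20, so only heights between 18 and 22 survive and `kept` holds the entries for "a", "b" and "c".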

Website: www.gnu.org/software/ocrad
Support: Manual
Developer: Antonio Diaz Diaz
License: GNU GPL v2 or any later version

Ocrad is written in C++.


Related Software

OCR Systems
Tesseract: High quality neural net (LSTM) based OCR engine focused on line recognition
EasyOCR: OCR that reads natural scene text and dense text in documents
ocrs: Modern OCR engine
Surya: Multilingual document OCR toolkit with text recognition
ocropy: Open source document analysis and OCR system
Ocrad: OCR engine based on a feature extraction method
Cuneiform: OCR Engine to convert OCR documents into editable form
GOCR: Reads images in many formats

Read our verdict in the software roundup.

OCR Tools
OCRmyPDF: Adds an OCR text layer to scanned PDFs using the unpaper utility
Paperwork: Simplify the management of your paperwork
OCRFeeder: Desktop OCR suite featuring a complete GTK graphical user interface
ocropy: Open source document analysis and OCR system
gImageReader: Simple Gtk/Qt front-end to Tesseract
gscan2pdf: GUI to produce PDFs or DjVus from scanned documents
lios: linux-intelligent-ocr-solution for converting print into text
hocr-tools: Manipulate and evaluate hOCR format
Skanpage: Simple scanning application optimized for multi-page document scanning
GOCR: Reads images in many formats


