Ocrad is an OCR (Optical Character Recognition) program based on a feature extraction method. It reads images in pbm (bitmap), pgm (greyscale) or ppm (color) formats and produces text in byte (8-bit) or UTF-8 formats.
The software also includes a layout analyser able to separate the columns or blocks of text normally found on printed pages.
Ocrad can be used as a standalone console application, or as a backend to other software. It’s mainly for research purposes.
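As a rough illustration of the backend use case, the sketch below calls the ocrad executable from Python. It assumes ocrad is installed and on the PATH. The helper names build_ocrad_command and recognize are hypothetical, and the options shown (--format=utf8, --layout, --filter=NAME) should be checked against `ocrad --help` for your version.

```python
import shutil
import subprocess
from typing import List, Optional


def build_ocrad_command(image_path: str,
                        text_filter: Optional[str] = None,
                        layout: bool = False) -> List[str]:
    """Assemble an argument list for the ocrad executable (hypothetical helper)."""
    cmd = ["ocrad", "--format=utf8"]      # request UTF-8 instead of byte output
    if layout:
        cmd.append("--layout")            # turn on layout analysis
    if text_filter is not None:
        cmd.append(f"--filter={text_filter}")
    cmd.append(image_path)                # pbm, pgm or ppm input file
    return cmd


def recognize(image_path: str, **options) -> str:
    """Run ocrad on one image and return whatever it writes to stdout."""
    if shutil.which("ocrad") is None:
        raise RuntimeError("ocrad executable not found on PATH")
    result = subprocess.run(build_ocrad_command(image_path, **options),
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

A call such as `recognize("page.pgm", layout=True)` would then return the recognized text as a string, provided the image is in one of the supported formats.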
Ocrad recognizes characters by their shape. It is fast because it does not compare the shape of every character against a database of shapes and then choose the best match. Instead, Ocrad only examines the shape differences that are relevant for choosing between two character categories, much like a binary search.
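To make the contrast concrete, here is a toy sketch in Python. It is not Ocrad's code: the feature names (has_hole, has_ascender, has_descender) and the tiny decision tree are invented for illustration. It only shows the general idea of narrowing down the candidates with a few yes/no feature tests instead of scoring a glyph against every stored template.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical shape features; real feature extraction is far richer."""
    has_hole: bool        # enclosed white region, as in 'o' or 'b'
    has_ascender: bool    # stroke rising above the x-height, as in 'b'
    has_descender: bool   # stroke dropping below the baseline, as in 'p'


def classify(glyph: Glyph) -> str:
    """Each test discards part of the remaining candidates.

    Six candidate letters are separated with at most three tests,
    rather than six template comparisons.
    """
    if glyph.has_hole:
        if glyph.has_ascender:
            return "b"
        return "p" if glyph.has_descender else "o"
    if glyph.has_ascender:
        return "l"
    return "y" if glyph.has_descender else "x"
```

With many more character categories the saving grows, since the number of tests increases roughly with the depth of the tree and not with the number of categories.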
Key Features
- Uses the ISO 10646 character set internally, which can represent over two billion characters.
- Passes text through filters. Ocrad provides both built-in and user-defined filters. The built-in filters are:
  - --filter=letters. Forces every character that resembles a letter to be recognized as a letter. Other characters are output unchanged.
  - --filter=letters_only. The same as --filter=letters, but other characters are discarded.
  - --filter=numbers. Forces every character that resembles a number to be recognized as a number. Other characters are output unchanged.
  - --filter=numbers_only. The same as --filter=numbers, but other characters are discarded.
  - --filter=same_height. Discards any character (or noise) whose height differs by more than 10 percent from the median height of the characters in the line.
  - --filter=text_block. Discards any character (or noise) outside a rectangular block of text lines.
  - --filter=upper_num. Forces every character that resembles an uppercase letter or a number to be recognized as such. Other characters are output unchanged.
  - --filter=upper_num_mark. The same as --filter=upper_num, but other characters are marked as unrecognized.
  - --filter=upper_num_only. The same as --filter=upper_num, but other characters are discarded.
- Cross-platform support – runs under Linux, FreeBSD, NetBSD, OpenBSD, and Mac OS X.
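The same_height rule listed among the filters above is simple enough to sketch. The following illustrates the described behaviour only and is not Ocrad's implementation. The function name and the representation of a text line as a list of character heights in pixels are assumptions made for the example.

```python
from statistics import median
from typing import List


def same_height_filter(heights: List[int], tolerance: float = 0.10) -> List[int]:
    """Keep only heights within `tolerance` of the line's median height."""
    if not heights:
        return []
    mid = median(heights)
    # Discard anything that deviates from the median by more than 10 percent.
    return [h for h in heights if abs(h - mid) <= tolerance * mid]


line = [20, 21, 19, 5, 40, 20]     # 5 might be a speck, 40 a merged blob
print(same_height_filter(line))    # prints [20, 21, 19, 20]
```

Using the median instead of the mean keeps a few very small or very large blobs from shifting the reference height.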
Website: www.gnu.org/software/ocrad
Support: Manual
Developer: Antonio Diaz Diaz
License: GNU GPL v2 or any later version
Ocrad is written in C++.
Related Software
| OCR Systems | Description |
|---|---|
| Tesseract | High quality neural net (LSTM) based OCR engine focused on line recognition |
| EasyOCR | OCR that reads natural scene text and dense text in documents |
| ocrs | Modern OCR engine |
| Surya | Multilingual document OCR toolkit with text recognition |
| ocropy | Open source document analysis and OCR system |
| Ocrad | OCR engine based on a feature extraction method |
| Cuneiform | OCR Engine to convert OCR documents into editable form |
| GOCR | Reads images in many formats |
Read our verdict in the software roundup.
| OCR Tools | Description |
|---|---|
| OCRmyPDF | Adds an OCR text layer to scanned PDFs using the Tesseract engine |
| Paperwork | Simplify the management of your paperwork |
| OCRFeeder | Desktop OCR suite featuring a complete GTK graphical user interface |
| ocropy | Open source document analysis and OCR system |
| gImageReader | Simple Gtk/Qt front-end to Tesseract |
| gscan2pdf | GUI to produce PDFs or DjVus from scanned documents |
| lios | linux-intelligent-ocr-solution for converting print into text |
| hocr-tools | Manipulate and evaluate hOCR format |
| Skanpage | Simple scanning application optimized for multi-page document scanning |
| GOCR | Reads images in many formats |

