Search
The key advantage of applying OCR to your scanned documents is the ability to search the text. Searching is made simple with Paperwork.
There’s advanced search functionality too. You can search by keyword, by label, and/or by date, and you can combine multiple search criteria, as shown in the image below. The two basic Boolean operators, AND and OR, are supported.
You can set the start and end dates for a search, and also apply a NOT operator to any search criterion.
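To make the AND, OR and NOT combinations concrete, here is a minimal Python sketch of how keyword, label and date criteria can be composed. It is not Paperwork's code or API: the `Document` class, the helper names (`has_keyword`, `has_label`, `between`, `all_of`, `any_of`, `negate`) and the sample documents are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    """Hypothetical stand-in for a scanned document and its OCR text."""
    text: str
    day: date
    labels: set = field(default_factory=set)


def has_keyword(word):
    """Criterion: the OCR text contains the keyword (case-insensitive)."""
    return lambda doc: word.lower() in doc.text.lower()


def has_label(label):
    """Criterion: the document carries the given label."""
    return lambda doc: label in doc.labels


def between(start, end):
    """Criterion: the document date lies in [start, end]."""
    return lambda doc: start <= doc.day <= end


def all_of(*criteria):
    """AND: every criterion must match."""
    return lambda doc: all(c(doc) for c in criteria)


def any_of(*criteria):
    """OR: at least one criterion must match."""
    return lambda doc: any(c(doc) for c in criteria)


def negate(criterion):
    """NOT: invert a criterion."""
    return lambda doc: not criterion(doc)


def search(docs, criterion):
    """Return the documents that satisfy the combined criterion."""
    return [doc for doc in docs if criterion(doc)]


# Made-up sample data for illustration only.
docs = [
    Document("Electricity invoice for March", date(2023, 3, 14), {"bills"}),
    Document("Car insurance renewal notice", date(2023, 6, 2), {"insurance"}),
    Document("Water invoice, final reminder", date(2022, 11, 30), {"bills", "urgent"}),
]

# keyword "invoice" AND dated in 2023 AND NOT labelled "urgent"
query = all_of(
    has_keyword("invoice"),
    between(date(2023, 1, 1), date(2023, 12, 31)),
    negate(has_label("urgent")),
)
results = search(docs, query)
```

Each criterion is a plain function from a document to a boolean, so the from/to date range and the NOT operator combine with keyword and label searches in the same way.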
Labels
Labels offer a simple way to organize your documents.
Clicking the icon with four horizontal bars at the top right of any document brings up a dialog box.
The dialog lets you define the date of the document, set one or more user-definable labels, and specify any additional keywords.
The image to the left shows some example labels applied to the scanned documents. The color coded labels help you quickly identify documents.
The software automatically guesses the labels to apply to new documents. This functionality is courtesy of Simplebayes, a memory-based, optional-persistence naïve Bayesian text classifier.
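For readers curious how a naïve Bayesian classifier can guess a label from a document's text, the following Python sketch shows the general technique. It is a simplified illustration written for this article, not Simplebayes' implementation and not Paperwork's code; the class name `TinyNaiveBayes`, its methods, and the training sentences are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Illustrative multinomial naive Bayes; not Simplebayes' actual code."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> training documents seen
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        """Lowercase the text and split it into alphanumeric words."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def train(self, label, text):
        """Record the words of one document under the given label."""
        tokens = self.tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocabulary.update(tokens)

    def scores(self, text):
        """Return a log-probability score per label for the text."""
        tokens = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        result = {}
        for label, counts in self.word_counts.items():
            # Log prior: how common the label is among training documents.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for token in tokens:
                # Laplace (add-one) smoothing so unseen words do not zero the score.
                score += math.log((counts[token] + 1) / denominator)
            result[label] = score
        return result

    def guess(self, text):
        """Return the highest-scoring label, or None if nothing was trained."""
        scores = self.scores(text)
        return max(scores, key=scores.get) if scores else None


# Made-up training data standing in for already-labelled documents.
clf = TinyNaiveBayes()
clf.train("bills", "electricity invoice amount due payment")
clf.train("bills", "water invoice payment reminder")
clf.train("insurance", "car insurance policy renewal premium")

guess = clf.guess("invoice for gas, payment due in 30 days")
```

With this toy training set, the new text shares words such as "invoice" and "payment" with the documents labelled "bills", so that label scores highest. A real classifier would be trained on far more text and would usually only suggest a label when its score is sufficiently convincing.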
The additional keywords option can be useful if character recognition doesn’t work.
Labels are effective, quick to apply, and work well. You can use the search functionality to filter documents by label: either matching a particular label or, with the NOT operator, excluding it.
It seems like such a good idea, but on my Ryzen 2700X with GeForce GTX 1080 Ti, it is impossibly slow on documents of a few hundred pages. I can’t get the cut and paste to work either.
Have you raised the issues upstream?
Yes, and Jerome (the author) says it’s a known issue with large PDF files.
More likely to be an issue with Cairo.
Smooth GUI, but not intuitive and most of the time it is not clear what it is doing. I cannot tell when OCR was successful, little to no progress indication on most actions. It has a lot of potential, but also a lot of potential for improvement.