The big complicated segmentation pipeline is a legacy from the time you had to do that, a few years ago. It's error-prone, and even at its best it robs the model of valuable context. You need that context if you want to take the step to handwriting. If you go to a group of human experts to help you decipher historical handwriting, the first thing they will tell you is that they need the whole document for context, not just the line or word you're interested in.
We need to do end-to-end text recognition. Not "character recognition"; it's not the characters we care about. Evaluating models with CER is also a bad idea. It frustrates me so much that text recognition is remaking all the mistakes machine translation made 15+ years ago.
> OCR4all is a software which is primarily geared towards the digital text recovery and recognition of early modern prints, whose elaborate printing types and mostly uneven layout challenge the abilities of most standard text recognition software.
Looks like a great project, and I don't want to nitpick, but...
https://www.ocr4all.org/about/ocr4all
> Due to its comprehensible and intuitive handling OCR4all explicitly addresses the needs of non-technical users.
A little secret: Apple’s Vision framework has an absurdly fast text recognition library with accuracy that beats Tesseract. It consumes almost any image format you can think of, including PDFs.
> How is this different from tesseract and friends?
The workflow is for digitizing historical printed documents. Think conserving old announcements in blackletter typesetting, not extracting info from typewritten business documents.
I'm sorry. I suppose this is great, but an .exe file is designed for usability. A Docker container may be nice for techy people, but it is not "4all" this way. I do understand that the usability starts after you've gone through all the command-line parts, but those are just extra steps compared to other OCR programs, which work out of the box.
I think the current sweet-spot for speed/efficiency/accuracy is to use Tesseract in combination with an LLM to fix any errors and to improve formatting, as in my open source project which has been shared before as a Show HN:
This process also makes it extremely easy to tweak/customize simply by editing the English language prompt texts to prioritize aspects specific to your set of input documents.
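A minimal sketch of that kind of pipeline (the function names and prompt wording here are my own illustration, not the linked project's actual code): shell out to Tesseract for the raw pass, then build a plain-English correction prompt to send to whatever LLM you use. Customizing the pipeline then really is just editing the prompt text.

```python
import shutil
import subprocess

def ocr_with_tesseract(image_path: str) -> str:
    """Run Tesseract on an image and return the raw recognized text.
    Assumes the `tesseract` binary is on PATH."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    # Passing "stdout" as the output base makes tesseract print to stdout.
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def build_correction_prompt(raw_ocr_text: str, priorities: str = "") -> str:
    """Build the plain-English prompt sent to the LLM for post-correction.
    `priorities` is where document-specific instructions go, which is what
    makes the approach easy to tweak by editing prompt text alone."""
    return (
        "Correct the OCR errors in the text below. Fix obvious character "
        "confusions (e.g. 'rn' vs 'm', '0' vs 'O'), rejoin hyphenated line "
        "breaks, and restore paragraph structure. Do not paraphrase.\n"
        + (f"Additional priorities: {priorities}\n" if priorities else "")
        + "\n---\n" + raw_ocr_text
    )
```

The corrected text then comes back from a single LLM call with that prompt; swapping in a different model or document type touches nothing but the prompt string.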
What is this? A new SOTA OCR engine (which would be very interesting to me) or just a tool that uses other known engines (which would be much less interesting to me).
A movement? A socio-political statement?
If only landing pages could be clearer about wtf it actually is ...
> OCR4all combines various open-source solutions to provide a fully automated workflow for automatic text recognition of historical printed (OCR) and handwritten (HTR) material.
OCR is all well and good; I thought it was mostly solved with Tesseract, so what does this bring? What I'm looking for is a reasonable library or usable implementation of MRC compression for the resulting PDFs. Nothing I have tried comes anywhere near the commercial offerings, which cost $$$$. It seems to be a tricky problem to solve: detecting and separating the layers of the image to compress separately, and then binding them back together into a compatible PDF.
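For intuition, here is a toy, pure-Python sketch of the layer-separation step being described (a global threshold; real MRC encoders use adaptive segmentation, then encode the binary mask with JBIG2/CCITT at full resolution and the two color layers lossily at lower resolution before binding them into the PDF):

```python
def split_mrc_layers(gray, threshold=128):
    """Toy MRC-style layer separation on a 2-D grayscale image
    (a list of rows of 0-255 ints).

    Returns (mask, foreground, background):
      mask:       1 where a pixel is "ink" (dark), 0 elsewhere
      foreground: ink pixel values, non-ink pixels zeroed out
      background: the page with ink pixels filled from a local
                  estimate of the paper color
    """
    mask, fg, bg = [], [], []
    for row in gray:
        # Simple global threshold; real separators adapt per region.
        m = [1 if px < threshold else 0 for px in row]
        # Background estimate: mean of the non-ink pixels in this row.
        paper = [px for px, mk in zip(row, m) if mk == 0]
        fill = sum(paper) // len(paper) if paper else 255
        mask.append(m)
        fg.append([px if mk else 0 for px, mk in zip(row, m)])
        bg.append([fill if mk else px for px, mk in zip(row, m)])
    return mask, fg, bg
```

The hard part the commercial tools get right is exactly what this glosses over: deciding per region what is text, line art, or photo, so that each layer compresses well without ringing artifacts around the glyphs.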
Wow. Setup took 12 GB of my disk. First impression: nice UI, but no idea what to do with it or how to create a project. Tells me "session expired" no matter what I try to do. Definitely not batteries-included kind of stuff, will need to explore later.
I've been looking for a project that would have an easy free/extremely cheap way to do OCR/image recognition for generating ALT text automatically for social media. Some sort of embedded implementation that looks at an image and is either able to transcribe the text, or (preferably) transcribe the text AND do some brief image recognition.
I generally do this manually with Claude and it's able to do it lightning fast, but a small dev making a third party Bluesky/Mastodon/etc client doesn't have the resources to pay for an AI API.
They lost me when they suggested I install docker.
Now, I wouldn't mind if they suggested that as an _option_ for people whose system might exhibit compatibility problems, but - come on! How lazy can you get? You can't be bothered to cater to anything other than your own development environment, which you want us to reproduce? Then maybe call yourself "OCR4me", not "OCR4all".
I don't wish to speak out of turn, but it doesn't look like this project has been active for about 1 year. I checked GitHub and the last update was in Feb 2024. Their last post to X was 25 OCT 2023. :(
This looks promising, though I'm not sure how it stacks up to Transkribus, which seems to be the leader in the space since it supports handwritten text and ML that is trainable on your own dataset.
I've been using Tesseract for a few years on a personal project, and I'd be interested to know how they compare in terms of system resources, given that I'm running it on a Dell OptiPlex Micro with 8 GB of RAM and a 6th-gen i5. Tesseract is barely noticeable, so it's just curiosity at this point; I don't have any reason to even consider switching. I do, however, have a large dataset of several hundred GBs of scanned PDFs which would be worth digitizing when I find some time to spare.
OCR4all
(ocr4all.org) | 434 points by LorenDB | 14 February 2025 | 124 comments
Comments
Looks like it's built on https://github.com/Calamari-OCR/calamari
https://www.ocr4all.org/guide/setup-guide/quickstart > Quickstart > Open a terminal of your choice and enter the following command if you're running Linux (followed by a 6 line docker command).
How is that addressing the needs of non-technical users?
I wrote a simple CLI tool and more featured Python wrapper for it: https://github.com/fny/swiftocr
It combines Tesseract (for images) and Poppler-utils (for PDFs). Local open-source LLMs extract document segments intelligently.
It can also easily be extended to use one or more vision LLM models.
And finally, it packages the entire AI agent API into a Docker container.
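As a rough sketch of that kind of dispatch (choosing the external tool by file type; the function names are illustrative, not the project's actual API):

```python
import os
import shutil
import subprocess

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}

def pick_extractor(path: str) -> list[str]:
    """Choose the external tool for a file: Poppler's pdftotext for PDFs,
    Tesseract for raster images. Returns the command to run."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        # "-" sends the extracted text to stdout.
        return ["pdftotext", path, "-"]
    if ext in IMAGE_EXTS:
        return ["tesseract", path, "stdout"]
    raise ValueError(f"unsupported file type: {ext}")

def extract_text(path: str) -> str:
    """Run the chosen tool and return its text output."""
    cmd = pick_extractor(path)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} is not installed")
    return subprocess.run(
        cmd, capture_output=True, text=True, check=True
    ).stdout
```

The LLM-based segment extraction would then run over whatever text this dispatch step produces.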
Create complex OCR workflows through the UI without needing to interact with code or command-line interfaces.
[...] https://www.ocr4all.org/guide/setup-guide/windows
------------------
https://github.com/Dicklesworthstone/llm_aided_ocr
It seems to be based on OCR-D, which itself is based on
- https://github.com/tesseract-ocr/tesseract
- https://kraken.re/main/index.html
- https://github.com/ocropus-archive/DUP-ocropy
- https://github.com/Calamari-OCR/calamari
See
- https://ocr-d.de/en/models
It seems to be an open-source alternative to https://www.transkribus.org/ ( which uses amongst others https://atr.pages.teklia.com/pylaia/pylaia/ )
Another alternative is https://escriptorium.inria.fr/ ( which uses kraken)
(It looks like the project started in 2022, so maybe it wasn't obvious at the time.)