With the current progress in ML models, it seems like OCR should be an easy task. After all, recognizing handwritten digits was one of the prime benchmarks for image recognition (MNIST was released in 1994).
Yet when I try to OCR any of my handwritten notes, all I ever get is a jumbled mess of nonsense. Am I missing something? Is my handwriting really that atrocious, or is it the models?
Here’s a quick example, a random passage from a scientific article: [image]
I tried EasyOCR, Tesseract, PPOCR and a few online tools. Only PPOCR was able to correctly identify the numbers and the words “J.” and “Chem.”. The rest is just a random mess of characters.
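To put a number on "random mess" instead of eyeballing it, one option is the character error rate (CER): the edit distance between what the tool returned and what the note actually says, divided by the length of the true text. Below is a minimal sketch using only the Python standard library. It assumes you have typed up the correct text by hand; the function names and the sample strings are made up for illustration and are not real output from any of the tools above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / length of the reference text.
    0.0 is a perfect transcription; values can exceed 1.0 when the
    OCR output is much longer than the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings, typed by hand for this example.
    reference = "handwritten notes 123"
    outputs = {
        "tool_a": "handwritten notes 123",
        "tool_b": "hvndwrlten nofes 123",
        "tool_c": "h4n w1 7 en 123",
    }
    for name, text in outputs.items():
        print(f"{name}: CER = {character_error_rate(reference, text):.2f}")
```

Running each tool on the same image and comparing the CERs makes it easier to tell "slightly off" from "completely unusable", and to check whether preprocessing such as cropping or binarizing actually helps.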
CharlesRivard@lemmy.world 1 week ago
Handwritten OCR is still pretty bad cuz people write super differently. Printed text follows clear patterns, but handwriting can be messy, tilted, or rushed. I noticed most OCR tools completely fail on notes from lectures. One thing that actually helped me tho was Clever Humanizer Grammar Checker. After OCR messes up words or grammar, I paste the text there and it fixes everything really fast. Makes the final text way more readable and natural tbh.