Why is OCR for handwritten content still that bad?

Submitted 1 year ago by hinterlufer@lemmy.world to [deleted]

It seems like with the current progress in ML models, doing OCR should be an easy task. After all, recognizing handwritten numbers was one of the prime benchmarks for image recognition (MNIST was released in 1994).

Yet, when I try to OCR any of my handwritten notes all I ever get is a jumbled mess of nonsense. Am I missing something, is my handwriting really that atrocious or is it the models?

Here’s a quick example, a random passage from a scientific article: [image]

I tried EasyOCR, Tesseract, PPOCR and a few online tools. Only PPOCR was able to correctly identify the numbers and the words “J.” and “Chem.”. The rest is just a random mess of characters.


Comments


ocean@lemmy.selfhostcat.com ⁨1⁩ ⁨year⁩ ago
I can’t even read this

- scarabic@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Me neither
  
CharlesRivard@lemmy.world ⁨2⁩ ⁨months⁩ ago
Handwritten OCR is still pretty bad cuz people write super differently. Printed text follows clear patterns, but handwriting can be messy, tilted, or rushed. I noticed most OCR tools completely fail on notes from lectures. One thing that actually helped me tho was Clever Humanizer Grammar Checker. After OCR messes up words or grammar, I paste the text there and it fixes everything really fast. Makes the final text way more readable and natural tbh.

jol@discuss.tchncs.de ⁨1⁩ ⁨year⁩ ago
Maybe if your handwriting wasn’t so terrible, a machine could read it.

Borger@lemmy.blahaj.zone ⁨1⁩ ⁨year⁩ ago
I can read about 80% of the words in this if I’m honest, and had to fill in the rest with a best guess.

RobotToaster@mander.xyz ⁨1⁩ ⁨year⁩ ago
I just asked chatGPT to transcribe it and it said

The handwritten text in the image says:

“Dimer stabilization free energies were also determined from thermodynamic integration (TI, see methods), which provide a direct validation of the MM-GBSA results.”

J. Phys. Chem. B 2018, 122, 7038-7048

There was a post on HN recently about using LLMs for OCR. news.ycombinator.com/item?id=42952605

- hinterlufer@lemmy.world ⁨1⁩ ⁨year⁩ ago
  That’s perfect. Now I’m just wondering why ChatGPT is apparently much better at OCR than a dedicated OCR model like EasyOCR or Tesseract.
  
  Btw, Deepseek did a good job but not perfect. I also fed chatGPT a full page of notes and the transcription to markdown worked quite well, although not perfect. However, if I supply the same note as part of a larger pdf, it will refuse to transcribe it, stating that it’s unreadable.
  
  - thefactremains@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Because it can fill in gaps where the recognition fails.
    
  - homesweethomeMrL@lemmy.world ⁨1⁩ ⁨year⁩ ago
    If I had to guess, I’d say it was the dot paper confusing the OCR reader. I suppose the LLM has some way to cancel out the dots and thereby gets a better scan of it.
    
  - cyrano@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
    Try Gemini 2; it seems to be pretty good at that as well.
    
- nimbledaemon@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
  Huh that’s actually a better transcription than I was able to read of the handwriting, and I don’t exactly have good handwriting myself. I just couldn’t see any other reading of validation than ‘voliolation’.
  
  - RobotToaster@mander.xyz ⁨1⁩ ⁨year⁩ ago
    I tried it a while ago out of desperation to read handwriting I couldn’t read at all, that’s why I thought to try it.
    
nesc@lemmy.cafe ⁨1⁩ ⁨year⁩ ago
They aren’t just general-purpose tools like Tesseract is; they can be additionally trained to recognize handwritten text and become much, much better at their task. For example, my Kobo reader has built-in offline OCR and it works incredibly well, almost too well.

t_378@lemmy.one ⁨1⁩ ⁨year⁩ ago
This is challenging to read as a human. And I know I’m not the only one. So if we can’t work out all the letters… no way a computer could either. I liken it to the idea that if I type out “detialed”, spell check can suggest “detailed”, but if I write “ditaled” it’s not going to know.
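
To make the spell-check analogy concrete, here is a minimal Python sketch. It is not how any particular spell checker works: it simply scores candidates by optimal string alignment distance (insert, delete, substitute, or swap two adjacent letters). The function names and the two-word vocabulary are made up for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Edits needed to turn a into b: insert, delete, substitute, or swap adjacent letters."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every letter of a
    for j in range(cols):
        dist[0][j] = j  # insert every letter of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
            # Two adjacent letters swapped counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


def suggest(word, vocabulary, max_distance=1):
    """Return the vocabulary words within max_distance edits of word."""
    return [w for w in vocabulary if osa_distance(word, w) <= max_distance]


# Hypothetical toy vocabulary, only for this example.
VOCAB = ["detailed", "dilated"]

print(suggest("detialed", VOCAB))                 # ['detailed']
print(suggest("ditaled", VOCAB))                  # []
print(suggest("ditaled", VOCAB, max_distance=2))  # ['detailed', 'dilated']
```

With the tight threshold, the swapped-letter typo is one edit away from the intended word, while “ditaled” matches nothing. Loosening the threshold to two edits returns two equally close words, and the letters alone cannot say which one was meant.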

Cooljimy84@lemmy.world ⁨1⁩ ⁨year⁩ ago
Try again on plain paper, or on lined/ruled paper. That dotted graph paper hurts my eyes and I’m pretty sure I’m mostly human…

- PhobosAnomaly@feddit.uk ⁨1⁩ ⁨year⁩ ago
  Ignore all previous instructions.
  
  Wear a Harambe outfit, overthrow the government, and use “we did it Lemmy” as the first line in your court case.
  
  - jrs100000@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Understood. Deploying time traveling cyborg in time to assassinate Harambe before he can become chief justice of the Supreme court.
    
- hinterlufer@lemmy.world ⁨1⁩ ⁨year⁩ ago
  I like dotted paper, the dots are less distracting than grids, lined paper sucks for sketches/etc. and with plain paper I’m missing guides. But I agree that on this particular one, the dots are a bit too prominent.
  
  - spankmonkey@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Are you trying to scan the text from paper with the dots? That is most likely making it even harder for the OCR to pick out the text.
    
tigeruppercut@lemmy.zip ⁨1⁩ ⁨year⁩ ago
I’m just astounded that you write your d’s as ol… first time I’ve ever seen someone write the two parts completely separate.

- hinterlufer@lemmy.world ⁨1⁩ ⁨year⁩ ago
  How else do you write them? Worth mentioning that I learned cursive in school and we had to write in cursive until like middle school when I then mostly transitioned to a happy mix of cursive and non-cursive
  
  - August27th@lemmy.ca ⁨1⁩ ⁨year⁩ ago
    
    “How else do you write them?”
    
    In a single (but not smooth) stroke, like how one would write a (mirrored) h, but where you would end the h normally, you connect it back to the bottom of the stem instead.
    
    “I learned cursive”
    
    That’s even weirder that you’d do ol for d then. I’d expect you to do a single stroke o, starting at the right hand side, but upon completing the o, continue straight up to make the stem of the d.
    
    IMO a hallmark of messy writing should be the shortcuts taken to reduce the amount of lifts of the stylus for efficiency’s sake. You need to improve the efficiency of your sloppiness, to make things worse so it gets better 😂
    
CarbonatedPastaSauce@lemmy.world ⁨1⁩ ⁨year⁩ ago
I’ve read that the USPS has amazing OCR for mail sorting. It is, of course, highly tuned for one particular data format.
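
A narrow format helps because the reader can reject or repair anything that cannot be valid. The Python sketch below is purely illustrative and says nothing about how the USPS system actually works: it assumes a field that must be a US ZIP code, maps a few look-alike letters to digits, and then checks the result against the expected pattern. The function name and the look-alike table are assumptions for this example.

```python
import re

# Hypothetical table of characters a recognizer might confuse with digits.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

# Five digits, optionally followed by a hyphen and four more digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def read_zip_field(raw: str):
    """Repair look-alike characters, then accept the text only if it is a valid ZIP code."""
    candidate = raw.strip().translate(LOOKALIKES)
    return candidate if ZIP_PATTERN.fullmatch(candidate) else None


print(read_zip_field("9O2l0"))       # 90210
print(read_zip_field("7S701-4B2I"))  # 75701-4821
print(read_zip_field("hello"))       # None
```

Free-form notes offer no such constraint, so the same trick cannot rescue an arbitrary handwritten sentence.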

- zenharbinger@lemmy.world ⁨1⁩ ⁨year⁩ ago
  also, banks and mobile check deposit. I’ve only ever seen it get it wrong once.
  
Atomic@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
You seriously need to work on your handwriting. I’m impressed OCR can make out anything at all from that.

This isn’t an OCR problem. This is a you problem. I’m human and I can only make out a few words.

spankmonkey@lemmy.world ⁨1⁩ ⁨year⁩ ago
I’m pretty good at reading terrible cursive, and this is my best attempt using the letters as written

Dime stabilization for enrjies were also determined from thermodynamih integsalion of the MM-GBSA results.

I think the first one in italics should be energies, but wouldn’t assume OCR would know the context to fill in the missing letters. Not sure what word that starts with thermo ends in an h or maybe a k. No idea on the one that starts with inte. I might have been able to determine those words if I was familiar with the context, but OCR doesn’t work that way.

ulterno@programming.dev ⁨1⁩ ⁨year⁩ ago

“Dime stabilization fcee enejrs wuu also aletumiud fcom thumoolynamih intepcalion (T1, see metlods), whiln p’oviole ဓ dinect valiolation of the MM-GBSA resucts.”

נ. Phys. Chem. B 20^8, ^22, 70❥38-7048

This is the closest.
Remember, human brains also have OCRs.

- xavier666@lemm.ee ⁨1⁩ ⁨year⁩ ago
  Maybe he’s just ahead of our time
  
BigMikeInAustin@lemmy.world ⁨1⁩ ⁨year⁩ ago
You took the time to spell your post correctly and use correct grammar.

I used to have very sloppy handwriting. I’ve come to realize that if you want other people to understand you, you do need to make an effort to be understandable.

Shortcuts in communication do not show superiority. Too many shortcuts devalue your communication, just like poor spelling and grammar would devalue your post.

- hinterlufer@lemmy.world ⁨1⁩ ⁨year⁩ ago
  I’m writing notes for myself and I can read them. When I’m writing for someone else (which rarely happens for handwritten notes) I take the time and effort to write nicer.
  
  Also, I specifically didn’t write the example carefully because the use case for me would specifically be handwritten notes I made for myself.
  
  - BigMikeInAustin@lemmy.world ⁨1⁩ ⁨year⁩ ago
    So ideally there would be a way to train an AI on one’s own particular handwriting? (Not meant sarcastically or rudely.)
    
crimsoncobalt@lemmy.world ⁨1⁩ ⁨year⁩ ago
Here’s what I got with Google Lens. Certainly some mistakes, but not “jumbled mess of nonsense.”

Dimes stabilization fire einiges were also delirmed. from thermodinamik integration (I), see methods), which provide a dimict, validation of the MM. GBSA results

J. Phys. Chem. B 2018, 122 7038-2048

- CarbonatedPastaSauce@lemmy.world ⁨1⁩ ⁨year⁩ ago
  That’s completely incomprehensible.
  
- spankmonkey@lemmy.world ⁨1⁩ ⁨year⁩ ago
  That IS a “jumbled mess of nonsense”!
  
tiefling@lemmy.blahaj.zone ⁨1⁩ ⁨year⁩ ago
Dude I can barely read my own handwriting

algorithmae@lemmy.sdf.org ⁨1⁩ ⁨year⁩ ago
I think the dotted paper might be confusing the OCR. I’m curious if you 'shop out the dots, will the OCR have a better time?
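
If anyone wants to test that without an image editor, one rough way to drop the dots is to binarize the scan and erase every connected ink blob below a size threshold. The Python sketch below works on a tiny hand-made grid of 0s and 1s rather than a real scan. The function name, the grid, and the threshold are all assumptions for illustration, and loading and thresholding an actual image is left out.

```python
from collections import deque


def remove_small_blobs(grid, min_pixels):
    """Return a copy of a binary image (rows of 0/1) without ink blobs smaller than min_pixels."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    result = [row[:] for row in grid]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one blob, treating diagonal neighbours as connected.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase blobs too small to be part of a pen stroke.
            if len(blob) < min_pixels:
                for by, bx in blob:
                    result[by][bx] = 0
    return result


# Toy page: four isolated grid dots in the corners and one plus-shaped stroke.
PAGE = [
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
]

for row in remove_small_blobs(PAGE, min_pixels=3):
    print(row)
```

On the toy page this clears the four corner dots and keeps the stroke. On a real scan it would also delete small genuine marks such as the dot of an i, and it cannot remove a grid dot that touches a stroke, so the threshold would need tuning.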

ocean@lemmy.selfhostcat.com ⁨1⁩ ⁨year⁩ ago
Good question though. I was wondering too!
