If I had to guess, I’d say the dot paper was confusing the OCR reader. I suppose the LLM has some way of cancelling out the dots and thereby gets a cleaner read of the text.
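If the dot-paper theory is right, a dedicated OCR engine might do better if the dots are stripped out before recognition. Below is a rough sketch of one common way to do that: remove small connected blobs of ink. It assumes the scan has already been binarized into a grid of 0/1 pixels. `remove_small_specks` and `max_area` are made-up names for illustration, and this is not a claim about how ChatGPT or any particular OCR engine actually handles it.

```python
from collections import deque

def remove_small_specks(img, max_area=4):
    """Return a copy of a binary image with small ink blobs erased.

    img is a list of rows: 1 = ink, 0 = background. Any 8-connected blob
    with at most max_area pixels is assumed to be a printed grid dot.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected blob of ink.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Tiny isolated blobs are treated as dots, not handwriting.
            if len(blob) <= max_area:
                for by, bx in blob:
                    out[by][bx] = 0
    return out

# Three single-pixel "dots" and one 4-pixel horizontal "stroke".
page = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]
cleaned = remove_small_specks(page, max_area=1)
for row in cleaned:
    print(row)
```

The catch is the threshold: `max_area` has to sit between the size of a grid dot and the size of the smallest real pen mark, and a naive filter like this will also erase the dots on an "i" or a period, which may be part of why dotted paper trips up classic OCR in the first place.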
Comment on Why is OCR for handwritten content still that bad?
hinterlufer@lemmy.world 2 months ago
That’s perfect. Now I’m just wondering why ChatGPT is apparently much better at OCR than dedicated OCR engines like EasyOCR or Tesseract.
Btw, DeepSeek did a good job, but not a perfect one. I also fed ChatGPT a full page of notes, and the transcription to Markdown worked quite well, although not perfectly. However, if I supply the same note as part of a larger PDF, it refuses to transcribe it, stating that it’s unreadable.
cyrano@lemmy.dbzer0.com 2 months ago
Try Gemini 2; it seems to be pretty good at that as well.
thefactremains@lemmy.world 2 months ago
Because it can fill in gaps where the recognition fails.
executivechimp@discuss.tchncs.de 2 months ago
Which can be problematic: if it makes a mistake that isn’t obviously wrong, the error could go unnoticed.
thefactremains@lemmy.world 2 months ago
100% agreed. But that doesn’t change the answer to why they are apparently better than dedicated OCR.
executivechimp@discuss.tchncs.de 2 months ago
Yep