
Tried it with a few historical handwritten German documents; accuracy was abysmal.


Semi-OT (similar language): The national archives in Sweden and Finland published a model for OCR:ing handwritten Swedish text from the 1600s to the 1800s with what to me seems like a very high level of accuracy given the source material (4% character error rate).

https://readcoop.eu/model/the-swedish-lion-i/

https://www.transkribus.org/success-story/creating-the-swedi...

https://huggingface.co/Riksarkivet

They have also published a fairly large volume of OCR:ed texts (IIRC birth/death notices from church records) using this model online. As a beginner genealogist it's been fun to follow.
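
For anyone unfamiliar with the metric quoted above: character error rate (CER) is usually computed as the character-level edit distance between the model output and a reference transcription, divided by the length of the reference. A minimal sketch in plain Python follows; the function names and the sample strings are made up for illustration and are not taken from the linked model or its evaluation code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: one wrong character out of four.
print(character_error_rate("abcd", "abxd"))  # 0.25
```

By this definition a 4% CER means roughly one character edit per 25 reference characters, though the exact normalization used for a published figure may differ.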


HTR (Handwritten Text Recognition) is a completely different space than OCR. What were you expecting exactly?


It fits the "use cases" mentioned in the article:

> Preserving historical and cultural heritage: Organizations and nonprofits that are custodians of heritage have been using Mistral OCR to digitize historical documents and artifacts, ensuring their preservation and making them accessible to a broader audience.


There is a difference between a historical document and "my doctor's prescription".

Someone coming here and saying it does not work with their old German handwriting doesn't say much.


You're making a strawman; the parent specifically mentioned "historical handwritten documents".


For this task, general models will always perform poorly. My company trains custom gen AI models for document understanding. We recently trained a VLM for the German government to recognize documents written in old German handwriting, and it performed with exceptionally high accuracy.


They are probably overfitting to the benchmarks, since other users also complain about the low accuracy.


Also working with historical handwritten German documents. So far Gemini seems to be the least wrong of the ones I've tried - any recommendations?


My recommendation is to train a custom model.


Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) are different tasks.



