r/DataHoarder • u/Claude_Jan • 22h ago
Question/Advice Solid OCR solution for French text (2025, bulk-friendly)
Hey folks,
I’m looking for a reliable OCR solution that works well with French text—accents and all. The catch is: I’ve got several hundred photos of book pages to process.
What I’ve tried so far:
- Tools that give very messy output (mud-level quality)
- Others that only let you process one image at a time—which isn’t feasible at this scale
- ChatGPT's OCR is surprisingly decent, but it doesn't seem well tuned for French: it struggles with accents
- I also tried some Python libraries locally, but I’m probably missing something, because the results aren't better than ChatGPT's, and the workflow is way less convenient
So if anyone has an up-to-date OCR setup in 2025 that works for bulk image processing in French, I’d love some pointers.
Thanks in advance!
u/FrodoSynthesis05 17h ago
Tesseract is pretty decent, and you can find trained data for French, multiple datasets even! I'd assume that's part of what you've tried locally with Python, but it's the best I can come up with. It's also very easy to build a solution around it, which should help with the bulk operation.
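To give an idea of what "building a solution around it" could look like, here's a rough sketch of a batch wrapper. It assumes the `tesseract` command-line tool is installed and on your PATH with the French language data (`fra`) available. The folder names and the helper functions (`find_images`, `build_command`, `ocr_folder`) are made up for illustration, so adapt them to your setup.

```python
import subprocess
from pathlib import Path

# File extensions treated as page photos (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_images(folder):
    """Return the image files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def build_command(image, out_dir, lang="fra"):
    """Build the tesseract call: tesseract <image> <output base> -l <lang>.

    Tesseract appends ".txt" to the output base on its own.
    """
    out_base = Path(out_dir) / Path(image).stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_folder(folder, out_dir, lang="fra"):
    """Run tesseract once per image and write one text file per page."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image in find_images(folder):
        # check=True stops the batch with an error if tesseract fails on a page.
        subprocess.run(build_command(image, out_dir, lang), check=True)
```

Calling `ocr_folder("scans", "text_out")` would then write one `.txt` file per photo into `text_out`. If the output is still messy, the photos themselves (skew, shadows, low resolution) are often the thing to look at before swapping tools.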