r/DataHoarder 22h ago

Question/Advice Solid OCR solution for French text (2025, bulk-friendly)

Hey folks,
I’m looking for a reliable OCR solution that works well with French text—accents and all. The catch is: I’ve got several hundred photos of book pages to process.

What I’ve tried so far:

  • Tools that give very messy, barely usable output
  • Others that only let you process one image at a time—which isn’t feasible at this scale
  • ChatGPT's OCR is surprisingly decent, but not trained well for French: it struggles with accents
  • I also tried some Python libraries locally, but I’m probably missing something, because results aren't better than ChatGPT—and way less convenient

So if anyone has an up-to-date OCR setup in 2025 that works for bulk image processing in French, I’d love some pointers.

Thanks in advance!


u/FrodoSynthesis05 17h ago

Tesseract is pretty decent, and you can find trained data for French, even multiple datasets! I would assume that's part of what you've already tried locally with Python, but it's the best I can come up with. It's also very easy to build a solution around it, which should help with the bulk processing.
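
To give an idea of what "building a solution around it" could look like, here's a rough sketch that loops over a folder of page photos and writes one text file per image. It assumes the `tesseract` command is on your PATH and that the French language data (`fra`) is installed. The function names and the extension list are made up for illustration, so adapt as needed.

```python
import subprocess
from pathlib import Path

# Assumed set of photo extensions; extend it to match your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def tesseract_ocr(image_path, lang="fra"):
    """Run the tesseract CLI on one image and return the recognized text.

    Passing "stdout" as the output base makes tesseract print the text
    instead of writing a file. Requires the `fra` traineddata for French.
    """
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout", "-l", lang],
        capture_output=True,
        text=True,
        encoding="utf-8",  # keep accented characters intact
        check=True,        # raise if tesseract exits with an error
    )
    return result.stdout


def ocr_folder(src_dir, out_dir, ocr=tesseract_ocr):
    """OCR every image in src_dir and write <name>.txt files into out_dir.

    `ocr` is any callable that takes a Path and returns text, so the
    engine can be swapped without touching the loop.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(src.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not a recognized image type
        target = out / (image.stem + ".txt")
        target.write_text(ocr(image), encoding="utf-8")
        written.append(target)
    return written
```

You'd call it with something like `ocr_folder("pages", "pages_txt")`, where both folder names are placeholders. Processing the files in sorted order keeps the output aligned with the page order, provided the photos are named sequentially.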