
OCR (Optical Character Recognition)

Technology that converts text within images or scanned documents into machine-readable digital text, enabling document digitization and automated data extraction.

OCR (Optical Character Recognition) detects text regions in images and converts them into machine-readable digital text. It powers applications from scanning documents into searchable PDFs, to extracting business card information, to translating street signs via smartphone cameras.

Traditional OCR relied on template matching and hand-crafted features, but deep learning has dramatically improved accuracy for handwritten text, multilingual documents, and scene text with perspective distortion.

Practical challenges include low resolution, uneven lighting, and complex layouts. Preprocessing steps such as binarization (reducing the image to pure black and white) and deskewing (rotating the image so that text lines run horizontally) improve accuracy. Multimodal LLMs are increasingly capable of document understanding, blurring the boundary between OCR and visual language comprehension.
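
As an illustration of the binarization step, the following is a minimal sketch in plain Python using Otsu's method, one common way to pick a global threshold; it is not tied to any particular OCR engine. The function names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a list of rows of 8-bit grayscale values (0 = black, 255 = white).

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for a flat list of 8-bit gray values.

    Otsu's method picks the threshold that maximizes the between-class
    variance of the two groups (dark vs. light) it creates.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels with value <= t
    sum_dark = 0    # sum of gray values of those pixels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor of 1 / total**2).
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map each pixel to 0 (ink) or 255 (paper) using an Otsu threshold."""
    t = otsu_threshold([p for row in image for p in row])
    return [[255 if p > t else 0 for p in row] for row in image]


# Tiny hypothetical example: dark strokes (20, 30) on a light background.
sample = [
    [200, 30, 220],
    [20, 200, 210],
]
binary = binarize(sample)
```

A single global threshold like this tends to struggle under the uneven lighting mentioned above; in that case a locally adaptive threshold, computed per neighborhood, is usually a better fit.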
