OCR Technology

Optical Character Recognition

Recognize and extract text from documents and images, and convert it into editable content.

We use today's advanced OCR technology to improve recognition accuracy and capabilities, and to help customers achieve their desired outcomes. Our solutions capture information from characters, marks, barcodes and QR codes, covering over 100 languages and processing various types of documents and images. Customers' manual work can be reduced to a minimum, and productivity can be improved significantly.

About OCR

Optical Character Recognition (OCR) is a process by which text characters are input to a computer by providing the computer with an image. The computer uses an OCR engine, a computer program whose specific function is to guess which letter (recognizable to a computer) an image (recognizable to a human) represents.

Paperless includes an OCR engine, which it uses to recognize text and numerical values. To understand how the OCR engine in Paperless produces OCR results, it is useful to understand how OCR engines make these guesses.

The OCR Process

Here is a very basic overview of how an OCR engine processes an image to return the text it contains:

  • A document or image is acquired by the computer
  • The document or image is submitted as input to an OCR engine
  • The OCR engine matches portions of the document or image to shapes it has been instructed to recognize
  • Using the logic parameters it has been instructed to use, the OCR engine makes its best guess as to which letter each shape represents
  • OCR results are returned as text
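
The steps above can be illustrated with a toy sketch in Python. This is not how the OCR engine in Paperless works internally. It is a deliberately simplified example that assumes tiny fixed-size bitmaps, three hypothetical letter templates, and a single "logic parameter" (a minimum similarity score). All names here (TEMPLATES, similarity, guess_letter, split_glyphs, recognize) are invented for illustration.

```python
from typing import Dict, List, Tuple

# One glyph bitmap: each string is a row, '#' is ink and '.' is blank.
Bitmap = Tuple[str, ...]

# Hypothetical 3x5 templates: the shapes the engine is "instructed to recognize".
TEMPLATES: Dict[str, Bitmap] = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def similarity(a: Bitmap, b: Bitmap) -> float:
    """Return the fraction of pixels that agree between two same-size bitmaps."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def guess_letter(shape: Bitmap, min_score: float = 0.8) -> str:
    """Best guess for one shape; '?' if no template is similar enough.

    min_score stands in for the engine's logic parameters (an assumption).
    """
    best_letter, best_score = max(
        ((letter, similarity(shape, tpl)) for letter, tpl in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    return best_letter if best_score >= min_score else "?"


def split_glyphs(image: List[str], width: int = 3, gap: int = 1) -> List[Bitmap]:
    """Cut the image into fixed-width portions separated by blank columns."""
    step = width + gap
    count = (len(image[0]) + gap) // step
    return [
        tuple(row[i * step : i * step + width] for row in image)
        for i in range(count)
    ]


def recognize(image: List[str]) -> str:
    """Match each portion of the image to a template and return the text."""
    return "".join(guess_letter(glyph) for glyph in split_glyphs(image))


# The acquired "image": the letters L, I, T, with one stray pixel in the T.
image = [
    "#...###.###",
    "#....#...#.",
    "#....#...#.",
    "#....#...##",
    "###.###..#.",
]

print(recognize(image))
```

Even with the stray pixel, the third shape is still closer to the "T" template than to any other, so the guess holds. A shape that resembles no template well enough falls below min_score and comes back as "?", which is one simple way an engine might signal low confidence.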