Converting Text in Images to Text with Azure OCR

Buse Köseoğlu
5 min read · Nov 1, 2023


In this article, I will show you how to convert printed or handwritten text in an image into machine-readable text with OCR (Optical Character Recognition), using Computer Vision, one of the Azure services.

Before you start, you need:

  • An Azure subscription
  • Python 3.x installed on your computer

Creation of Computer Vision Service

Once these are in place, go to the Azure portal, open “Create a Resource”, type “Computer Vision” in the search box, and select this resource. When you click Create, the screen that appears will be as follows. If you do not have a resource group yet, you can create one by clicking “create new”. You can then create the service by filling in the other fields and clicking “review + create”.

After the service has been created, you can open it by clicking the “Go to Resource” button on the page that appears. You will see the screen shown below.

You can manage the service you created using the menu on the left side of this screen.

Getting Texts from Images

After completing the steps above, we continue in the IDE you use for Python on your computer. First, install the following libraries from the console:

- Client library (pip install --upgrade azure-cognitiveservices-vision-computervision)

- Pillow library (pip install pillow)

You can then create a .py file and import the libraries shown below.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

from array import array
import os
from PIL import Image
import sys
import time

Since the Computer Vision resource belongs to your subscription, you need to authenticate before you can use it. Your authentication information is located under Keys and Endpoint, in the Resource Management section of the menu on the left side of the screen shown above.

The keys here are generated specifically for your resource. You can see them by clicking Show Keys. Going back to the code, you can authenticate with the code below to use the service.

'''
Authenticate
Authenticates your credentials and creates a client.
'''
subscription_key = "PASTE_YOUR_COMPUTER_VISION_SUBSCRIPTION_KEY_HERE"
endpoint = "PASTE_YOUR_COMPUTER_VISION_ENDPOINT_HERE"

computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))

Here, paste the value of “KEY1” or “KEY2” from the screen above into subscription_key, and the value of “Endpoint” from the same screen into endpoint.
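Pasting a key directly into a .py file makes it easy to share the key by accident. As an alternative, here is a minimal sketch that reads both values from environment variables. The helper name load_credentials and the variable names VISION_KEY and VISION_ENDPOINT are assumptions made for this example; the SDK does not require them.

```python
import os


def load_credentials(env=os.environ):
    """Return (subscription_key, endpoint) read from environment variables.

    VISION_KEY and VISION_ENDPOINT are names chosen for this sketch,
    not names that the Azure SDK looks for on its own.
    """
    key = env.get("VISION_KEY")
    endpoint = env.get("VISION_ENDPOINT")
    if not key or not endpoint:
        raise RuntimeError("Set VISION_KEY and VISION_ENDPOINT before running.")
    return key, endpoint


# Usage (replaces the two hard-coded assignments above):
# subscription_key, endpoint = load_credentials()
```

The returned values can then be passed to ComputerVisionClient and CognitiveServicesCredentials exactly as in the authentication code above.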

Reading text from an image can be done in two ways. One is to give the program an image stored on your local machine, and the other is to give it the URL of the image you want to use.

1. Getting Text from a Local Image

You can create a folder named “images” next to your .py file and put the image you want to use in this folder. The image below will be used as an example here.

images_folder = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images")
# Get image path
read_image_path = os.path.join(images_folder, "deneme.png")
# Open the image
read_image = open(read_image_path, "rb")

# Call API with image and raw response (allows you to get the operation location)
read_response = computervision_client.read_in_stream(read_image, raw=True)
# Get the operation location (URL with ID as last appendage)
read_operation_location = read_response.headers["Operation-Location"]
# Take the ID off and use to get results
operation_id = read_operation_location.split("/")[-1]

# Call the "GET" API and wait for the retrieval of the results
while True:
    read_result = computervision_client.get_read_result(operation_id)
    if read_result.status.lower() not in ['notstarted', 'running']:
        break
    print('Waiting for result...')
    time.sleep(10)

# Print results, line by line
if read_result.status == OperationStatusCodes.succeeded:
    for text_result in read_result.analyze_result.read_results:
        for line in text_result.lines:
            print(line.text)
            #print(line.bounding_box)
print()

The code above prints the text found in the local image. In the third line, “deneme.png” is the name of the image used in this example; replace it with the name of your own image file. When the run completes, you can see the text from the image in the console output.
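The while loop above keeps polling for as long as the service reports “notStarted” or “running”. A possible variation, sketched below, adds an upper time limit so the script cannot wait forever. The helper wait_for_read and its parameters are hypothetical names introduced for this example; get_result stands for any function that behaves like computervision_client.get_read_result.

```python
import time


def wait_for_read(get_result, operation_id, poll_seconds=1.0,
                  timeout_seconds=60.0, sleep=time.sleep):
    """Poll get_result(operation_id) until the status is final.

    Raises TimeoutError once roughly timeout_seconds have been spent waiting.
    The sleep parameter exists so the waiting can be replaced in tests.
    """
    waited = 0.0
    while True:
        result = get_result(operation_id)
        # Same check as in the article: anything other than
        # "notStarted" or "running" is treated as a final status.
        if result.status.lower() not in ("notstarted", "running"):
            return result
        if waited >= timeout_seconds:
            raise TimeoutError("Read operation did not finish in time.")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the client created earlier:
# read_result = wait_for_read(computervision_client.get_read_result, operation_id)
```

The time limit is approximate, because it counts only the sleep intervals and not the duration of each request.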

If you uncomment the print(line.bounding_box) line and run the code again, you can also see the bounding box coordinates of each line.
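The printed bounding box is easier to work with once its numbers are paired up. The sketch below assumes that line.bounding_box is a flat list of eight numbers, that is, the x and y coordinates of the four corners in order. The function names are hypothetical and were chosen for this example.

```python
def bounding_box_to_points(bounding_box):
    """Group a flat [x1, y1, ..., x4, y4] list into four (x, y) tuples."""
    if len(bounding_box) != 8:
        raise ValueError("Expected 8 numbers (4 corners), got %d" % len(bounding_box))
    return [(bounding_box[i], bounding_box[i + 1]) for i in range(0, 8, 2)]


def enclosing_rectangle(bounding_box):
    """Return (left, top, right, bottom) of the axis-aligned box around the corners."""
    points = bounding_box_to_points(bounding_box)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Example with made-up coordinates for a slightly tilted line of text
box = [10, 20, 110, 22, 109, 52, 9, 50]
print(bounding_box_to_points(box))
print(enclosing_rectangle(box))
```

A rectangle in this (left, top, right, bottom) form is convenient if you later want to mark the detected lines on the image, for example with Pillow.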

2. Getting Text from an Image by Its URL

In this section, we will see how to extract text from an image of handwriting. The example image at the URL in the code below will be used.

print("===== Read File - remote =====")
# Get an image with text
read_image_url = "https://i.stack.imgur.com/g0f81.png"

# Call API with URL and raw response (allows you to get the operation location)
read_response = computervision_client.read(read_image_url, raw=True)

# Get the operation location (URL with an ID at the end) from the response
read_operation_location = read_response.headers["Operation-Location"]
# Grab the ID from the URL
operation_id = read_operation_location.split("/")[-1]

# Call the "GET" API and wait for it to retrieve the results
while True:
    read_result = computervision_client.get_read_result(operation_id)
    if read_result.status not in ['notStarted', 'running']:
        break
    time.sleep(1)

# Print the detected text, line by line
if read_result.status == OperationStatusCodes.succeeded:
    for text_result in read_result.analyze_result.read_results:
        for line in text_result.lines:
            print(line.text)

print()

To run the program on another image, replace the URL assigned to read_image_url on the 3rd line of the code above with the URL of the image you want. As a result, we get the following output.

As you can see, OCR also performs well on handwriting.
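Both examples print the lines one by one. If you would rather keep the recognized text in a single string, for instance to save it to a file, a small helper along these lines could be used. The name collect_text is hypothetical; the function only assumes that each page object has a lines list and each line has a text attribute, as in the loops above.

```python
def collect_text(read_results):
    """Join the recognized lines of every page into one string.

    Lines are separated by a newline and pages by a blank line.
    """
    pages = []
    for page in read_results:
        pages.append("\n".join(line.text for line in page.lines))
    return "\n\n".join(pages)


# Usage after a successful read:
# full_text = collect_text(read_result.analyze_result.read_results)
# with open("output.txt", "w", encoding="utf-8") as f:
#     f.write(full_text)
```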

Thank you for reading.
