Correct text-image orientation with Python/Tesseract/OpenCV
import cv2
import pytesseract
import urllib.request
import numpy as np
import re

# Installs: https://www.learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/

if __name__ == '__main__':
    # Uncomment the line below to provide the path to tesseract manually
    # pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'

    # Read image from URL
    # Taken from https://stackoverflow.com/questions/21061814/how-can-i-read-an-image-from-an-internet-url-in-python-cv2-scikit-image-and-mah
    # Sample images:
    # https://i.ibb.co/4mm9WvZ/book-rot.jpg
    # https://i.ibb.co/M7jwWR2/book.jpg
    # https://i.ibb.co/27bKNJ8/book-rot2.jpg
    resp = urllib.request.urlopen('https://i.ibb.co/27bKNJ8/book-rot2.jpg')
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)  # Initially decode as color

    # TAKEN FROM: https://www.pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/
    # Convert the image to grayscale and flip the foreground and background
    # so the foreground is "white" and the background is "black".
    # (Kept from the deskew tutorial; note the OSD call below actually runs
    # on the original colour image, so this step is not used afterwards.)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_not(gray)

    # Detect orientation with Tesseract's orientation and script detection (OSD)
    rot_data = pytesseract.image_to_osd(image)
    print("[OSD] " + rot_data)

    # The "Rotate" field is the clockwise rotation (in degrees) needed to upright the text
    rot = re.search(r'(?<=Rotate: )\d+', rot_data).group(0)
    angle = float(rot)
    if angle > 0:
        # getRotationMatrix2D rotates counter-clockwise, so convert the angle
        angle = 360 - angle
    print("[ANGLE] " + str(angle))

    # Rotate the image about its centre to deskew it
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h),
                             flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)

    # TODO: Rotated image can be saved here
    print(pytesseract.image_to_osd(rotated))

    print("[TEXT]")
    # Run Tesseract OCR on the corrected image (--psm 1: automatic page segmentation with OSD)
    text = pytesseract.image_to_string(rotated, lang='eng', config="--psm 1")
    # Encode to UTF-8 to avoid console encoding errors when printing
    print(text.encode(encoding='UTF-8'))
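For reference, pytesseract.image_to_osd returns a small plain-text report, which is why the script extracts the "Rotate:" field with a regex. On a rotated page the report has roughly the following shape (field names are Tesseract's; the numbers here are illustrative, not taken from an actual run):

Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 2.32
Script: Latin
Script confidence: 1.20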
The code is very helpful. However, I found one change that could really improve it.
Currently you find an angle, decide whether the image should be rotated clockwise or counterclockwise, and then use cv2's getRotationMatrix2D and warpAffine to perform the rotation. The problem is that OpenCV does not automatically allocate enough space for the entire rotated image to fit in the frame. As a result, if your image is rectangular and rotated by +90 or -90 degrees, half of it will be cut off when you correct its orientation.
To avoid that you can use bound-aware rotation, as in the sketch below. I am using the rotate_bound function from imutils, described at https://www.pyimagesearch.com/2021/01/20/opencv-rotate-image/
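A minimal sketch of that bound-aware variant might look like the following (the URL, OSD parsing, and --psm flag are carried over from the gist above; imutils.rotate_bound is the only new call, and since it rotates clockwise, matching the direction of OSD's "Rotate" value, the 360 - angle conversion is no longer needed):

import re
import urllib.request

import cv2
import imutils  # pip install imutils
import numpy as np
import pytesseract

# Load the same sample image as the gist above
resp = urllib.request.urlopen('https://i.ibb.co/27bKNJ8/book-rot2.jpg')
image = cv2.imdecode(np.asarray(bytearray(resp.read()), dtype="uint8"),
                     cv2.IMREAD_COLOR)

# OSD's "Rotate" field is the clockwise correction, in degrees
angle = float(re.search(r'(?<=Rotate: )\d+',
                        pytesseract.image_to_osd(image)).group(0))

# rotate_bound rotates clockwise and grows the output canvas so the whole
# image fits; a rectangular page rotated +/-90 degrees is no longer cropped
rotated = imutils.rotate_bound(image, angle)
print(pytesseract.image_to_string(rotated, lang='eng', config="--psm 1"))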