OCR with Tesseract

lejpower 2021. 11. 5. 17:49

The staging server used for testing runs Ubuntu, so everything below was tested on Ubuntu.

tesseract install

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

 

pip install

pip install Pillow
pip install pytesseract
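Before running any OCR, it can help to confirm that the installs above are visible from Python. The following is a minimal sketch using only the standard library; `check_setup` is a hypothetical helper name chosen for illustration, not part of pytesseract.

```python
import importlib.util
import shutil


def check_setup(binary="tesseract", modules=("PIL", "pytesseract")):
    """Report where the tesseract binary is and whether each module is installed."""
    report = {"binary_path": shutil.which(binary)}  # None if not on PATH
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

If `binary_path` prints as None, the apt install did not put tesseract on the PATH; if a module prints as False, the matching pip install is missing from the current environment.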

 

download the trained data files

sudo apt-get install 'tesseract-ocr-*'

The trained data files can also be downloaded directly from:

https://github.com/tesseract-ocr/tessdata
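To check that the Japanese data actually got installed, one option is to read the output of `tesseract --list-langs`. This is a rough sketch: `parse_list_langs` and `installed_langs` are hypothetical helper names, and the parsing assumes the output is a header line starting with "List of" followed by one language code per line.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the header line starts with 'List of'; everything else is a code
    return [line for line in lines if not line.startswith("List of")]


def installed_langs(binary="tesseract"):
    """Return installed language codes, or [] if the binary is not on PATH."""
    if shutil.which(binary) is None:
        return []
    proc = subprocess.run([binary, "--list-langs"], capture_output=True, text=True)
    # Fall back to stderr in case a version prints the list there
    return parse_list_langs(proc.stdout or proc.stderr)


if __name__ == "__main__":
    print("jpn installed:", "jpn" in installed_langs())
```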

 

 

Python test code

The test photo is a Japanese airline boarding pass, so I ran the test in Japanese, lol.

import pytesseract
from PIL import Image

# Path to the tesseract binary installed by apt
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'

# Open the test image and run OCR with the Japanese trained data
image = Image.open('/home/ubuntu/DEVEOPMENT_AWS_STEP_SERVER/Python_AWS_STEPSERVER/python_OCR_test/test_ocr.jpg')
result = pytesseract.image_to_string(image, lang='jpn')

print(result)

 

Result

BOAHUING PASS

 

保誠栓査坦と搭来口で2次元バーコードをタッチして《ださい。

Please touch the barcode at security check and the boarding 92te

ムSTAH ALLIANCE MEABER ぷと

東京/:                      、 沖縄

TOKYO/HANEDム                                  Le         OKINAW和A

09 : 20 発           ” 11:55着

 

2053556 1096791

 

 

指乗口 / 指乗順       - 指乗締切時刻

GATE / GROUP             Boarding Closs Tims

58 /Group4     09:10       ままーー
)

(LSN: 8834
DAF      g/7 8:49 BP 2 PNR:NQ5CF          FARE: INTOW    BN: 338
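The result above garbles some of the English text (for example "BOAHUING PASS"), and the boarding pass mixes Japanese and English. One thing that might be worth trying is passing both languages to Tesseract, joined with `+`, and converting the image to grayscale first. This is only a sketch and was not run against the test image: `join_langs` and `ocr_image` are hypothetical helper names, and it assumes the `eng` trained data is installed alongside `jpn`.

```python
def join_langs(langs):
    """Join Tesseract language codes with '+', e.g. ['jpn', 'eng'] -> 'jpn+eng'."""
    cleaned = [code.strip() for code in langs if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_image(path, langs=("jpn", "eng")):
    """Run OCR on a grayscale copy of the image with the given languages."""
    # Imported here so join_langs stays usable without the OCR dependencies
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        gray = image.convert("L")  # 8-bit grayscale
        return pytesseract.image_to_string(gray, lang=join_langs(langs))
```

Usage would look like `print(ocr_image('test_ocr.jpg'))`. Whether this actually reads the boarding pass better than `lang='jpn'` alone would need to be checked on the real image.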