AIFFEL 아이펠 40일차

Notice

Recent Posts

Recent Comments

Link

« 2024/10 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Tags more

Archives

Today

Total

관리 메뉴

뮤트 개발일지

AIFFEL 아이펠 40일차 본문

AIFFEL

AIFFEL 아이펠 40일차

박뮤트 2022. 2. 24. 12:27

문자를 읽을 수 있는 딥러닝

OCR(Optical Character Recognition, 광학 문자 인식)

과정)

1. Text Detection(문자 검출): 입력받은 사진 속에서 문자의 위치를 찾는다.

- 방식1) Regression(회귀): 기준으로 하는 박스 대비 문자의 박스가 얼마나 차이나는지 학습한다.

- 방식2) Segmentation(세그멘테이션): 픽셀 단위로 해당 픽셀이 문자를 표현하는지를 분류하는 문제(pixel-wise classification)

2. Text Recognition(문자 인식): 찾은 문자 영역으로부터 문자를 읽어낸다.

- CRNN: 이미지 내의 문자 인식 모델의 기본적인 방법 중 하나. 이미지 내의 텍스트와 연관된 특징을 CNN을 통해 추출한 후, 스텝 단위의 문자 정보를 RNN으로 인식한다.

Ibrahim, Ahmed Sobhy Elnady. End-To-End Text Detection Using Deep Learning. Diss. Virginia Tech, 2017. (https://arxiv.org/abs/1507.05717)

https://tv.naver.com/v/4578167

글자읽는 AI: 밑바닥부터 외국어 정복까지

NAVER Engineering | 글자읽는 AI: 밑바닥부터 외국어 정복까지

tv.naver.com

https://www.youtube.com/watch?v=ckRFBl_XWFg

<keras-ocr 써보기>

import matplotlib.pyplot as plt
import keras_ocr

# keras-ocr이 detector과 recognizer를 위한 모델을 자동으로 다운로드받게 됩니다. 
pipeline = keras_ocr.pipeline.Pipeline()

# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for idx, ax in enumerate(axs):
    keras_ocr.tools.drawAnnotations(image=images[idx], 
                                    predictions=prediction_groups[idx][0], ax=ax)

# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for idx, ax in enumerate(axs):
    keras_ocr.tools.drawAnnotations(image=images[idx], 
                                    predictions=prediction_groups[idx][0], ax=ax)

<Tesseract OCR 써보기>

import os
import pytesseract
from PIL import Image
from pytesseract import Output
import matplotlib.pyplot as plt

# OCR Engine modes(–oem):
# 0 - Legacy engine only.
# 1 - Neural nets LSTM engine only.
# 2 - Legacy + LSTM engines.
# 3 - Default, based on what is available.

# Page segmentation modes(–psm):
# 0 - Orientation and script detection (OSD) only.
# 1 - Automatic page segmentation with OSD.
# 2 - Automatic page segmentation, but no OSD, or OCR.
# 3 - Fully automatic page segmentation, but no OSD. (Default)
# 4 - Assume a single column of text of variable sizes.
# 5 - Assume a single uniform block of vertically aligned text.
# 6 - Assume a single uniform block of text.
# 7 - Treat the image as a single text line.
# 8 - Treat the image as a single word.
# 9 - Treat the image as a single word in a circle.
# 10 - Treat the image as a single character.
# 11 - Sparse text. Find as much text as possible in no particular order.
# 12 - Sparse text with OSD.
# 13 - Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

def crop_word_regions(image_path='./images/sample.png', output_path='./output'):
    if not os.path.exists(output_path):
        os.mkdir(output_path)
    custom_oem_psm_config = r'--oem 3 --psm 3'
    image = Image.open(image_path)

    recognized_data = pytesseract.image_to_data(
        image, lang='eng',    # 한국어라면 lang='kor'
        config=custom_oem_psm_config,
        output_type=Output.DICT
    )
    
    top_level = max(recognized_data['level'])
    index = 0
    cropped_image_path_list = []
    for i in range(len(recognized_data['level'])):
        level = recognized_data['level'][i]
    
        if level == top_level:
            left = recognized_data['left'][i]
            top = recognized_data['top'][i]
            width = recognized_data['width'][i]
            height = recognized_data['height'][i]
            
            output_img_path = os.path.join(output_path, f"{str(index).zfill(4)}.png")
            print(output_img_path)
            cropped_image = image.crop((
                left,
                top,
                left+width,
                top+height
            ))
            cropped_image.save(output_img_path)
            cropped_image_path_list.append(output_img_path)
            index += 1
    return cropped_image_path_list


work_dir = os.getenv('HOME')+'/aiffel/ocr_python'
img_file_path = work_dir + '/test1.jpeg'   #테스트용 이미지 경로입니다. 본인이 선택한 파일명으로 바꿔주세요. 

cropped_image_path_list = crop_word_regions(img_file_path, work_dir)

def recognize_images(cropped_image_path_list):
    custom_oem_psm_config = r'--oem 3 --psm 7'
    
    for image_path in cropped_image_path_list:
        image = Image.open(image_path)
        recognized_data = pytesseract.image_to_string(
            image, lang='eng',    # 한국어라면 lang='kor'
            config=custom_oem_psm_config,
            output_type=Output.DICT
        )
        print(recognized_data['text'])
    print("Done")

# 위에서 준비한 문자 영역 파일들을 인식하여 얻어진 텍스트를 출력합니다.
recognize_images(cropped_image_path_list)

'AIFFEL' 카테고리의 다른 글

AIFFEL 아이펠 42일차 (0)	2022.03.14
AIFFEL 아이펠 41일차 (0)	2022.02.25
AIFFEL 아이펠 39일차 (0)	2022.02.23
AIFFEl 아이펠 38일차 (0)	2022.02.22
AIFFEL 아이펠 37일차 (0)	2022.02.21

'AIFFEL' Related Articles

뮤트 개발일지

AIFFEL 아이펠 40일차 본문

AIFFEL 아이펠 40일차

문자를 읽을 수 있는 딥러닝

'AIFFEL' 카테고리의 다른 글

티스토리툴바