Formalization and initial experimental validation of an adaptive approach to selecting an OCR sequence for recognizing text in images

Христина Грицай; Оксана Грицай; Ольга Терендій

doi:10.15407/fmmit2026.42.158

Keywords:

optical character recognition, OCR, image preprocessing, Tesseract, EasyOCR, PaddleOCR, RapidOCR, Amazon Textract, CER, WER, integral score

Abstract

The article addresses the problem of selecting a recognition sequence for text in images, taking into account preprocessing methods and the characteristics of modern OCR models. Using a test image as an example, it illustrates that different preprocessing methods can change the OCR result. The paper proposes a methodology for an adaptive experimental approach to selecting the recognition sequence by combining different image preprocessing methods with modern OCR models. The initial experimental validation used the following OCR models: Tesseract, EasyOCR, PaddleOCR, RapidOCR, and Amazon Textract. The proposed approach selects a configuration according to the scheme: image type → preprocessing method → OCR model → evaluation of results → selection of the best recognition sequence. Effectiveness is evaluated with an integral score built from the CER and WER metrics, processing time, the model's confidence score, and a fuzzy-matching score. The results obtained are preliminary and are treated as a basis for a further, extended experimental study.
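The evaluation and selection steps of the scheme can be illustrated with a minimal Python sketch. The abstract does not give the formula for the integral score, so everything specific here is an assumption made for illustration: the weighted-sum form, the weight values, the normalization of processing time by the slowest run, and the use of `difflib.SequenceMatcher` as the fuzzy-matching score. The `Run` records, the `model_a`/`model_b`/`model_c` names, and their outputs are hypothetical placeholders, not results from the paper. CER and WER are computed as edit distance divided by reference length, at character and word level respectively.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits per reference character."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits per reference word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


@dataclass
class Run:
    """One hypothetical (preprocessing, model) configuration and its output."""
    preprocessing: str
    model: str
    text: str
    seconds: float
    confidence: float  # model confidence, assumed to be scaled to 0..1


def integral_score(run, reference, max_seconds,
                   weights=(0.3, 0.3, 0.1, 0.15, 0.15)):
    """Assumed weighted sum of five components, each mapped to 0..1 (higher is better)."""
    w_cer, w_wer, w_time, w_conf, w_fuzzy = weights
    char_acc = 1.0 - min(cer(reference, run.text), 1.0)
    word_acc = 1.0 - min(wer(reference, run.text), 1.0)
    speed = 1.0 - min(run.seconds / max_seconds, 1.0)
    # Stand-in fuzzy-matching score; the paper's actual measure is not specified.
    fuzzy = SequenceMatcher(None, reference, run.text).ratio()
    return (w_cer * char_acc + w_wer * word_acc + w_time * speed
            + w_conf * run.confidence + w_fuzzy * fuzzy)


def best_run(runs, reference):
    """Pick the configuration with the highest integral score."""
    max_seconds = max(r.seconds for r in runs) or 1.0
    return max(runs, key=lambda r: integral_score(r, reference, max_seconds))


# Made-up data for illustration only.
reference = "hello world"
runs = [
    Run("grayscale", "model_a", "hello world", seconds=2.0, confidence=0.90),
    Run("otsu", "model_b", "hallo world", seconds=0.5, confidence=0.95),
    Run("none", "model_c", "helo wrld", seconds=1.0, confidence=0.60),
]
best = best_run(runs, reference)
print(best.preprocessing, best.model)
```

With these assumed weights, accuracy dominates the score, so a slower configuration that reproduces the reference exactly can still outrank a faster one with errors. Changing the weights shifts that trade-off, which is why they would need to be fixed before comparing configurations.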
