Formalization and Initial Experimental Evaluation of an Adaptive Approach to OCR Pipeline Selection for Text Recognition in Images

Христина Грицай; Оксана Грицай; Ольга Терендій

doi:10.15407/fmmit2026.42.158

Keywords:

optical character recognition, OCR, image preprocessing, Tesseract, EasyOCR, PaddleOCR, RapidOCR, Amazon Textract, CER, WER, integral score

Abstract

The paper addresses the problem of selecting an appropriate text recognition
pipeline for images, taking into account both image preprocessing methods and the
specific features of modern optical character recognition (OCR) models. The study is
motivated by the fact that OCR quality depends not only on the selected recognition
model but also on the characteristics of the input image, including noise, contrast,
illumination, resolution, text skew, and background complexity.
The aim of the paper is to formalize an adaptive approach to OCR pipeline
selection and to perform its initial experimental evaluation. The proposed approach
generates several preprocessed versions of the same input image, applies the OCR
models to each version, collects the recognized text, text region coordinates,
confidence scores, and processing time, and then evaluates the results with a
multi-criteria quality score. The study considers the following OCR tools: Tesseract,
EasyOCR, PaddleOCR, RapidOCR, and Amazon Textract. The preprocessing
configurations are the original image without preprocessing, grayscale conversion,
contrast enhancement, denoising with scaling, and Otsu binarization. Quality is
assessed using Character Error Rate (CER), Word Error Rate (WER), processing time,
model confidence score, and fuzzy matching score. The experimental part is an initial
evaluation rather than a full-scale statistical comparison of OCR models. Its purpose
is to verify the logic of the proposed methodology, identify the main parameters that
should be fixed in further experiments, and prepare a basis for extended research on a
larger dataset of images of varying quality. The results show that OCR quality may
vary depending on the selected combination of preprocessing method and OCR model.
However, these results are preliminary and should not be treated as a final ranking of
OCR models. The practical value of the proposed approach lies in its potential use as
a methodological basis for building OCR pipelines in automated document processing
systems, digital archives, electronic document management systems, information
retrieval systems, and applications for text recognition from images.
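
The evaluation step outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the weighted sum, the weight values in `DEFAULT_WEIGHTS`, the normalization of processing time by the slowest candidate, and the use of `difflib.SequenceMatcher` as the fuzzy matching score are all assumptions made for illustration. The names `Candidate`, `score`, and `select_best` are hypothetical. CER and WER are computed in the usual way, as edit distance divided by the reference length in characters and in words.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


@dataclass
class Candidate:
    """One (preprocessing, OCR model) combination and its output."""
    preprocessing: str
    model: str
    text: str
    confidence: float  # assumed to be scaled to [0, 1]
    seconds: float     # processing time


# Assumed weights; the abstract does not specify them.
DEFAULT_WEIGHTS = {"cer": 0.3, "wer": 0.3, "confidence": 0.15,
                   "fuzzy": 0.15, "time": 0.1}


def score(reference, cand, max_seconds, weights=DEFAULT_WEIGHTS):
    """Weighted sum of criteria, each mapped so that higher is better."""
    parts = {
        "cer": 1.0 - min(cer(reference, cand.text), 1.0),
        "wer": 1.0 - min(wer(reference, cand.text), 1.0),
        "confidence": cand.confidence,
        # Stand-in for the fuzzy matching score (assumption).
        "fuzzy": SequenceMatcher(None, reference, cand.text).ratio(),
        # Faster candidates score higher; the slowest one gets 0.
        "time": (1.0 - cand.seconds / max_seconds) if max_seconds > 0 else 1.0,
    }
    return sum(weights[name] * parts[name] for name in weights)


def select_best(reference, candidates, weights=DEFAULT_WEIGHTS):
    """Return the candidate with the highest multi-criteria score."""
    max_seconds = max(c.seconds for c in candidates)
    return max(candidates,
               key=lambda c: score(reference, c, max_seconds, weights))


if __name__ == "__main__":
    # Toy values for illustration only, not experimental results.
    reference = "Hello world"
    candidates = [
        Candidate("original", "model_a", "Hello world", 0.90, 0.5),
        Candidate("otsu", "model_b", "He11o world", 0.95, 0.2),
    ]
    best = select_best(reference, candidates)
    print(best.preprocessing, best.model)
```

In this sketch every criterion is mapped to the range [0, 1] before weighting, so the weights directly express the relative importance of accuracy, confidence, and speed. A different normalization or aggregation rule would fit the same structure.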
