CASPIAN JOURNAL

MANAGEMENT AND HIGH TECHNOLOGIES

Optical character recognition errors correction method

Cite as: Breyman Aleksandr D., Yakovlev Ilya A. Optical character recognition errors correction method // Caspian Journal: Management and High Technologies, 2014, no. 1, pp. 102-113.

Breyman Aleksandr D. - Ph.D. (Engineering), Associate Professor, National Research University “Higher School of Economics”, 20 Myasnitskaya St., Moscow, 101000, Russian Federation, abreyman@hse.ru

Yakovlev Ilya A. - post-graduate student, Moscow State University of Instrument Engineering and Computer Science, 20 Stromynka St., Moscow, 107996, Russian Federation, krofes@gmail.com

Optical recognition of text documents is an inherently error-prone process. To detect and correct these errors, recognition systems apply post-processing techniques, which are usually based on dictionary lookup. Dictionaries can provide acceptable recognition quality for Latin, Cyrillic and other phonetic alphabets, but they are of little use for languages in which splitting text into individual words is atypical or optional (Chinese, Japanese, Korean, Vietnamese and others). This paper reviews known approaches to this problem and proposes a new method for correcting certain types of errors. The method is based on ensembles of neural networks that contain a separate network for each possible character. It reduces the number of errors in hieroglyph recognition and, when texts in phonetic alphabets are recognized, reduces dependence on the quality of the dictionaries.
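The abstract does not specify the network architecture, the glyph features or the training procedure, so the following Python sketch only illustrates the general idea of a per-character ensemble used as a verifier of OCR output. It is not the authors' method. Everything in it is an assumption made for illustration: each character has its own hypothetical single-neuron scorer (`CharNet`), a glyph is represented by a precomputed numeric feature vector, the acceptance threshold `accept` is arbitrary, and the weights are hand-set rather than trained.

```python
import math
from dataclasses import dataclass


@dataclass
class CharNet:
    """Hypothetical one-neuron network answering 'is this glyph character `char`?'."""
    char: str
    weights: list[float]
    bias: float

    def score(self, features: list[float]) -> float:
        # Logistic neuron: sigmoid(bias + w . x), a confidence in [0, 1].
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))


def correct(ocr_label: str, features: list[float],
            ensemble: dict[str, CharNet], accept: float = 0.5) -> str:
    """Keep the OCR label if its own network accepts the glyph;
    otherwise return the character whose network is most confident."""
    own = ensemble.get(ocr_label)
    if own is not None and own.score(features) >= accept:
        return ocr_label
    best = max(ensemble.values(), key=lambda net: net.score(features))
    return best.char


# Usage with two visually similar hieroglyphs and toy 2-D features
# (hypothetical values chosen only to make the example readable).
ensemble = {
    "日": CharNet("日", [4.0, -4.0], 0.0),
    "目": CharNet("目", [-4.0, 4.0], 0.0),
}
print(correct("目", [0.9, 0.1], ensemble))  # OCR said 目, verifier rejects it -> 日
```

Because each character has an independent network, new characters can in principle be added to such an ensemble without retraining the others, and no dictionary is consulted at any point.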

Key words: optical character recognition, character recognition errors, post-processing of recognition errors, the verification system for recognition results, dictionary-less recognition error correction system, hieroglyph recognition, neural networks, neural netwo