CASPIAN JOURNAL

MANAGEMENT AND HIGH TECHNOLOGIES

SOFTWARE FOR SEARCHING FOR PROHIBITED TEXT CONTENT ON MACHINE MEDIA

Suslov Aleksandr V., Azhmukhamedov Iskandar M. SOFTWARE FOR SEARCHING FOR PROHIBITED TEXT CONTENT ON MACHINE MEDIA // Caspian journal: management and high technologies. 2018. No. 1. pp. 185-196.

Suslov Aleksandr V - Student, Astrakhan State University, 20a Tatishchev St., Astrakhan, 414056, Russian Federation, alex.-suslov@mail.ru

Azhmukhamedov Iskandar M. - Doct. Sci. (Engineering), Associate Professor, Astrakhan State University, 20a Tatishchev St., Astrakhan, 414056, Russian Federation, aim_agtu@mail.ru

The introduction of legislative control over the content of information resources has made the automatic detection and blocking of prohibited content in those resources a more pressing problem. Such content may reside in files on the internal hard disks of computers and servers, on external storage media (external hard drives, flash drives, optical discs), and in cloud storage. The authors compared existing software for analyzing file content and identified the following shortcomings: common tools detect specified content only in text files, not in image files, and they offer insufficient means of restricting the scope of a scan. The authors therefore propose an algorithm, and software implementing it, for identifying prohibited content in both text and image files. The proposed software and its existing analogues were compared on the same test material (a file set totaling approximately 20 GB). The search used a set of given word combinations characteristic of typical prohibited content. The comparison shows that the proposed algorithm and software outperform the existing tools in processing speed and in the ability to detect prohibited content in image files. In addition, the proportion of files with prohibited content that are detected is much higher with the software proposed by the authors.
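As an illustration only, the kind of search described above (scanning files under a restricted scope for given word combinations) might be sketched as follows. This is not the authors' algorithm: the function names `find_files_with_phrases` and `detection_ratio`, the extension filter, and the definition of the detection ratio are all assumptions made for this sketch, and image files are simply skipped here (handling them would need a text-recognition step that is not shown).

```python
import os


def find_files_with_phrases(root, phrases, extensions=(".txt",)):
    """Return {path: [matched phrases]} for files under `root`.

    Hypothetical helper: `extensions` restricts the scan scope to
    plain-text files; image files are ignored in this sketch.
    """
    lowered = [p.lower() for p in phrases]
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(tuple(extensions)):
                continue  # outside the configured scan scope
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    raw = fh.read()
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            # Collapse whitespace so a phrase split across lines still matches.
            text = " ".join(raw.lower().split())
            matched = [p for p in lowered if p in text]
            if matched:
                hits[path] = matched
    return hits


def detection_ratio(detected, known_positive):
    """Share of known prohibited files that were detected (assumed metric)."""
    known = set(known_positive)
    if not known:
        return 0.0
    return len(known & set(detected)) / len(known)
```

Given a labelled test set, `detection_ratio(find_files_with_phrases(root, phrases), known_positive)` would yield one number per tool, which is one plausible way to compare tools on the same material.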

Key words: electronic information resources, text files, information security, image files, prohibited content, content search, search methods, software, computational efficiency, databases, computational experiments, ele