CASPIAN JOURNAL

MANAGEMENT AND HIGH TECHNOLOGIES

Evaluation of semantic meaningfulness of fuzzy collocation by using the generalized vector-space model of text collection

Polyakov D.V., Popov A.I., Matveeva A.S., Karasjov P.I., Baljukov D.A. Evaluation of semantic meaningfulness of fuzzy collocation by using the generalized vector-space model of text collection // Caspian Journal: Management and High Technologies. 2016. No. 1. pp. 10-25.

Polyakov D.V. - Ph.D. (Engineering), Tambov State Technical University, 106 Sovetskaya St., Tambov, 392000, Russian Federation, dimadress@yandex.ru

Popov A.I. - Ph.D. (Pedagogical), Tambov State Technical University, 106 Sovetskaya St., Tambov, 392000, Russian Federation, olimp_popov@mail.ru

Matveeva A.S. - post-graduate student, Tambov State Technical University, 106 Sovetskaya St., Tambov, 392000, Russian Federation, klenchic@mail.ru

Karasjov P.I. - post-graduate student, Tambov State Technical University, 106 Sovetskaya St., Tambov, 392000, Russian Federation, karasevpav@rambler.ru

Baljukov D.A. - post-graduate student, Tambov State Technical University, 106 Sovetskaya St., Tambov, 392000, Russian Federation, logan.tambov@gmail.com

The article considers the generalized vector-space model of a text collection and a mathematical apparatus for comparing the semantic characteristics of terms with those of an arbitrary group of factors formalized as fuzzy sets. This apparatus makes it possible to determine how semantically significant the chosen groups of factors are, compared with terms, for clustering a text collection or for information retrieval. The experimental setup and the architecture of the software that supports it are described. The concept of a fuzzy collocation is introduced. Methods of constructing fuzzy collocations based on linguistic variables and on fuzzification of the distances between terms are proposed. Experimental results are given for factors formalized as fuzzy collocations. The consideration of fuzzy collocations is limited to two methods of constructing them: one based on a linguistic variable and one using fuzzification of the distance between terms in texts. In addition, only collocations consisting of two terms are studied. The authors show the independent nature of collocations and the effectiveness of using them for clustering text collections.
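
As an illustration only, the idea of building a two-term fuzzy collocation by fuzzifying the distance between terms can be sketched as follows. This is one possible reading of the approach, not the authors' formulation: the piecewise-linear membership function, the thresholds `full` and `zero`, the summation over co-occurrences, and all function names are assumptions introduced for this example.

```python
from itertools import product


def positions(tokens, term):
    """Return the indices at which `term` occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == term]


def distance_membership(d, full=1, zero=5):
    """Membership of distance `d` in a hypothetical fuzzy set 'close'.

    Assumed shape: 1.0 up to `full` tokens apart, falling linearly
    to 0.0 at `zero` tokens apart.
    """
    if d <= full:
        return 1.0
    if d >= zero:
        return 0.0
    return (zero - d) / (zero - full)


def fuzzy_collocation_weight(tokens, a, b, full=1, zero=5):
    """Sum the 'close' memberships over every pair of occurrences of a and b.

    The sum is an assumed aggregation; the result could serve as the
    weight of the factor (a, b) in a document vector.
    """
    return sum(
        distance_membership(abs(i - j), full, zero)
        for i, j in product(positions(tokens, a), positions(tokens, b))
    )


tokens = "fuzzy set theory uses fuzzy logic and set operations".split()
weight = fuzzy_collocation_weight(tokens, "fuzzy", "set")
print(weight)  # 2.0: distances 1, 7, 3, 3 give memberships 1.0, 0.0, 0.5, 0.5
```

Under these assumptions, such weights could be placed alongside ordinary term frequencies as additional columns of a document-by-factor matrix, which is the kind of input a vector-space model with SVD-based factor analysis would operate on.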

Key words: text analysis, fuzzy collocation, factor analysis, SVD decomposition, linguistic variable, fuzzy set theory, software architecture, vector-space model