CASPIAN JOURNAL

MANAGEMENT AND HIGH TECHNOLOGIES

Method of formalization of fuzzy collocations in texts based on linguistic variables

Polyakov D.V., Mitrofanov N.M., Matveeva A.S. Method of formalization of fuzzy collocations in texts based on linguistic variables // Caspian Journal: Management and High Technologies. 2015. No. 4. pp. 167-183.

Polyakov D.V. - Ph.D. (Engineering), senior lecturer, Tambov State Technical University, 106 Sovetskaya St., Tambov, 392000, Russian Federation, dimadress@yandex.ru

Mitrofanov N.M. - undergraduate, assistant of department, Tambov State Technical University, 106 Sovetskaya St., Tambov, 392000, Russian Federation, n.mitrofanow@gmail.com

Matveeva A.S. - post-graduate student, Tambov State Technical University, 106 Sovetskaya St., Tambov, 392000, Russian Federation, klenchic@mail.ru

The purpose of the article is to develop mathematical methods for formalizing collocations in texts. Introducing collocations into the vector space model, with the distance between terms taken into account, can improve the quality of search and clustering in text collections. The research draws on fuzzy set theory, information retrieval theory, and matrix theory. The study does not address how such collocations should be used for information retrieval or text clustering, and it is limited to collocations treated as pairs of terms. A method of formalizing collocations is proposed that accounts for the distance between terms by means of fuzzy set theory: the distance between terms is formalized as a linguistic variable. In addition, the article proposes an extended vector space model of a text collection, which provides a tool for the comparative analysis of terms and fuzzy collocations in information retrieval.
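The idea of describing the distance between two terms with a linguistic variable can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the article: the label names (adjacent, near, far), the trapezoidal membership functions and their breakpoints, the measurement of distance in word positions, and the use of the maximum over all occurrence pairs as the degree of the collocation.

```python
from itertools import product


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function with support [a, d] and core [b, c]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical linguistic variable "distance between terms", measured in
# word positions. Labels and breakpoints are illustrative assumptions.
DISTANCE_TERMS = {
    "adjacent": lambda d: trapezoid(d, 0, 1, 1, 3),
    "near": lambda d: trapezoid(d, 1, 3, 5, 8),
    "far": lambda d: trapezoid(d, 5, 8, float("inf"), float("inf")),
}


def collocation_degree(tokens, first, second, label):
    """Degree to which the pair (first, second) occurs at distance `label`.

    Assumption: the degree is the maximum membership over all pairs of
    occurrences (maximum used as a fuzzy OR); 0.0 if a term is absent.
    """
    positions_first = [i for i, t in enumerate(tokens) if t == first]
    positions_second = [i for i, t in enumerate(tokens) if t == second]
    membership = DISTANCE_TERMS[label]
    degrees = [
        membership(abs(i - j))
        for i, j in product(positions_first, positions_second)
    ]
    return max(degrees, default=0.0)


tokens = "fuzzy set theory describes a fuzzy collocation".split()
for label in DISTANCE_TERMS:
    print(label, round(collocation_degree(tokens, "fuzzy", "collocation", label), 3))
```

Under these assumptions, each pair of terms and each distance label yields one value in [0, 1]. Such values could serve as additional coordinates alongside term weights in a vector space model, although how the article constructs its extended model is not specified here.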

Key words: collocation, text collections, fuzzy collocations, fuzzy set theory, linguistic variable, clustering of text collections, search in text collections, information retrieval