CASPIAN JOURNAL

MANAGEMENT AND HIGH TECHNOLOGIES

SELF-ORGANIZING CLUSTERING OF GREAT DATA FLOW

Cite as: Pechenyy Eugene A., Nuriev Nail K., Starygina Svetlana D. SELF-ORGANIZING CLUSTERING OF GREAT DATA FLOW // Caspian Journal: Management and High Technologies. 2020. No. 1. pp. 10-20.

Pechenyy Eugene A. - Kazan National Research Technological University, platova51@mail.ru

Nuriev Nail K. - Kazan National Research Technological University, nurievnk@mail.ru

Starygina Svetlana D. - Kazan National Research Technological University, svetacd_kazan@mail.ru

This paper presents a mathematical model and describes a clustering-based algorithm for classifying big data. Spheroids are proposed as the cluster shapes; to construct them, the variables are first normalized and made dimensionless. Because the cluster shapes are defined by a simple analytic form, the algorithm is effectively protected from the curse of dimensionality and remains efficient when objects are classified by a large number of criteria. A distinctive feature of the algorithm is its ability to operate dynamically, i.e., when the properties of the elements already in the clusters change and the clusters are replenished by a stream of new elements. To keep the resulting classification categories unambiguous, the algorithm prevents clusters from intersecting. An important and useful operating characteristic of the algorithm is its self-organization: it can process streaming data without operator intervention, correcting the locations and sizes of the clusters where necessary. The correction procedure is a sequence of iterations in which the geometric centers of the clusters move toward the centers of the groups of objects they contain. The paper presents a control-flow chart that has been implemented in software. The operation of the algorithm is demonstrated and illustrated graphically on a comparatively small data array whose elements are described by two classification criteria.
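The ideas summarized in the abstract (dimensionless variables, spherical clusters, a ban on cluster intersections, and iterative movement of cluster centers toward the centers of their member groups) can be illustrated with a minimal Python sketch. It is not the authors' algorithm, and everything in it is an assumption made for illustration: min-max scaling is used as the normalization, the spheroids are taken to be hyperspheres in the normalized space, and the names `normalize`, `Sphere`, `intersects`, `correct_center`, `StreamClusterer` and the parameters `r0` (initial cluster radius) and `alpha` (step toward the member mean) are hypothetical. Radius correction is not shown.

```python
import math
from dataclasses import dataclass, field


def normalize(points, lows, highs):
    """Min-max scale every criterion to the dimensionless range [0, 1].
    Assumption: the bounds of each criterion are known in advance."""
    return [tuple((x - lo) / (hi - lo) for x, lo, hi in zip(p, lows, highs))
            for p in points]


@dataclass
class Sphere:
    center: list
    radius: float
    members: list = field(default_factory=list)

    def contains(self, p):
        return math.dist(self.center, p) <= self.radius


def intersects(a, b):
    # Two balls overlap iff the distance between centers is below the sum of radii.
    return math.dist(a.center, b.center) < a.radius + b.radius


def correct_center(c, alpha):
    """One correction iteration: move the geometric center toward the mean of its members."""
    n, dim = len(c.members), len(c.center)
    mean = [sum(p[i] for p in c.members) / n for i in range(dim)]
    c.center = [x + alpha * (m - x) for x, m in zip(c.center, mean)]


class StreamClusterer:
    """Hypothetical streaming clusterer: assigns each new point to the sphere that
    contains it, or opens a new sphere whose radius is shrunk to avoid overlaps."""

    def __init__(self, r0=0.2, alpha=0.5):
        self.r0, self.alpha, self.clusters = r0, alpha, []

    def add(self, p):
        for c in self.clusters:
            if c.contains(p):
                c.members.append(p)
                old = c.center
                correct_center(c, self.alpha)
                # Assumed policy: reject a center move that would create an intersection.
                if any(intersects(c, o) for o in self.clusters if o is not c):
                    c.center = old
                return c
        # p lies outside every sphere, so each gap below is positive.
        gaps = [math.dist(p, c.center) - c.radius for c in self.clusters]
        r = min([self.r0] + [0.99 * g for g in gaps])
        c = Sphere(list(p), r, [p])
        self.clusters.append(c)
        return c


if __name__ == "__main__":
    # Two classification criteria with different units, as in the paper's example setting.
    raw = [(10, 1.0), (12, 1.2), (90, 9.0), (88, 8.8), (11, 1.1)]
    clusterer = StreamClusterer(r0=0.2, alpha=0.5)
    for point in normalize(raw, lows=(0, 0), highs=(100, 10)):
        clusterer.add(point)
    print(len(clusterer.clusters))  # prints 2
```

Shrinking the radius of a newly opened sphere to just below the gap to its nearest neighbour is one simple way to honour the no-intersection requirement; the paper's own rule for sizing and resizing clusters may differ.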

Key words: big data, object classification, clusters, dynamic clustering, self-organization, self-learning