CASPIAN JOURNAL
MANAGEMENT AND HIGH TECHNOLOGIES
Models and methods of efficient diagnostic feature ranking via Laplace approximation
Kolesin I.D., Troyanozhko O.A. Models and methods of efficient diagnostic feature ranking via Laplace approximation // Caspian journal: management and high technologies. 2017. No. 3. pp. 95-109.
Kolesin I.D. - Doct. Sci. (Physics and Mathematics), Professor, Saint Petersburg State University, 35 Universitetskiy Ave., Saint Petersburg, Peterhof, 198504, Russian Federation, kolesin_id@mail.ru
Troyanozhko O.A. - postgraduate, Saint Petersburg State University, 35 Universitetskiy Ave., Saint Petersburg, Peterhof, 198504, Russian Federation, med_otpor@mail.ru
An economical method of ranking diagnostic features (DF) for classifying tumors into two groups (“benign” or “malignant”), based on data from the University of Wisconsin, is presented. The method has two stages: first, the most informative DF among all those available are identified; then the classification is carried out. At the first stage, the DF are selected and their degree of informativeness is assessed using the overlapping coefficient (OVL). OVL measures the similarity between two distribution functions, or between two samples represented by these distributions. The smaller the overlap of the distribution density functions for the different types of objects, the more informative the DF. The classical Laplace distribution was used for the approximation. Its advantage over distributions without a simple explicit antiderivative is that its antiderivative has a simple analytic form. The ranking yields a list of indicators ordered by decreasing informativeness according to the corresponding OVL values. In practice, what matters is not only making the algorithm economical but also developing software, especially for large-scale research. For this reason, the comparison took into account the time complexity and cost-efficiency of the ranking algorithm. Note that increasing computing capacity can reduce the time spent on calculations but does not improve the quality of object recognition. At the second, final stage, an algorithm based on a discrete error function was used. Comparative analysis confirmed that the accuracy of tumor type diagnosis was no worse than that obtained by other methods, while the ranking algorithm was less complex. As a result, the accuracy of diagnosis for the three DF found by this ranking method was increased to 96.31%.
The proposed ranking method can be used in practice as an auxiliary tool for early rapid diagnosis in mass screening.
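The first stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (fit_laplace, ovl_laplace, rank_features) are hypothetical, and the parameter estimates (sample median for the location, mean absolute deviation from the median for the scale) are an assumption, since the abstract does not say how the Laplace parameters were fitted. The sketch relies on the property the abstract highlights: because the Laplace distribution function has a closed form, the OVL (the integral of the pointwise minimum of the two densities) can be computed without numerical integration, by locating the points where the densities cross and summing differences of the distribution function of the lower density.

```python
import math
from statistics import median


def fit_laplace(sample):
    """Assumed estimator: median for location, mean absolute deviation for scale.

    The sample must not be constant, otherwise the scale is zero.
    """
    mu = median(sample)
    b = sum(abs(x - mu) for x in sample) / len(sample)
    return mu, b


def laplace_pdf(x, mu, b):
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def laplace_cdf(x, mu, b):
    # Closed-form antiderivative of the Laplace density.
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def _log_ratio(x, p, q):
    # log(f_p(x) / f_q(x)); piecewise linear with kinks at the two locations.
    (m1, b1), (m2, b2) = p, q
    return math.log(b2 / b1) - abs(x - m1) / b1 + abs(x - m2) / b2


def ovl_laplace(p, q):
    """Overlapping coefficient of two Laplace laws given as (mu, b) pairs."""
    (m1, b1), (m2, b2) = p, q
    lo, hi = min(m1, m2), max(m1, m2)
    g_lo, g_hi = _log_ratio(lo, p, q), _log_ratio(hi, p, q)
    cuts = {lo, hi}
    slope = 1.0 / b1 - 1.0 / b2  # slope of the log-ratio left of both locations
    if slope != 0.0:
        x_left = lo - g_lo / slope   # crossing in the left tail, if any
        if x_left < lo:
            cuts.add(x_left)
        x_right = hi + g_hi / slope  # crossing in the right tail, if any
        if x_right > hi:
            cuts.add(x_right)
    if g_lo * g_hi < 0.0:            # crossing between the two locations
        cuts.add(lo + (hi - lo) * g_lo / (g_lo - g_hi))
    edges = [-math.inf] + sorted(cuts) + [math.inf]
    total = 0.0
    for a, c in zip(edges, edges[1:]):
        # Pick a test point strictly inside the interval.
        if a == -math.inf:
            t = c - 1.0
        elif c == math.inf:
            t = a + 1.0
        else:
            t = 0.5 * (a + c)
        mu, b = p if laplace_pdf(t, *p) <= laplace_pdf(t, *q) else q
        total += laplace_cdf(c, mu, b) - laplace_cdf(a, mu, b)
    return total


def rank_features(class_a, class_b):
    """Rank features by increasing OVL (most informative first).

    class_a and class_b map a feature name to that feature's sample values.
    """
    scores = {
        name: ovl_laplace(fit_laplace(class_a[name]), fit_laplace(class_b[name]))
        for name in class_a
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

As a sanity check under these assumptions, two Laplace laws with the same scale b whose locations differ by d have OVL equal to exp(-d / (2b)), so ovl_laplace((0.0, 1.0), (2.0, 1.0)) is approximately 0.368, and identical laws give an OVL of 1. Calling rank_features with per-class samples for each DF returns the features in order of decreasing informativeness, which corresponds to the ordered list described in the abstract.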
Key words: high-tech diagnostics, breast cancer, computational algorithms, diagnostics, informatization, quality of medical services, overlapping coefficient, medical information systems, feature ranking, Laplace distribution, complexity