CASPIAN JOURNAL
MANAGEMENT AND HIGH TECHNOLOGIES
Mining drug-drug interactions from texts of scientific articles
Kamaev Valeriy A., Melnikov Mikhail P., Vorobkalov Pavel N. Mining drug-drug interactions from texts of scientific articles. Caspian Journal: Management and High Technologies, 2015, no. 1, pp. 112-121.
Kamaev Valeriy A. - D.Sc. (Engineering), Professor, Volgograd State Technical University, 28 Lenin Ave., Volgograd, 400005, Russian Federation, cad@vstu.ru
Melnikov Mikhail P. - post-graduate student, Volgograd State Technical University, 28 Lenin Ave., Volgograd, 400005, Russian Federation, m.p.melnikov@gmail.com
Vorobkalov Pavel N. - Ph.D. (Engineering), Associate Professor, Volgograd State Technical University, 28 Lenin Ave., Volgograd, 400005, Russian Federation, pavor84@gmail.com
Undetected drug-drug interactions (DDIs) can have serious consequences during treatment. A quick search for such interactions can give doctors information that is essential for making the right decisions, but detecting DDIs is a time-consuming task. Natural language processing applied to text mining of scientific articles can make DDI information more accessible to doctors. Today, some databases contain large numbers of biomedical articles, so the computational performance of the classification method used for identification limits the applicability of such methods. The main purpose of this research is to find a method for fast retrieval of DDI information from biomedical texts. In this article, we review up-to-date research on natural language processing for DDI detection. Many of the investigated methods take a long time to run on large text corpora. To develop and test DDI extraction methods, we created a text corpus containing examples of articles with and without DDI information. We propose a fast text-mining approach to classifying DDI articles using the term frequency-inverse document frequency (tf-idf) statistic. Tf-idf is a numerical statistic intended to reflect how important a word is to a document in a corpus. To implement and test the classification algorithm, we developed a text classification system. Our approach achieves a reasonably high F1 score (the harmonic mean of precision and recall, a standard measure for binary classification) in DDI article classification while keeping run time short. Finally, we consider how to improve the algorithm to increase its precision and recall. Once these improvements are made, a software implementation of the algorithm may be used by DDI experts to search for new evidence of DDIs in scientific publications.
Key words: information retrieval, drug-drug interaction, machine learning, information technologies, natural-language texts, automatic binary classification, statistical measure
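The abstract does not say how tf-idf weights are turned into a DDI / non-DDI decision, so the following is only a minimal Python sketch of how tf-idf weighting, binary classification, and the F1 score fit together. It assumes a nearest-centroid rule with cosine similarity, which is our illustrative choice and not necessarily the authors' method. All function names (`tokenize`, `idf_table`, `tfidf`, `train`, `predict`, `f1_score`) and the four toy training sentences are hypothetical; they are not the authors' corpus or system.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not the authors' data): True = reports a DDI, False = does not.
TRAIN_TEXTS = [
    "ketoconazole inhibits the metabolism of simvastatin and increases its plasma concentration",
    "coadministration of warfarin with aspirin increases the risk of bleeding",
    "the crystal structure of the enzyme was determined by x-ray diffraction",
    "patients were recruited from three hospitals for the survey",
]
TRAIN_LABELS = [True, True, False, False]


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character (deliberately simple).
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def idf_table(docs):
    # idf(t) = log(N / df(t)), where df(t) = number of documents containing term t.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(doc, idf):
    # tf(t, d) = count of t in d / length of d; terms unseen during training get weight 0.
    if not doc:
        return {}
    counts = Counter(doc)
    return {t: (c / len(doc)) * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    # Component-wise mean of the tf-idf vectors of one class.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def train(texts, labels):
    # Fit idf on the training corpus and build one centroid per class.
    docs = [tokenize(t) for t in texts]
    idf = idf_table(docs)
    vectors = [tfidf(d, idf) for d in docs]
    pos = [v for v, y in zip(vectors, labels) if y]
    neg = [v for v, y in zip(vectors, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one DDI and one non-DDI example")
    return idf, centroid(pos), centroid(neg)


def predict(text, model):
    # Label a text as DDI if it is closer (by cosine) to the DDI centroid.
    idf, pos_centroid, neg_centroid = model
    v = tfidf(tokenize(text), idf)
    return cosine(v, pos_centroid) > cosine(v, neg_centroid)


def f1_score(y_true, y_pred):
    # F1 = harmonic mean of precision and recall for the positive (DDI) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    model = train(TRAIN_TEXTS, TRAIN_LABELS)
    queries = [
        "aspirin increases plasma concentration of warfarin",
        "the enzyme structure was determined in three hospitals",
    ]
    predictions = [predict(q, model) for q in queries]
    print(predictions, f1_score([True, False], predictions))
```

With this plain idf = log(N/df), a term that occurs in every training document (here, "the") gets zero weight, which is how tf-idf suppresses uninformative words; training reduces to one pass over the corpus, which is in the spirit of the short run time the abstract emphasizes. Library implementations differ in detail: for example, scikit-learn's `TfidfVectorizer` uses a smoothed idf by default, so its weights will not match this sketch exactly.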