CASPIAN JOURNAL

MANAGEMENT AND HIGH TECHNOLOGIES

Document representation as a key phrases vector for patents prior-art retrieval

Dykov Mikhail A., Kravets Alla G., Korobkin Dmitriy M., Ukustov Sergey Sergeyevich, Strelkov Oleg I. Document representation as a key phrases vector for patents prior-art retrieval // Caspian journal : management and high technologies. — 2014. — №1. — pp. 148-155.

Dykov Mikhail A. - post-graduate student, Volgograd State Technical University, 28 Lenin av., Volgograd, 400005, Russian Federation, dmawork@mail.ru

Kravets Alla G. - D.Sc. (Engineering), Professor, Volgograd State Technical University, 28 Lenin av., Volgograd, 400005, Russian Federation, agk@gde.ru

Korobkin Dmitriy M. - Ph.D. (Technical), Associate Professor, Volgograd State Technical University, 28 Lenin av., Volgograd, 400005, Russian Federation, dkorobkin80@mail.ru

Ukustov Sergey Sergeyevich - post-graduate student, Volgograd State Technical University, 28 Lenin av., Volgograd, 400005, Russian Federation, sergey@ukstv.me

Strelkov Oleg I. - Director, Federal Institute of Industrial Property, 1 Building, 30 Berezhkovskaya Naberezhnaya, Moscow, 123995, Russian Federation, fips@rupto.ru

We propose a method for representing a patent document as a vector of key phrases and a method for using these vectors in patent prior-art retrieval. The methods are designed to significantly decrease the time an examiner has to spend on prior-art retrieval. The proposed approach solves a sequence of subtasks step by step: preprocessing of patent documents, extraction of key phrases from the texts of patent documents, and calculation of the similarity between the key phrase vectors of patent documents. One of the main advantages of the proposed methods is that they scale easily to the complete patent collection, which includes millions of documents. The experiments performed showed that the developed methods significantly outperform the baseline.

Key words: prior-art patent search, patent examination, morphological analysis, natural language processing, big data, key phrases search, texts similarity calculation, estimation procedure
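
The three subtasks named in the abstract (preprocessing, key phrase extraction, similarity calculation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the extraction or weighting scheme, so the stop-word filter, the one- and two-word candidate phrases, the raw frequency weights, and the cosine measure used here are all assumptions, and the names `STOP_WORDS`, `preprocess`, `key_phrase_vector`, `cosine_similarity`, and `rank_prior_art` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "is", "with", "on"}


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def key_phrase_vector(text, max_len=2):
    """Count candidate phrases: runs of 1..max_len adjacent tokens
    that contain no stop word. Raw counts serve as weights (assumption)."""
    tokens = preprocess(text)
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOP_WORDS for t in gram):
                counts[" ".join(gram)] += 1
    return counts


def cosine_similarity(u, v):
    """Cosine similarity between two sparse phrase-count vectors."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_prior_art(query, corpus):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    q = key_phrase_vector(query)
    scores = [(doc_id, cosine_similarity(q, key_phrase_vector(text)))
              for doc_id, text in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


corpus = {
    "A": "rotary engine with ceramic piston",
    "B": "method for baking bread",
}
ranked = rank_prior_art("ceramic piston for a rotary engine", corpus)
```

In this toy corpus, document A shares phrases such as "rotary engine" and "ceramic piston" with the query and is ranked first, while document B shares none. Because each document is reduced to a sparse vector independently of the others, the vectors could be computed once and stored, which is one plausible reading of the scalability claim in the abstract.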