CASPIAN JOURNAL

MANAGEMENT AND HIGH TECHNOLOGIES

ANALYSIS OF METHODS FOR CLASSIFYING HUMAN ACTIONS ON A VIDEO IMAGE

Marienkov Alexander N., Prikhodko Alexander A. ANALYSIS OF METHODS FOR CLASSIFYING HUMAN ACTIONS ON A VIDEO IMAGE // Caspian journal : management and high technologies. 2021. №1. pp. 46-53.

Marienkov Alexander N. - Astrakhan State University, 20a Tatishchev St., Astrakhan, 414056, Russian Federation

Prikhodko Alexander A. - Astrakhan State University, 20a Tatishchev St., Astrakhan, 414056, Russian Federation

The work justifies the relevance and practical significance of developing new methods for analyzing video images in order to classify human actions and subsequently identify potentially dangerous incidents at an informatization facility. Classifiers based on the 3D ResNet neural network model are considered, as well as approaches that use a vector model of the body obtained with the OpenPose library. The first experiment used the 3D ResNet neural network model. The Kinetics dataset was used for training; it includes about 400 actions, among them movements from martial arts. The test set consisted of examples from hockey fights and combat techniques from films. The next experiment classified actions based on an analysis of a vector model of the human body. Kinect provides motion data in the form of a hierarchy of the main nodes of the human skeleton, where the rotation of some joints relative to others is represented by quaternions. The final training of the model used the RGBU-D dataset with 432 annotated actions. The BVH format was chosen to represent the formalized movement in the final experiment. Because the model was retrained on the RGBU-D dataset, the description of every frame had to be converted from the 20 key points of standard OpenPose to the 17 key points of the BVH standard, which were used in subsequent work with the model. The final module for classifying the actions shown on screen was based on a neural network with an LSTM layer and modified input data: instead of a set of frames, the network receives a set of body vectors of the people in the frame, extracted from the video. This neural network was trained on a dataset of 2000 video files (1000 dangerous situations, mainly fights, and 1000 ordinary everyday actions that pose no threat).
The results were analyzed, and conclusions were drawn about the applicability of the considered approaches to the task of recognizing a person's actions in a video image.
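
As an illustration of the skeleton representation mentioned in the abstract (a hierarchy of joints whose relative rotations are given as quaternions), the following minimal Python sketch shows how such a hierarchy can be turned into joint positions and then into a flat per-frame body vector. It is not the authors' implementation: the three-joint chain, the joint names, the bone offsets and the helper names (`quat_multiply`, `joint_positions`, `frame_vector`) are all hypothetical and chosen only for the example.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_from_axis_angle(axis, angle_rad):
    """Unit quaternion for a rotation of angle_rad around the given axis."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    s = math.sin(angle_rad / 2) / norm
    return (math.cos(angle_rad / 2), x * s, y * s, z * s)

def rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q: q * (0, v) * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_multiply(quat_multiply(q, (0.0, *v)), conj)
    return (rx, ry, rz)

# Hypothetical three-joint chain (assumption, not taken from the paper):
# joint name -> (parent name, offset from the parent in the rest pose).
# Parents must be listed before their children.
SKELETON = {
    "shoulder": (None, (0.0, 0.0, 0.0)),
    "elbow": ("shoulder", (1.0, 0.0, 0.0)),
    "wrist": ("elbow", (1.0, 0.0, 0.0)),
}

IDENTITY = (1.0, 0.0, 0.0, 0.0)

def joint_positions(skeleton, local_rotations):
    """Accumulate local joint rotations from the root down the hierarchy."""
    world_rot, world_pos = {}, {}
    for name, (parent, offset) in skeleton.items():
        local = local_rotations.get(name, IDENTITY)
        if parent is None:
            world_rot[name] = local
            world_pos[name] = offset
        else:
            # The child's offset is expressed in the parent's frame.
            rotated = rotate(world_rot[parent], offset)
            world_pos[name] = tuple(
                p + r for p, r in zip(world_pos[parent], rotated)
            )
            world_rot[name] = quat_multiply(world_rot[parent], local)
    return world_pos

def frame_vector(positions):
    """Flatten joint positions into one body vector for a single frame."""
    return [coord for name in positions for coord in positions[name]]
```

With both the shoulder and the elbow rotated by 90 degrees around the z axis, `joint_positions` places the elbow at approximately (0, 1, 0) and the wrist at approximately (-1, 1, 0). A sequence of such flattened vectors, one per frame, is the kind of input that a recurrent classifier could consume in place of raw frames.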

Key words: recognition, deep learning, neural networks, recognition and classification of human actions, incident detection, video image analysis