Real-time violent action recognition using key frames extraction and deep learning
Abstract
Violence recognition is crucial because of its applications in activities
related to security and law enforcement. Existing semi-automated systems rely on
tedious manual surveillance, which leads to human error and
makes these systems less effective. Several approaches have been proposed
using trajectory-based, non-object-centric, and deep-learning-based methods.
Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than other methods. However, their
performance can still be improved. This study explores two state-of-the-art deep
learning architectures, a convolutional neural network (CNN) and Inception v4, to detect and recognize violence in video data. In the proposed
framework, a keyframe extraction step eliminates duplicate consecutive frames. This step reduces the training data size and hence
decreases the computational cost. For feature
selection and classification, the sequential CNN uses a single kernel
size, whereas the Inception v4 CNN uses multiple kernel sizes in different layers of
the architecture. For empirical analysis, four widely used standard datasets
covering diverse activities are used. The results confirm that the proposed approach
attains 98% accuracy, reduces the computational cost, and outperforms
existing techniques for violence detection and recognition.
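
The keyframe extraction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how duplicate frames are detected, so the mean absolute pixel difference, the default `threshold` value, and the function name `extract_keyframes` are all hypothetical choices made for this example.

```python
import numpy as np


def extract_keyframes(frames, threshold=2.0):
    """Return the indices of frames to keep as keyframes.

    A frame is kept when its mean absolute pixel difference from the
    most recently kept frame exceeds `threshold`. Both the difference
    measure and the threshold are assumptions made for this sketch.
    """
    kept_indices = []
    last_kept = None
    for index, frame in enumerate(frames):
        # Cast to float so that subtracting uint8 images cannot wrap around.
        current = np.asarray(frame, dtype=np.float32)
        if last_kept is None or np.mean(np.abs(current - last_kept)) > threshold:
            kept_indices.append(index)
            last_kept = current
    return kept_indices


# Synthetic 4x4 grayscale "video": two black frames, two white, one black.
black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
video = [black, black.copy(), white, white.copy(), black.copy()]
keyframes = extract_keyframes(video)
```

Comparing each frame against the last kept frame, rather than against its immediate predecessor, prevents slow drift from passing unnoticed: a sequence of tiny changes eventually accumulates enough difference to produce a new keyframe.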
Author
Ahmed, Muzamil
Ramzan, Muhammad
Khan, Hikmat Ullah
Iqbal, Saqib
Khan, Muhammad Attique
Choi, Jung-In
Nam, Yunyoung
Kadry, Seifedine