Fusion CNN and LSTM for Inertial-Based Gait Recognition

Ricky H. Putra
10 min read · Jun 22, 2020

A user identification framework based on inertial sensors

This paper presents a user identification framework based on inertial sensors. The goal is to recognize a target user from their gait pattern, or way of walking, using accelerometer and gyroscope signals provided by an inertial device such as a smartphone. This research contributes several innovations, including 1) a novel feature extraction method based on a Convolutional Neural Network (CNN), and 2) a combined CNN and Long Short-Term Memory (LSTM) neural network to classify the walking subject. We exploit deep learning models as universal feature extractors for gait recognition that are able to remember classification results from subsequent walking cycles when identifying the target user. Our experiments show the superiority of the approach over other state-of-the-art techniques, reaching an accuracy of 93% with fewer than five walking cycles. Several approaches and design choices are evaluated and compared to assess their impact on user identification performance.

Index Terms: Gait, Inertial, CNN, LSTM, Fusion


Gait analysis began in the early 1970s, when scientists used video camera systems that could produce detailed studies of individuals within realistic cost and time constraints. Gait analysis has been widely applied to pathological conditions such as cerebral palsy, Parkinson's disease and neuromuscular disorders [1]. It is an important tool in the evaluation of operative procedures [1,2], rehabilitation progress [3], and the assessment of motor status in neurologically impaired patients [4,5]. Various parameters are of interest, such as joint kinematics, spatio-temporal parameters, joint forces, pressure distributions and muscle activities. Accurate and efficient detection of gait events is essential for the analysis of human gait. Determining heel strike and toe off allows walking trials to be broken up into gait cycles, each consisting of a stance phase and a swing phase. This allows joint angles, forces and moments to be compared across multiple strides and walking trials. Analysis of gait data often examines gait variables with reference to one or more of these gait events or phases, such as knee flexion at heel strike or knee moment at fifty percent of the stance phase. It is critical that these events are detected accurately and consistently throughout a trial [6]. This paper presents user-specific gait identification, which involves recognizing gait events from kinematic parameters and finding gait patterns specific to an individual.


Most researchers today identify human gait in one of two ways: explicitly, through physiological gait parameters such as cadence or step length, or through the detection of gait phases or kinematic parameters, for example from joint angle measurements [7], optical markers or inertial data. Other techniques include markerless gait capture using a depth camera such as the Kinect, pressure measurement, kinetics and dynamic electromyography. Gait analysis involves measurement, in which measurable parameters are introduced and analyzed, and interpretation, in which conclusions about the subject are drawn. In general, most gait identification approaches are based on computer vision, but there are also works based on inertial sensors [11].

Gait analysis involves several stages and techniques. Since the data comes from sensors, it is prone to noise. Most researchers today denoise the data before analysing it and fitting a model. Some researchers use a specific feature extraction technique such as PCA eigenvalues, and some use deep learning to find the important features.

Below is a summary of the techniques for each analysis task that we explored during our research.

Combination of common techniques used

Besides the techniques above, we also found that the inertial sensor setup can affect the result. Some researchers use 7 IMUs with retroreflective markers, and some use only 3 or 5 IMU sensors. Depending on the objective, different sensor setups have different implications for model performance. As stated in [9], of the three sensor positions compared, the left IMU position gave the best gait recognition accuracy.

Other authors' results

With these different techniques and setups, there is a large number of combinations that we can try in order to obtain the best-performing model.

Both CNNs and LSTMs have shown improvements over Deep Neural Networks (DNNs) across a wide variety of tasks. CNNs are good at reducing variation in sequences, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more separable space.


In this section, we describe the architecture of the proposed approach for gait subject identification. Our methodology consists of 6 essential steps: data sourcing, pre-processing, decomposing the data into fixed time windows, feature engineering, training the estimators and, finally, predicting the gait subject.

In the following, we explain each of these components in detail.

1. Data sourcing

This research is primarily based on the dataset published by Anguita et al. in their paper A Public Domain Dataset for Human Activity Recognition Using Smartphones [12], in which experiments were carried out with a group of 30 volunteers aged 19–48 years. Each person performed six different activities while wearing a smartphone on their waist. Using its embedded accelerometer and gyroscope, the phone captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz.

2. Pre-processing

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters. The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
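The gravity and body separation described above can be sketched as follows. This is a minimal illustration using SciPy, not the dataset authors' code: the 0.3 Hz cutoff and 50 Hz sampling rate come from the text, while the filter order, the zero-phase `filtfilt` call, the helper name `split_gravity_body` and the synthetic signal are assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 50.0      # sampling rate stated in the article
CUTOFF_HZ = 0.3   # gravity cutoff stated in the article
ORDER = 3         # assumption: the article does not state the filter order


def split_gravity_body(total_acc, fs=FS_HZ, cutoff=CUTOFF_HZ, order=ORDER):
    """Split total acceleration of shape (n_samples, 3) into gravity and body parts."""
    b, a = butter(order, cutoff, btype="low", fs=fs)
    # Zero-phase low-pass filtering keeps only the slow (gravity) component.
    gravity = filtfilt(b, a, total_acc, axis=0)
    # Whatever the low-pass filter rejects is treated as body motion.
    body = total_acc - gravity
    return gravity, body


# Synthetic example: a constant 1 g on the z axis plus a 2 Hz oscillation.
t = np.arange(0, 20, 1 / FS_HZ)
total = np.zeros((t.size, 3))
total[:, 2] = 1.0 + 0.3 * np.sin(2 * np.pi * 2.0 * t)
gravity, body = split_gravity_body(total)
```

Away from the edges of the recording, `gravity[:, 2]` stays close to 1.0 and `body[:, 2]` carries the oscillation.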

3. Decomposition

Since the raw sensor data are long time series, we need to break them into subsequences. We use fixed-width sliding windows of 2.56 s with 50% overlap; at a sampling rate of 50 Hz, each window therefore contains 128 time steps (2.56 s × 50 Hz).
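As a rough sketch of this windowing step, the NumPy snippet below cuts a multi-channel recording into overlapping windows. The window length, overlap and sampling rate follow the text; the function name `sliding_windows` and the random stand-in recording are hypothetical.

```python
import numpy as np

FS_HZ = 50
WINDOW_SEC = 2.56
WINDOW = int(round(FS_HZ * WINDOW_SEC))  # 128 samples per window
STEP = WINDOW // 2                       # 50% overlap -> advance by 64 samples


def sliding_windows(signal, window=WINDOW, step=STEP):
    """Cut (n_samples, n_channels) into (n_windows, window, n_channels)."""
    n_windows = (signal.shape[0] - window) // step + 1
    if n_windows <= 0:
        return np.empty((0, window, signal.shape[1]))
    return np.stack(
        [signal[i * step : i * step + window] for i in range(n_windows)]
    )


# Stand-in recording: 1000 samples (20 s) of 6 channels (3 accel + 3 gyro axes).
recording = np.random.default_rng(0).normal(size=(1000, 6))
windows = sliding_windows(recording)
```

Each window shares its second half with the first half of the next one, which is what the 50% overlap means in practice.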

4. Feature engineering

Following the data source (Anguita et al.), all features selected for this database come from the 3-axial raw signals of the accelerometer and gyroscope. The acceleration signal was separated into body and gravity acceleration signals using another low-pass Butterworth filter with a corner frequency of 0.3 Hz. Subsequently, the body linear acceleration and angular velocity were differentiated with respect to time to obtain jerk signals. The magnitude of these three-dimensional signals was also calculated using the Euclidean norm. Finally, a Fast Fourier Transform (FFT) was applied to some of these signals, producing another set of frequency-domain signals. In total, we have 561 features derived from the 6 raw signals.
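The three derived signal types mentioned here (jerk, magnitude and frequency-domain signals) can be illustrated with a few lines of NumPy. This is only a sketch under stated assumptions: `np.gradient` is used as one possible numerical time derivative, and the helper names and the synthetic window are hypothetical, not taken from the dataset's own processing code.

```python
import numpy as np

FS_HZ = 50.0  # sampling rate stated in the article


def jerk(signal, fs=FS_HZ):
    """Numerical time derivative of a (n_samples, 3) signal."""
    return np.gradient(signal, 1.0 / fs, axis=0)


def magnitude(signal):
    """Euclidean norm across the three axes, one value per sample."""
    return np.linalg.norm(signal, axis=1)


def amplitude_spectrum(signal_1d, fs=FS_HZ):
    """Frequencies and normalized FFT amplitudes of a real 1D signal."""
    freqs = np.fft.rfftfreq(signal_1d.size, d=1.0 / fs)
    amps = np.abs(np.fft.rfft(signal_1d)) / signal_1d.size
    return freqs, amps


# Synthetic 128-sample window with a single oscillation on the x axis.
t = np.arange(128) / FS_HZ
step_freq = 5 * FS_HZ / 128  # chosen to fall exactly on FFT bin 5
body_acc = np.column_stack(
    [np.sin(2 * np.pi * step_freq * t), np.zeros_like(t), np.zeros_like(t)]
)

body_jerk = jerk(body_acc)            # shape (128, 3)
body_acc_mag = magnitude(body_acc)    # shape (128,)
freqs, amps = amplitude_spectrum(body_acc[:, 0])
dominant_hz = freqs[np.argmax(amps)]  # frequency of the strongest component
```

Summary statistics computed over such signals (means, standard deviations, energies and so on) are what make up a per-window feature vector.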

5. Training estimators

We train CNN models, LSTM models and combinations of the two, which we then evaluate and compare with other state-of-the-art models.
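To make the idea of combining the two networks concrete, here is a small untrained forward pass written in plain NumPy: a convolution over time extracts local features, a pooling layer shortens the sequence, an LSTM summarizes it, and a softmax layer scores the 30 subjects. It is a sketch only. The 1D convolution, all layer sizes and the random weights are assumptions for illustration and do not reproduce the models evaluated in this article, which would normally be built and trained in a deep learning framework.

```python
import numpy as np

TIME_STEPS, CHANNELS, N_SUBJECTS = 128, 6, 30  # from the article
FILTERS, KERNEL, POOL, UNITS = 16, 5, 2, 32    # hypothetical layer sizes


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution over time, then ReLU.
    x: (time, c_in); kernels: (width, c_in, c_out)."""
    width = kernels.shape[0]
    steps = x.shape[0] - width + 1
    out = np.stack(
        [np.tensordot(x[t : t + width], kernels, axes=([0, 1], [0, 1]))
         for t in range(steps)]
    )
    return np.maximum(out + bias, 0.0)


def max_pool1d(x, size):
    """Non-overlapping max pooling along the time axis."""
    steps = x.shape[0] // size
    return x[: steps * size].reshape(steps, size, -1).max(axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_state(x, w, u, b):
    """Run one LSTM layer over x (time, features); return the final hidden state."""
    units = u.shape[0]
    h, c = np.zeros(units), np.zeros(units)
    for x_t in x:
        i, f, g, o = np.split(x_t @ w + h @ u + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell memory
        h = sigmoid(o) * np.tanh(c)                   # expose gated output
    return h


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(rng, scale=0.1):
    return {
        "conv_k": rng.normal(0, scale, (KERNEL, CHANNELS, FILTERS)),
        "conv_b": np.zeros(FILTERS),
        "lstm_w": rng.normal(0, scale, (FILTERS, 4 * UNITS)),
        "lstm_u": rng.normal(0, scale, (UNITS, 4 * UNITS)),
        "lstm_b": np.zeros(4 * UNITS),
        "dense_w": rng.normal(0, scale, (UNITS, N_SUBJECTS)),
        "dense_b": np.zeros(N_SUBJECTS),
    }


def cnn_lstm_forward(window, p):
    feats = conv1d_relu(window, p["conv_k"], p["conv_b"])  # CNN feature extractor
    feats = max_pool1d(feats, POOL)                        # shorter sequence
    h = lstm_last_state(feats, p["lstm_w"], p["lstm_u"], p["lstm_b"])
    return softmax(h @ p["dense_w"] + p["dense_b"])        # subject probabilities


rng = np.random.default_rng(0)
params = init_params(rng)
window = rng.normal(size=(TIME_STEPS, CHANNELS))  # stand-in for one sensor window
probs = cnn_lstm_forward(window, params)
predicted_subject = int(np.argmax(probs)) + 1     # subject IDs assumed to be 1..30
```

With random weights the prediction is meaningless; the point is the data flow, in which the LSTM sees the convolutional features rather than the raw samples.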

6. Predicting gait subject

The HAR dataset contains 30 individuals, whose subject IDs we one-hot encode. Each sample consists of 128 time steps multiplied by the number of features; together, these sensor values form the X variables used to predict Y (the subject ID).
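A minimal sketch of the one-hot encoding of subject IDs is shown below. It assumes the IDs are integers from 1 to 30; the helper name `one_hot_subjects` and the example IDs are hypothetical.

```python
import numpy as np

N_SUBJECTS = 30  # number of volunteers stated in the article


def one_hot_subjects(subject_ids, n_subjects=N_SUBJECTS):
    """Map subject IDs (assumed to run from 1 to n_subjects) to one-hot rows."""
    ids = np.asarray(subject_ids, dtype=int) - 1  # shift to 0-based indices
    return np.eye(n_subjects, dtype=np.float32)[ids]


# One label per window: here, three windows from subjects 1, 30 and 7.
y = one_hot_subjects([1, 30, 7])
```

Each row of `y` has a single 1 in the column of the corresponding subject, which matches the 30-way softmax output of a classifier.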


The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% for the test data. During training, we further split the training data into 80% for training and 20% for validation.

First, we establish baseline numbers for CNN and LSTM, as shown in the table below. The LSTM baseline has 100 hidden units, a dropout of 0.5 and fully connected layers. The CNN baseline has 1 Conv2D layer with 240 hidden units, followed by max pooling, average pooling and fully connected layers.

Our baseline model performance

Our models are trained with two types of neural networks: a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. We experiment with both models under various hyperparameter settings and network types, and also combine them into a single model (CNN-LSTM). The accuracy results are given below.

Our final model performance

Result 1: Comparing Models 1–4, the CNN-LSTM with 2 bi-directional layers shows no significant improvement over the CNN-LSTM with 1 bi-directional layer, although it seems to give a slightly better result with 200 training epochs.

Result 2: Comparing Models 3 and 4 with Models 5 and 6, wrapping bi-directional layers around the CNN-LSTM network improves performance by about 3% at 100 epochs, but not by much at 200 epochs.

Result 3: Comparing Model 9 with the others, the CNN with 4 Conv2D layers achieved the highest accuracy. This suggests that the CNN model alone was able to learn the sequence patterns and use them for gait subject identification.

Result 4: The LSTM model alone has a very low accuracy (~34%), suggesting that an LSTM on its own might not learn the sequence patterns well enough for gait subject identification.


In this experiment, we explored various data pre-processing techniques, especially for sensor data. We also applied CNN and CNN-LSTM models to compare and evaluate their performance given the complexity of sensor data. With proper denoising and fine-tuning of the CNN and CNN-LSTM networks, we found that both models give quite similar results, with the CNN slightly ahead of the CNN-LSTM. Interestingly, the LSTM model alone did not perform well. The results indicate that a CNN can detect and extract important features that, when fed into an LSTM model, lead to good results.


[1] Wikipedia contributors. “Gait analysis.” Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 9 Feb. 2019. Web. 1 Mar. 2019.

[2] Loske, S.; Nüesch, C.; Byrnes, K.S.; Fiebig, O.; Schären, S.; Mündermann, A.; Netzer, C. Decompression surgery improves gait quality in patients with symptomatic lumbar spinal stenosis. Spine J. 2018, 18, 2195–2204.

[3] Zomar, B.O.; Bryant, D.; Hunter, S.; Howard, J.L.; Vasarhelyi, E.M.; Lanting, B.A. A randomised trial comparing spatio-temporal gait parameters after total hip arthroplasty between the direct anterior and direct lateral surgical approaches. HIP Int. 2018, 28, 478–484.

[4] Steultjens, M.P.M.; Dekker, J.; van Baar, M.E.; Oostendorp, R.A.B.; Bijlsma, J.W.J. Range of joint motion and disability in patients with osteoarthritis of the knee or hip. Rheumatology 2000, 39, 955–961.

[5] Bertoli, M.; Cereatti, A.; Trojaniello, D.; Avanzino, L.; Pelosin, E.; Del Din, S.; Rochester, L.; Ginis, P.; Bekkers, E.M.J.; Mirelman, A.; et al. Estimation of spatio-temporal parameters of gait from magneto-inertial measurement units: Multicenter validation among Parkinson, mildly cognitively impaired and healthy older adults. Biomed. Eng. OnLine 2018, 17, 58.

[6] Pau, M.; Corona, F.; Pili, R.; Casula, C.; Guicciardi, M.; Cossu, G.; Murgia, M. Quantitative assessment of gait parameters in people with Parkinson’s disease in laboratory and clinical setting: Are the measures interchangeable? Neurol. Int. 2018, 10, 7729.

[7] Zeni Jr, J. A., J. G. Richards, and J. S. Higginson. “Two simple methods for determining gait events during treadmill and overground walking using kinematic data.” Gait & posture 27.4 (2008): 710–714.

[8] Sprager, Sebastijan, and Matjaz Juric. “Inertial sensor-based gait recognition: a review.” Sensors 15.9 (2015): 22089–22127.

[9] Teufl, Wolfgang, et al. “Towards Inertial Sensor Based Mobile Gait Analysis: Event-Detection and Spatio-Temporal Parameters.” Sensors 19.1 (2019): 38.

[10] Delgado-Escaño, Rubén, et al. “An End-to-End Multi-Task and Fusion CNN for Inertial-Based Gait Recognition.” IEEE Access 7 (2019): 1897–1908.

[11] Gadaleta, Matteo, and Michele Rossi. “Idnet: Smartphone-based gait recognition with convolutional neural networks.” Pattern Recognition 74 (2018): 25–37.

[12] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium 24–26 April 2013.

[13] Riaz, Qaiser, et al. “Move Your Body: Age Estimation Based on Chest Movement During Normal Walk.” IEEE Access 7 (2019): 28510–28524.

[14] Sainath, Tara N., et al. “Convolutional, long short-term memory, fully connected deep neural networks.” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.

[15] Ngo, Thanh Trung, et al. “The largest inertial sensor-based gait database and performance evaluation of gait-based personal authentication.” Pattern Recognition 47.1 (2014): 228–237.

[16] Trung, Ngo Thanh, et al. “Performance evaluation of gait recognition using the largest inertial sensor-based gait database.” 2012 5th IAPR International Conference on Biometrics (ICB). IEEE, 2012.

[17] Sun, Bing, Yang Wang, and Jacob Banda. “Gait characteristic analysis and identification based on the iPhone’s accelerometer and gyrometer.” Sensors 14.9 (2014): 17037–17054.

[18] Hung, Tran, and Young Suh. “Inertial sensor-based two feet motion tracking for gait analysis.” Sensors 13.5 (2013): 5614–5629.

[19] Annadhorai, Anuradha, et al. “Human identification by gait analysis.” Proceedings of the 2nd International Workshop on Systems and Networking Support for Health Care and Assisted Living Environments. ACM, 2008.

[20] Dehzangi, Omid, Mojtaba Taherisadr, and Raghvendar ChangalVala. “IMU-based gait recognition using convolutional neural networks and multi-sensor fusion.” Sensors 17.12 (2017): 2735.

[21] Caldas, Rafael, et al. “A systematic review of gait analysis methods based on inertial sensors and adaptive algorithms.” Gait & posture 57 (2017): 204–210.

[22] Zou, Qin, et al. “Robust gait recognition by integrating inertial and RGBD sensors.” IEEE transactions on cybernetics 48.4 (2018): 1136–1150.

[23] Mannini, Andrea, et al. “A machine learning framework for gait classification using inertial sensors: Application to elderly, post-stroke and huntington’s disease patients.” Sensors 16.1 (2016): 134.


