Indian Sign Language Recognition through Hybrid ConvNet-LSTM Networks
Abstract
Dynamic hand gesture recognition is a challenging task in Human-Computer Interaction (HCI) and Computer Vision. Potential application areas of gesture recognition include sign language translation, video gaming, video surveillance, robotics, and gesture-controlled home appliances. In the proposed research, gesture recognition is applied to recognize sign language words from real-time videos. Classifying actions from video sequences requires both spatial and temporal features. The proposed system extracts the former with a Convolutional Neural Network (CNN), the core of many computer vision solutions, and the latter with a Recurrent Neural Network (RNN), which is well suited to handling sequences of movements. A real-time Indian Sign Language (ISL) recognition system is thus developed using this hybrid CNN-RNN architecture and trained on the proposed CasTalk-ISL dataset. The ultimate purpose of the presented research is to deploy a real-time sign language translator that removes the barriers in communication between hearing-impaired people and hearing people. The developed system achieves 95.99% top-1 accuracy and 99.46% top-3 accuracy on the test dataset, outperforming existing approaches that use various deep models on different datasets.
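The spatial-then-temporal pipeline described above can be illustrated with a toy numpy sketch: a per-frame "CNN" stage (one convolution, ReLU, and global average pooling standing in for a real convolutional backbone) feeds a hand-rolled LSTM cell, and the final hidden state is classified with a softmax, from which top-1 and top-3 predictions are read off. All dimensions, weights, and function names here are illustrative assumptions, not the paper's actual network.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(img, kernel):
    """Naive single-channel 'valid' 2-D convolution."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def cnn_features(frame, kernels):
    """Spatial stage: conv + ReLU + global average pooling, one value per kernel."""
    return np.array([np.maximum(conv2d_valid(frame, k), 0).mean() for k in kernels])

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gates stacked as [input, forget, output, candidate]."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def classify_video(frames, kernels, W, U, b, W_out):
    """Temporal stage: run per-frame features through the LSTM, classify the last state."""
    hidden = W.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for frame in frames:
        h, c = lstm_step(cnn_features(frame, kernels), h, c, W, U, b)
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax class probabilities

# Toy dimensions: 8 frames of 16x16 video, 4 conv kernels, 8 hidden units, 10 signs.
T, F, HID, CLASSES = 8, 4, 8, 10
frames = rng.standard_normal((T, 16, 16))
kernels = rng.standard_normal((F, 3, 3))
W = rng.standard_normal((4 * HID, F)) * 0.1
U = rng.standard_normal((4 * HID, HID)) * 0.1
b = np.zeros(4 * HID)
W_out = rng.standard_normal((CLASSES, HID))

probs = classify_video(frames, kernels, W, U, b, W_out)
top3 = np.argsort(probs)[::-1][:3]  # the reported top-3 accuracy counts a hit
                                    # if the true label is in this set
```

In the paper's setting the reported 95.99% top-1 / 99.46% top-3 figures follow the same definition: a video counts as correct at top-k when its true sign appears among the k highest-probability classes.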
Copyright (c) 2021 EMITTER International Journal of Engineering Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The copyright to this article is transferred to Politeknik Elektronika Negeri Surabaya (PENS) if and when the article is accepted for publication. The undersigned hereby transfers any and all rights in and to the paper, including without limitation all copyrights, to PENS. The undersigned hereby represents and warrants that the paper is original and that he/she is the author of the paper, except for material that is clearly identified as to its original source, with permission notices from the copyright owners where required. The undersigned represents that he/she has the power and authority to make and execute this assignment. The copyright transfer form can be downloaded here.
The corresponding author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors. This agreement is to be signed by at least one of the authors, who has obtained the assent of the co-author(s) where applicable. After submission of this agreement signed by the corresponding author, changes of authorship or in the order of the authors listed will not be accepted.
Retained Rights/Terms and Conditions
- Authors retain all proprietary rights in any process, procedure, or article of manufacture described in the Work.
- Authors may reproduce or authorize others to reproduce the work or derivative works for the author’s personal use or company use, provided that the source and the copyright notice of Politeknik Elektronika Negeri Surabaya (PENS) publisher are indicated.
- Authors are allowed to use and reuse their articles under the same CC-BY-NC-SA license as third parties.
- Third-parties are allowed to share and adapt the publication work for all non-commercial purposes and if they remix, transform, or build upon the material, they must distribute under the same license as the original.
Plagiarism Check
To avoid plagiarism, the manuscript will be checked twice by the Editorial Board of the EMITTER International Journal of Engineering Technology (EMITTER Journal) using the iThenticate Plagiarism Checker and the CrossCheck plagiarism screening service. The similarity score of a manuscript should be less than 25%. A manuscript that plagiarizes another author's work, or the author's own, will be rejected by EMITTER Journal.
Authors are expected to comply with EMITTER Journal's plagiarism rules by downloading and signing the plagiarism declaration form here and resubmitting the form, along with the copyright transfer form via online submission.
Funding data
IEEE Foundation
Grant numbers 2016-8