A Survey on Urdu Handwritten Text Recognition: State of the Art, Challenges, and Future Directions

Authors

  • Tayaba Anjum, Department of Computer Science, University of Management and Technology, Lahore, Pakistan
  • Arifah Azhar, Department of Software Engineering, University of Management and Technology, Lahore, Pakistan

DOI:

https://doi.org/10.69591/jcai.3.1.2

Keywords:

Urdu Handwritten Text Recognition (HTR), Optical Character Recognition (OCR), Deep Learning, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers.

Abstract

Urdu handwritten text recognition (HTR) has emerged as an important research area in computer vision and natural language processing, with applications in digital archiving, historical manuscript preservation, automated document processing, and assistive technologies. Despite significant progress in machine learning and deep learning, Urdu HTR continues to pose challenges due to the script’s cursive writing style, complex ligatures, diverse character shapes, and the frequent use of diacritics. This paper presents a comprehensive survey of Urdu HTR, covering traditional approaches such as rule-based and machine learning methods, as well as state-of-the-art deep learning architectures. We review publicly available datasets, examine major challenges including handwriting variability and the scarcity of large, annotated resources, and discuss recent trends such as transformer-based models, self-supervised learning, and multimodal recognition frameworks. Finally, we outline promising research directions aimed at advancing Urdu HTR, with an emphasis on developing large-scale datasets, enhancing model robustness and generalization, and enabling deployment in real-world applications.

Published

2025-06-17

Section

Articles