Named Entity Recognition and transliteration in Bengali

Ekbal, Asif; Naskar, Sudip Kumar; Bandyopadhyay, Sivaji

doi:10.1075/li.30.1.07ekb

Article published in: Named Entities: Recognition, classification and use
Edited by Satoshi Sekine and Elisabete Ranchhod
[Lingvisticæ Investigationes 30:1] 2007
pp. 95–114

Asif Ekbal | Department of Computer Science and Engineering, Jadavpur University

Sudip Kumar Naskar

Sivaji Bandyopadhyay

Published online: 10 August 2007

This paper reports on the development of a Named Entity Recognition (NER) system for Bengali using a tagged Bengali news corpus, and on the subsequent transliteration of the recognized Bengali Named Entities (NEs) into English. Three NER models have been developed. The first two use a semi-supervised learning method, one without linguistic features (Model A) and the other with linguistic features (Model B). The third (Model C) is based on a statistical Hidden Markov Model. A modified joint source-channel model has been used, along with a number of alternatives, to generate English transliterations of Bengali NEs and vice versa. The transliteration models learn the mappings from bilingual training sets, optionally guided by linguistic knowledge in the form of conjuncts and diphthongs in Bengali and their representations in English. The NER system achieved its highest average Recall, Precision and F-Score values, 89.62%, 78.67% and 83.79% respectively, with Model C. Evaluation of the proposed transliteration models showed that the modified joint source-channel model performs best on the evaluation metrics for person and location names, for both Bengali to English (B2E) and English to Bengali (E2B) transliteration. Using the linguistic knowledge during training improves the performance of the transliteration models.
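
The three Model C figures can be related with a short calculation. The sketch below assumes that the reported F-Score is the balanced F1 measure, that is, the harmonic mean of Precision and Recall; the abstract does not state the formula, and the function name `f_score` is illustrative only.

```python
def f_score(recall: float, precision: float) -> float:
    """Balanced F-score (harmonic mean of precision and recall).

    Assumption: the abstract's "F-Score" is the balanced F1 measure;
    the abstract itself does not give the formula.
    """
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall and Precision reported for Model C, in percent.
model_c_recall = 89.62
model_c_precision = 78.67

model_c_f = f_score(model_c_recall, model_c_precision)
print(f"Model C F-score: {model_c_f:.2f}%")
```

Under that assumption the computed value rounds to 83.79%, which agrees with the F-Score reported for Model C.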
