Integrating Semantic Parsing with Dependency Parsing for Malayalam: A Framework for Enhanced Syntactic and Semantic Understanding

International Journal of Electronics and Communication Engineering
© 2025 by SSRG - IJECE Journal
Volume 12 Issue 6
Year of Publication : 2025
Authors : P.V. Ajusha, A.P. Ajees
How to Cite?

P.V. Ajusha, A.P. Ajees, "Integrating Semantic Parsing with Dependency Parsing for Malayalam: A Framework for Enhanced Syntactic and Semantic Understanding," SSRG International Journal of Electronics and Communication Engineering, vol. 12, no. 6, pp. 44-53, 2025. Crossref, https://doi.org/10.14445/23488549/IJECE-V12I6P104

Abstract:

The morphological complexity of Malayalam poses significant challenges for dependency parsing, demanding accurate syntactic and semantic analysis to advance natural language processing (NLP) for low-resource languages. This study introduces a dependency parsing approach that combines the Cross-Lingual Language Model RoBERTa (XLM-RoBERTa) as a shared encoder with a biaffine attention mechanism for parsing and a span-based predictor for semantic role labelling (SRL). XLM-RoBERTa is a transformer-based multilingual model that produces high-dimensional contextual embeddings for Malayalam sentences, providing a robust foundation for syntactic and semantic analysis. The dependency parsing decoder employs the biaffine attention mechanism to predict head-dependent relationships and assign syntactic dependency labels. The span-based predictor employed for SRL assigns semantic roles to spans within sentences, effectively handling the long-range dependencies common in morphologically complex languages like Malayalam. The dataset comprises a manually annotated Malayalam treebank, ensuring complete syntactic and semantic coverage. Parsing performance was evaluated on head detection accuracy, root token identification and the handling of complex sentence structures. Evaluation results indicate that integrating morphological features improves the Unlabeled Attachment Score (UAS) from 93.70% to 95.20% and the Labeled Attachment Score (LAS) from 91.45% to 93.10%. Head detection accuracy, root token identification and complex sentence parsing also improve significantly, with scores increasing to 95.40%, 93.80% and 91.60%, respectively. By addressing major challenges in Malayalam dependency parsing, this study presents an efficient and scalable solution for language processing tasks.
The proposed approach demonstrates significant potential for applications such as machine translation, sentiment analysis and knowledge extraction, paving the way for future developments in NLP for low-resource and morphologically rich languages.
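The biaffine head-prediction step described in the abstract can be sketched as follows. This is an illustrative toy implementation, not the authors' code: the token vectors H would in practice be XLM-RoBERTa contextual embeddings, the parameters U, w_dep, w_head and b would be learned, and a real parser decodes a well-formed tree (e.g. a maximum spanning tree) rather than taking a greedy per-token argmax. All names and dimensions here are assumptions for illustration.

```python
# Toy biaffine head scoring:
#   score(dep=i, head=j) = h_i^T U h_j + w_dep . h_i + w_head . h_j + b
# Pure-Python linear algebra so the sketch stays self-contained.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def biaffine_score(h_dep, h_head, U, w_dep, w_head, b):
    # Bilinear term plus separate linear terms for dependent and head, plus bias.
    return dot(h_dep, matvec(U, h_head)) + dot(w_dep, h_dep) + dot(w_head, h_head) + b

def predict_heads(H, U, w_dep, w_head, b):
    # H[0] is a synthetic ROOT vector; tokens are H[1:].
    # For each dependent i, greedily pick the head j (possibly ROOT, j=0)
    # with the highest biaffine score. A full parser would instead decode
    # the highest-scoring spanning tree over these scores.
    heads = []
    for i in range(1, len(H)):
        candidates = [j for j in range(len(H)) if j != i]
        scores = [biaffine_score(H[i], H[j], U, w_dep, w_head, b) for j in candidates]
        heads.append(candidates[max(range(len(scores)), key=scores.__getitem__)])
    return heads

def uas(gold_heads, pred_heads):
    # Unlabeled Attachment Score: fraction of tokens with the correct head.
    return sum(g == p for g, p in zip(gold_heads, pred_heads)) / len(gold_heads)
```

With 2-dimensional toy vectors, `predict_heads` attaches each token to whichever candidate head maximizes the bilinear compatibility, and `uas` reproduces the attachment metric reported in the abstract (LAS additionally requires the predicted dependency label to match).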

Keywords:

Natural language processing, XLM-RoBERTa, Malayalam dependency parsing, Biaffine attention mechanism, Semantic role labelling.

References:

[1] Artur Kulmizev, and Joakim Nivre, “Schrödinger's Tree-On Syntax and Neural Language Models,” Frontiers in Artificial Intelligence, vol. 5, pp. 1-14, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[2] Gayathri G. Krishnan, “Malayalam Morphosyntax: Inflectional Features and their Acquisition,” Thesis, Indian Institute of Technology Bombay, pp. 1-275, 2020.
[Google Scholar] [Publisher Link]
[3] Haitao Liu, Chunshan Xu, and Junying Liang, “Dependency Distance: A New Perspective on Syntactic Patterns in Natural Languages,” Physics of Life Reviews, vol. 21, pp. 171-193, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[4] Abhilasha A. Kumar, “Semantic Memory: A Review of Methods, Models, and Current Challenges,” Psychonomic Bulletin & Review, vol. 28, pp. 40-80, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[5] Aishwarya Kamath, and Rajarshi Das, “A Survey on Semantic Parsing,” arXiv, pp. 1-22, 2018.
[CrossRef] [Google Scholar] [Publisher Link]
[6] Carlos Gómez-Rodríguez, Iago Alonso-Alonso, and David Vilares, “How Important is Syntactic Parsing Accuracy? An Empirical Evaluation on Rule-based Sentiment Analysis,” Artificial Intelligence Review, vol. 52, pp. 2081-2097, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[7] S. Sakthi Vel, and R. Priya, “A Translation Framework for Cross Language Information Retrieval in Tamil and Malayalam,” Indonesian Journal of Electrical Engineering and Informatics (IJEEI), vol. 12, no. 2, pp. 319-332, 2024.
[CrossRef] [Google Scholar] [Publisher Link]
[8] Ziyu Yao et al., “Model-Based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study,” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 5447-5458, 2019.
[CrossRef] [Google Scholar] [Publisher Link]
[9] Muhammad Khalifa, Hesham Hassan, and Aly Fahmy, “Zero-Resource Multi-Dialectal Arabic Natural Language Understanding,” arXiv, pp. 1-15, 2021.
[CrossRef] [Publisher Link]
[10] Diellza Nagavci Mati, Mentor Hamiti, and Elissa Mollakuqe, “Morphological Tagging and Lemmatization in the Albanian Language,” SEEU Review, vol. 18, no. 2, pp. 4-16, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[11] Ran Zmigrod, Tim Vieira, and Ryan Cotterell, “Please Mind the Root: Decoding Arborescences for Dependency Parsing,” arXiv, pp. 1-11, 2020.
[CrossRef] [Google Scholar] [Publisher Link]
[12] Sungjoon Park et al., “KLUE: Korean Language Understanding Evaluation,” arXiv, pp. 1-76, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[13] Yige Chen et al., “Yet Another Format of Universal Dependencies for Korean,” arXiv, pp. 1-6, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[14] Brendon Albertson, “TextMix: Using NLP and APIs to Generate Chunked Sentence Scramble Tasks,” 29th Conference CALL and Professionalisation: Short Papers from EUROCALL 2021, pp. 6-11, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[15] Ran Zmigrod, Tim Vieira, and Ryan Cotterell, “On Finding the K-Best Non-Projective Dependency Trees,” arXiv, pp. 1-14, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[16] Jivnesh Sandhan et al., “TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer,” arXiv, pp. 1-11, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[17] C.S. Ayush Kumar et al., “BERT-Based Sequence Labelling Approach for Dependency Parsing in Tamil,” Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, Dublin, Ireland, pp. 1-8, 2022.
[CrossRef] [Google Scholar] [Publisher Link]
[18] Zhengqiao Zeng et al., “A Conspiracy Theory Text Detection Method Based on RoBERTa and XLM-RoBERTa Models,” Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, pp. 1-5, 2024.
[Google Scholar] [Publisher Link]
[19] Jiangzhou Ji, Yaohan He, and Jinlong Li, “A Biaffine Attention-Based Approach for Event Factor Extraction,” Conference Proceedings 6th China Conference on Knowledge Graph and Semantic Computing, Guangzhou, China, pp. 1-10, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[20] Rexhina Blloshmi et al., “Generating Senses and Roles: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling,” Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, pp. 3786-3793, 2021.
[Google Scholar] [Publisher Link]