000 03130nam a22001937a 4500
003 BML
082 _a006.3
_bMIS
100 _aMishra, Atul
245 _aExtraction of multiword expressions from Hindi text document
260 _aGurgaon
_bBML Munjal University
_c2022
300 _a109p.
502 _aThesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy by Atul Mishra under the supervision of Dr. Soharab Hossain Shaikh and Prof. (Dr.) Ratna Sanyal
_bDoctor of Philosophy
_d2022
520 _aMultiword expressions (MWEs) are a significant challenge in many fields of language technology. Multiword extraction from unrestricted text has grown in popularity among the NLP community, and this research topic is closely connected to statistical analysis and artificial intelligence. This thesis presents a detailed literature review and several strategies for building an automated multiword extraction system. The overall contribution of the thesis is divided into six parts. A method for extracting Hindi MWEs is proposed, and the significance of boundary threshold calculation is examined. The main objective of this dissertation is to develop a generalized mechanism for extracting Hindi multiword expressions based on their syntactic and statistical idiosyncrasy (i.e., the structure of linguistic patterns and association) and the contextual connection between their constituent words. Various strategies for combining classifiers built on these properties may be applied to develop a multiword extraction mechanism; hence, identifying the best-performing combination strategy is also an objective of this dissertation. Designing a method based on these properties raises several hurdles. In statistical filtering, calculating the boundary threshold is a challenging task. Another issue is combining multiple filters, since different combination strategies are possible, so recognizing the best combination strategy is also a challenge. The Hybrid method uses semantic similarity. The study also developed a web application using the Flask framework to automatically extract Hindi MWEs with the Association-based and Hybrid methods. The methods, evaluation results, and findings of each contribution are presented in separate chapters. The proposed technique is evaluated on the HDTB Treebank and the TDIL dataset, which are freely available.
The experimental results demonstrate the validity and viability of the method and provide a blueprint of how well it performs relative to current procedures. A comparative study of the performance of previous works and the proposed methods is also given. The thesis closes with the conclusions of the whole dissertation.
650 _aEngineering and Technology
650 _aComputer Science Artificial Intelligence
856 _uhttps://shodhganga.inflibnet.ac.in/handle/10603/411302
856 _uhttp://drc.bml.edu.in:8080/jspui/handle/123456789/2835
942 _2ddc
_cTH
999 _c10140
_d10140