Extraction of multiword expressions from Hindi text document
Material type: Text
Publication details: Gurgaon BML Munjal University 2022
Description: 109p
Subject(s):
DDC classification: 006.3 MIS
Item type | Current library | Collection | Shelving location | Call number | Materials specified | Status | Notes | Date due | Barcode |
---|---|---|---|---|---|---|---|---|---|
Thesis | BMU Library | Reference | Display-1 | 006.3 MIS | | Not For Loan | SOET | | TH06 |
Thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy by Atul Mishra, under the supervision of Dr. Soharab Hossain Shaikh and Prof. (Dr.) Ratna Sanyal. Doctor of Philosophy, 2022.
Multiword expressions (MWEs) are a significant challenge in many fields of language technology, and extracting them from arbitrary text data has grown in popularity among the NLP community. This research topic is strongly connected to statistical analysis and artificial intelligence. This thesis presents a detailed literature review and several strategies for building an automated multiword extraction system. Its overall contribution is divided into six parts. The study proposes a method for extracting Hindi MWEs and highlights the significance of boundary threshold calculations.
The main objective of this dissertation is to develop a generalized mechanism for extracting Hindi multiword expressions using their syntactic and statistical idiosyncrasy (i.e., the structure of linguistic patterns and association) and the contextual connection between their constituent words. Classifiers based on these properties can be combined in various ways to build an extraction mechanism, so finding the best-performing combination strategy is also an objective of this dissertation. Designing a method around these properties raises several hurdles. In statistical filtering, calculating the boundary threshold is a challenging task. Combining multiple filters is another, since different combination strategies are possible and the best one has to be identified.
The hybrid method additionally uses semantic similarity. The study also developed a web application, built with the Flask framework, that automatically extracts Hindi MWEs using the association-based and hybrid methods. The methods, evaluation results, and findings of each contribution are presented in separate chapters. The proposed technique is evaluated on the HDTB Treebank and the TDIL dataset, both of which are freely available.
The experimental results reflect the validity and viability of the method and provide a blueprint showing how well it can work alongside current procedures. A comparative study of the performance of previous works and the proposed methods is also given. The thesis closes with the conclusions of the whole dissertation.
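The abstract mentions statistical filtering with a boundary threshold but does not say which association measure or cutoff the thesis uses. As a rough illustration only, the sketch below assumes pointwise mutual information (PMI) over adjacent word pairs, a hand-picked threshold, and a made-up four-sentence corpus; the function name `pmi_bigrams` and its parameters are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def pmi_bigrams(sentences, min_count=2, threshold=3.0):
    """Score adjacent word pairs by PMI and keep those at or above `threshold`.

    Illustrative assumption: PMI stands in for whichever association
    measure the thesis actually uses, and `threshold` is chosen by hand.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= threshold:
            scored.append(((w1, w2), pmi))
    # Highest-scoring candidates first
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy corpus (invented for this example), tokenized on whitespace.
sentences = [
    "प्रधान मंत्री ने भाषण दिया".split(),
    "प्रधान मंत्री ने बैठक की".split(),
    "राम ने भाषण दिया".split(),
    "सीता ने बैठक की".split(),
]

result = pmi_bigrams(sentences, min_count=2, threshold=3.0)
for (w1, w2), score in result:
    print(f"{w1} {w2}\t{score:.2f}")
```

On this toy corpus the pair "प्रधान मंत्री" passes the cutoff, while "मंत्री ने" does not, because the frequent postposition "ने" co-occurs with many different words. Moving the threshold changes which candidates survive, which is presumably why the abstract singles out the boundary threshold calculation as a difficult step.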