Extraction of multiword expressions from Hindi text document

By:

Mishra, Atul

Material type: Text

Publication details: Gurgaon: BML Munjal University, 2022

Description: 109 p.

DDC classification:

006.3 MIS

Dissertation note: Thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy by Atul Mishra, under the supervision of Dr. Soharab Hossain Shaikh and Prof. (Dr.) Ratna Sanyal. Doctor of Philosophy, 2022.

Summary: Multiword expressions (MWEs) are a significant challenge in many fields of language technology, and multiword extraction from arbitrary text data has grown in popularity within the NLP community. The topic is closely connected to statistical analysis and artificial intelligence. This thesis presents a detailed literature review and several strategies for building an automated multiword extraction system; its overall contribution is divided into six parts. The study proposes a method for extracting Hindi MWEs and examines the significance of boundary threshold calculations. The main objective is to develop a generalized mechanism for extracting Hindi MWEs that uses their syntactic and statistical idiosyncrasy (i.e., the structure of linguistic patterns and association) together with the contextual connection between their constituent words. Classifiers based on these properties can be combined in various ways to build an extraction mechanism, so finding the best-performing combination strategy is a further objective. Designing a method around these properties raises several difficulties. In statistical filtering, calculating the boundary threshold is challenging. Combining multiple filters is another issue, because many combination strategies are possible and the best one has to be identified. The Hybrid method uses semantic similarity.

The study also developed a web application, built with the Flask framework, that automatically extracts Hindi MWEs using the Association-based and Hybrid methods. The methods, evaluation results, and findings for each contribution are presented in separate chapters. The proposed technique is evaluated on the HDTB Treebank and the TDIL dataset, which are freely available. The experimental results support the validity and viability of the method and help form a blueprint showing how well it can work alongside current procedures. A comparative study of the performance of previous works and the proposed methods is also given, and the thesis closes with the conclusions of the whole dissertation.

Holdings
Item type	Current library	Collection	Shelving location	Call number	Materials specified	Status	Notes	Date due	Barcode
Thesis	BMU Library	Reference	Display-1	006.3 MIS		Not For Loan	SOET		TH06

