Extraction of multiword expressions from Hindi text document

By: Atul Mishra
Material type: Text
Publication details: Gurgaon: BML Munjal University, 2022
Description: 109 p.
DDC classification:
  • 006.3 MIS
Dissertation note: Thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy by Atul Mishra, under the supervision of Dr. Soharab Hossain Shaikh and Prof. (Dr.) Ratna Sanyal. Doctor of Philosophy, 2022.
Summary: Multiword expressions (MWEs) pose a significant challenge in many areas of language technology, and extracting them from arbitrary text has become a popular research topic in the NLP community. The topic is closely connected to statistical analysis and artificial intelligence. This thesis presents a detailed literature review and several strategies for building an automated MWE extraction system; its overall contribution is divided into six parts. It proposes a method for extracting Hindi MWEs and examines the importance of boundary threshold calculation. The main objective is to develop a generalized mechanism for extracting Hindi MWEs from their syntactic and statistical idiosyncrasy (i.e., the structure of linguistic patterns and statistical association) and from the contextual connection between their constituent words. Classifiers based on these properties can be combined in various ways to build an extraction mechanism, so identifying the best-performing combination strategy is a further objective. Designing such a method poses several hurdles. In statistical filtering, calculating the boundary threshold is difficult. Combining multiple filters is another challenge, because several combination strategies are possible and the best one must be identified. The hybrid method additionally uses semantic similarity.
The study also developed a web application, built with the Flask framework, that automatically extracts Hindi MWEs using the association-based and hybrid methods. The methods, evaluation results, and findings of each contribution are presented in separate chapters. The proposed techniques are evaluated on the HDTB treebank and the TDIL dataset, which are freely available. The experimental results support the validity and viability of the method and indicate how it compares with current procedures. The thesis also compares the performance of the proposed methods with that of previous work, and closes with the conclusions of the dissertation as a whole.
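The summary describes an association-based statistical filter with a boundary threshold, combined with a syntactic (linguistic-pattern) filter. This record does not say which association measure, threshold rule, or combination the thesis actually uses, so the following is only a minimal sketch under stated assumptions: pointwise mutual information (PMI) as the association measure, a hypothetical `mean + k * std` cutoff as the boundary threshold, a hypothetical set of POS-tag patterns, and plain AND/OR as the two combination strategies. The names `pmi_scores`, `boundary_threshold`, `pos_candidates`, `extract_mwes`, and `MWE_PATTERNS` are illustrative, and the corpus is toy data, not the HDTB or TDIL data.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical syntactic filter: POS-tag pairs treated as possible MWE shapes
# (noun + verb, e.g. light-verb constructions; noun + noun; adjective + noun).
# Tag names follow common Hindi treebank conventions, but this set is an assumption.
MWE_PATTERNS = {("NN", "VM"), ("NN", "NN"), ("JJ", "NN")}


def pmi_scores(sentences, min_count=2):
    """PMI for adjacent word pairs seen at least `min_count` times.

    PMI(w1, w2) = log2(P(w1 w2) / (P(w1) * P(w2))), with maximum-likelihood
    estimates from raw counts. `min_count` drops rare pairs, whose PMI is
    known to be inflated.
    """
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = [word for word, _ in sent]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1, p_w2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def boundary_threshold(scores, k=0.5):
    """Hypothetical boundary: mean + k * population standard deviation."""
    values = list(scores.values())
    return mean(values) + k * pstdev(values)


def pos_candidates(sentences, patterns=MWE_PATTERNS):
    """Adjacent word pairs whose POS-tag pair matches one of `patterns`."""
    found = set()
    for sent in sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if (t1, t2) in patterns:
                found.add((w1, w2))
    return found


def extract_mwes(sentences, k=0.5, min_count=2, strategy="and"):
    """Combine the statistical and syntactic filters with AND or OR."""
    scores = pmi_scores(sentences, min_count)
    if scores:
        cutoff = boundary_threshold(scores, k)
        statistical = {pair for pair, s in scores.items() if s >= cutoff}
    else:
        statistical = set()
    syntactic = pos_candidates(sentences)
    if strategy == "and":
        return statistical & syntactic
    if strategy == "or":
        return statistical | syntactic
    raise ValueError(f"unknown strategy: {strategy!r}")


# Toy romanized Hindi corpus of (word, POS) pairs; not data from the thesis.
corpus = [
    [("ram", "NNP"), ("ne", "PSP"), ("kaam", "NN"), ("kiya", "VM")],
    [("sita", "NNP"), ("ne", "PSP"), ("kaam", "NN"), ("kiya", "VM")],
    [("ram", "NNP"), ("ghar", "NN"), ("gaya", "VM")],
    [("sita", "NNP"), ("ghar", "NN"), ("aayi", "VM")],
    [("ram", "NNP"), ("ne", "PSP"), ("khana", "NN"), ("khaya", "VM")],
]

print(extract_mwes(corpus, strategy="and"))  # {('kaam', 'kiya')}
```

With these toy counts, ("kaam", "kiya"), a noun plus light-verb pair, has the highest PMI among the pairs seen at least twice and also matches the NN+VM pattern, so the AND strategy keeps only it. The OR strategy would also admit every other NN+VM pair in the corpus, which illustrates why choosing the combination strategy matters.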