Data-Driven Approach to Automated Hypernym Hierarchy Construction for the Ukrainian WordNet

Romanyshyn, Nataliia

dc.contributor.author	Romanyshyn, Nataliia
dc.date.accessioned	2024-02-14T08:32:29Z
dc.date.available	2024-02-14T08:32:29Z
dc.date.issued	2023
dc.identifier.citation	Romanyshyn, Nataliia. Data-Driven Approach to Automated Hypernym Hierarchy Construction for the Ukrainian WordNet / Nataliia Romanyshyn; Supervisor: Dmytro Chaplynskyi, Mariana Romanyshyn; Ukrainian Catholic University, Department of Computer Sciences. – Lviv: 2023. – 48 p.: ill.	uk
dc.identifier.uri	https://er.ucu.edu.ua/handle/1/4390
dc.language.iso	en	uk
dc.title	Data-Driven Approach to Automated Hypernym Hierarchy Construction for the Ukrainian WordNet	uk
dc.type	Preprint	uk
dc.status	Published for the first time	uk
dc.description.abstracten	WordNet is a valuable resource in linguistics and natural language processing, providing a structured and comprehensive inventory of lexico-semantic relations among the words of a language. Automatic approaches to constructing and expanding WordNets are gaining popularity because manual taxonomy creation is costly. Existing work on the Ukrainian WordNet, however, has been limited in scale and public availability, and has focused primarily on manual creation. This thesis aims to create a basis for the Ukrainian WordNet automatically, focusing on hypo-hypernym relations, which reflect the hierarchical structure of WordNet. The presented approach leverages the linking between Princeton WordNet (PWN) and Wikidata, together with multilingual resources from Wikipedia, which made it possible to map 17% of PWN to Ukrainian Wiki. Three strategies are proposed for generating candidate words to fill the gaps in the constructed WordNet basis: machine translation, the Hypernym Discovery model, and Hypernym Instruction-Following LLaMA. The last of these reaches 41.61% on the selected MOC metric. Finally, an annotation tool is developed so that lexicographers can review and edit the candidates generated by these methods, improving the coherence of the Ukrainian WordNet. Overall, this work is an important step towards bridging the WordNet gap for the Ukrainian language. By combining automated techniques with expert human input, the proposed approach provides a reliable basis for creating a Ukrainian WordNet resource.	uk