dc.description.abstracten |
WordNet is a valuable resource in the field of linguistics and natural language processing,
providing a structured and comprehensive list of lexico-semantic relations among words in
a language. Automatic approaches for constructing and expanding WordNets are gaining
popularity due to the high cost of manual taxonomy creation. However, existing work on
constructing the Ukrainian WordNet has been limited in scale and public availability, and
has focused primarily on manual creation. This thesis aims to automatically create a
basis for the Ukrainian WordNet, focusing on hyponym-hypernym relations, which
reflect the hierarchical structure of WordNet. The presented approach leverages the links
between Princeton WordNet (PWN) and Wikidata, together with multilingual resources from
Wikipedia, which allowed us to map 17% of PWN to the Ukrainian Wikipedia. Three strategies
are proposed for generating candidate words to fill the gaps in the constructed WordNet
basis: machine translation, the Hypernym Discovery model, and Hypernym Instruction-Following
LLaMA. The latter model achieves a score of 41.61% on the selected MOC metric. Finally, an
annotation tool is developed that enables lexicographers to review and edit the candidates
generated by our methods to improve the coherence of the Ukrainian WordNet.
Overall, this work is an important step towards bridging the WordNet gap for the Ukrainian
language. By combining automated techniques with expert human input, the proposed approach
provides a reliable basis for creating a Ukrainian WordNet resource. |
uk |