dc.contributor.author | Tiutiunnyk, Serhii |
dc.date.accessioned | 2020-01-28T08:58:49Z |
dc.date.available | 2020-01-28T08:58:49Z |
dc.date.issued | 2020 |
dc.identifier.citation | Tiutiunnyk, Serhii. Context-Based Question-Answering System for the Ukrainian Language : Master Thesis : manuscript rights / Serhii Tiutiunnyk ; Supervisor Vsevolod Dyomkin ; Ukrainian Catholic University, Department of Computer Sciences. – Lviv : [s.n.], 2020. – 29 p. : ill. | uk
dc.identifier.uri | http://er.ucu.edu.ua/handle/1/1898 |
dc.language.iso | en | uk
dc.subject | Context-Based Question-Answering System | uk
dc.subject | Long short-term memory model | uk
dc.subject | BERT base model | uk
dc.title | Context-Based Question-Answering System for the Ukrainian Language | uk
dc.type | Preprint | uk
dc.status | Published for the first time |
dc.description.abstracten | This work presents a context-based question-answering model for the Ukrainian language, built on Wikipedia articles using the Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018). The model takes a context (a Wikipedia article) and a question about that context, and returns an answer to the question. It consists of two parts. The first is a pre-trained multilingual BERT model, trained on Wikipedia articles in the top 100 most popular languages. The second is a fine-tuned model, trained on a dataset of questions and answers over Wikipedia articles. The training and validation data come from the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016). No question-answering dataset exists for the Ukrainian language, so the plan is to build one via machine translation, use it for the fine-tuning stage, and compare the results with models fine-tuned on other languages. A further experiment is to train a model on Slavic-language datasets before fine-tuning on Ukrainian, and to compare the results. | uk
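The fine-tuned model the abstract describes is extractive: for each context token, the BERT QA head emits a start score and an end score, and the predicted answer is the span maximizing their sum subject to start ≤ end (and usually a length cap). A minimal sketch of that span-selection step — the token texts and scores below are illustrative, not taken from the thesis:

```python
def best_span(start_scores, end_scores, max_len=30):
    """Return (start, end) token indices maximizing start+end score,
    with start <= end and span length capped at max_len tokens."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider end positions at or after the start, within max_len.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best

# Hypothetical context tokens and per-token logits for the question
# "What is the capital of Ukraine?"
tokens = ["Kyiv", "is", "the", "capital", "of", "Ukraine"]
start = [0.1, -2.0, -1.0, -0.5, -2.0, -1.5]
end = [0.2, -1.5, -2.0, -0.8, -1.0, -0.3]
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # → Kyiv
```

In practice the scores come from the fine-tuned BERT forward pass over the concatenated question and context; the exhaustive pair search above is only feasible because spans are length-capped.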