Improving Sequence Tagging for Grammatical Error Correction

Tarnavskyi, Maksym

dc.contributor.author	Tarnavskyi, Maksym
dc.date.accessioned	2021-06-30T09:30:41Z
dc.date.available	2021-06-30T09:30:41Z
dc.date.issued	2021
dc.identifier.citation	Tarnavskyi, Maksym. Improving Sequence Tagging for Grammatical Error Correction / Maksym Tarnavskyi; Supervisor: Kostiantyn Omelianchuk; Ukrainian Catholic University, Department of Computer Sciences. – Lviv : [s.n.], 2021. – 52 p.: ill.	uk
dc.identifier.uri	https://er.ucu.edu.ua/handle/1/2707
dc.description.abstract	In this work, we investigated the recent sequence tagging approach to the Grammatical Error Correction (GEC) task. We compared the impact of different transformer-based encoders in base and large configurations and showed the influence of the tag vocabulary size. We also explored ensembling methods at the data and model levels. We proposed two methods for selecting higher-quality data and filtering out noisy data. We generated new GEC training data through knowledge distillation from an ensemble of models and explored strategies for using it. Our best ensemble, trained without pre-training on synthetic data, achieves a new state-of-the-art (SOTA) F0.5 of 76.05 on BEA-2019 (test), whereas the most recent previously reported results relied on pre-training on synthetic data. Our best single model with pre-training on synthetic data achieves an F0.5 of 73.21 on BEA-2019 (test). Our investigation improves on the previous results for sequence tagging models by 0.8 points for a single model and by 2.45 points for an ensemble. The code, generated datasets, and trained models are publicly available.	uk
dc.language.iso	en	uk
dc.subject	Grammatical Error Correction	uk
dc.subject	sequence tagging approach	uk
dc.subject	data augmentation techniques	uk
dc.title	Improving Sequence Tagging for Grammatical Error Correction	uk
dc.type	Preprint	uk
dc.status	Published for the first time	uk
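The abstract reports every result as F0.5 on BEA-2019 (test). Official BEA-2019 scores come from ERRANT-based comparison of system edits against reference edits; the minimal sketch below shows only the final arithmetic, computing F-beta from counts of true-positive, false-positive and false-negative edits. With beta = 0.5, precision is weighted more heavily than recall. The function name `f_beta`, the example counts, and the convention that an empty denominator yields a precision or recall of 1.0 are illustrative assumptions.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from edit counts (true positives, false positives, false negatives).

    Assumption: if there are no predicted edits (tp + fp == 0) precision is 1.0,
    and if there are no reference edits (tp + fn == 0) recall is 1.0.
    """
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    # beta < 1 gives precision more weight than recall
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: same number of correct edits, errors swapped between FP and FN.
print(round(f_beta(tp=60, fp=20, fn=40), 4))  # P = 0.75, R = 0.60 → 0.7143
print(round(f_beta(tp=60, fp=40, fn=20), 4))  # P = 0.60, R = 0.75 → 0.625
```

The two calls have identical F1 but different F0.5: the higher-precision system scores better, which is why GEC systems are usually tuned to propose fewer, more reliable edits.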
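The abstract mentions model-level ensembling of sequence taggers without specifying the method. One plausible scheme, sketched here under stated assumptions, is per-token majority voting: each model predicts one edit tag per source token, a tag is accepted only if enough models agree on it, and otherwise the token is left unchanged. The tag strings (`$KEEP`, `$REPLACE_has`, `$APPEND_.`, `$DELETE`) and the function `majority_vote` are hypothetical illustrations, not necessarily the tag set or ensembling method used in this work.

```python
from collections import Counter
from typing import List, Optional, Sequence

KEEP = "$KEEP"  # hypothetical "leave this token unchanged" tag


def majority_vote(
    predictions: Sequence[Sequence[str]], min_votes: Optional[int] = None
) -> List[str]:
    """Merge per-token edit tags predicted by several models.

    predictions[m][i] is the tag model m assigns to source token i.
    A tag is accepted only if at least `min_votes` models predict it;
    otherwise the token falls back to KEEP (no edit).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    length = len(predictions[0])
    if any(len(tags) != length for tags in predictions):
        raise ValueError("all models must tag the same number of tokens")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1  # strict majority by default

    merged = []
    for tags_at_position in zip(*predictions):
        # most frequent tag at this token position and its vote count
        tag, votes = Counter(tags_at_position).most_common(1)[0]
        merged.append(tag if votes >= min_votes else KEEP)
    return merged


# Hypothetical tags from three models for a four-token sentence.
model_a = ["$KEEP", "$REPLACE_has", "$KEEP", "$APPEND_."]
model_b = ["$KEEP", "$REPLACE_has", "$DELETE", "$KEEP"]
model_c = ["$KEEP", "$REPLACE_have", "$KEEP", "$APPEND_."]

print(majority_vote([model_a, model_b, model_c]))
# → ['$KEEP', '$REPLACE_has', '$KEEP', '$APPEND_.']
print(majority_vote([model_a, model_b, model_c], min_votes=3))
# → ['$KEEP', '$KEEP', '$KEEP', '$KEEP']
```

Raising `min_votes` makes the ensemble more conservative: fewer edits survive the vote, which generally trades recall for precision, the component F0.5 weights more heavily.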