Bibliographic description:
Tarnavskyi, Maksym. Improving Sequence Tagging for Grammatical Error Correction / Maksym Tarnavskyi; Supervisor: Kostiantyn Omelianchuk; Ukrainian Catholic University, Department of Computer Sciences. – Lviv : [s.n.], 2021. – 52 p.: ill.
Abstract:
In this work, we investigated the recent sequence tagging approach to the Grammatical
Error Correction (GEC) task. We compared the impact of different transformer-based
encoders in base and large configurations and showed the influence of the tag
vocabulary size. We also explored ensembling methods at the data and model levels.
We proposed two methods for selecting higher-quality data and filtering out noisy
data. We generated new GEC training data through knowledge distillation from an
ensemble of models and investigated strategies for using it. Our best ensemble, without
pre-training on synthetic data, achieves a new SOTA F0.5 of 76.05 on
BEA-2019 (test), whereas other recent results were achieved with
pre-training on synthetic data. Our best single model with pre-training on synthetic
data achieves F0.5 of 73.21 on BEA-2019 (test). Our investigation improved the previous
results by 0.8/2.45 points for the single/ensemble sequence tagging models.
The code, generated datasets, and trained models are publicly available.
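
For readers unfamiliar with the metric reported above: F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall. The snippet below is a minimal sketch of the standard formula, assuming precision and recall have already been computed. The function name `f_beta` is illustrative and is not taken from the thesis code.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    # Define the score as 0 when both inputs are 0 to avoid division by zero.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With equal precision and recall the score equals that common value. When the two differ, the result lies closer to precision than the balanced F1 score would.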