dc.description.abstracten |
Recent studies have highlighted the exceptional capabilities of open-source
foundational models such as LLaMA, Mistral, and Gemma, particularly in scenarios
requiring writing assistance. These models demonstrate proficiency across a variety
of tasks, both in zero-shot settings and when fine-tuned on task-specific,
instruction-driven data. Despite this adaptability, their application to Grammatical
Error Correction (GEC), a task critical for producing grammatically accurate text in
writing assistants, remains underexplored. This thesis examines the performance of
open-source Large Language Models (LLMs) on the GEC task across multiple setups:
zero-shot, supervised fine-tuning, and Reinforcement Learning from Human Feedback
(RLHF). Our research shows that task-specific fine-tuning significantly enhances
LLM performance on GEC. We also highlight the importance of precise prompt
configuration in zero-shot settings to align models with the specific requirements
of the CoNLL-2014 and BEA-2019 benchmarks, which target minimal necessary edits.
Further, our experiments with RLHF, particularly Direct Preference Optimization,
provide insights into aligning LLMs for specific applications, yielding a 0.3%
score improvement and indicating a path for further gains. The best-performing
model, Chat-LLaMA-2-13B-FT, matched the performance of state-of-the-art models with
considerably less data, achieving an F0.5 score of 67.87% on the CoNLL-2014 test
set and 73.11% on the BEA-2019 test set. This thesis expands our understanding of
the capabilities of open-source LLMs in GEC and sets the stage for future
enhancements in this area. The code and trained model are publicly available. |