Show simple item record
dc.contributor.author | Antentyk, Yurii | |
dc.date.accessioned | 2024-08-22T09:41:22Z | |
dc.date.available | 2024-08-22T09:41:22Z | |
dc.date.issued | 2024 | |
dc.identifier.citation | Antentyk Yurii. Extracting Text Representation from Pretrained Language Models via Edit Distance-based Loss Function. Ukrainian Catholic University, Faculty of Applied Sciences, Department of Computer Sciences. Lviv 2024, 41 p. | uk |
dc.identifier.uri | https://er.ucu.edu.ua/handle/1/4661 | |
dc.language.iso | en | uk |
dc.subject | Extracting Text Representation | uk |
dc.subject | Pretrained Language Models | uk |
dc.subject | Distance-based Loss Function | uk |
dc.title | Extracting Text Representation from Pretrained Language Models via Edit Distance-based Loss Function | uk |
dc.type | Preprint | uk |
dc.status | Published for the first time | uk |
dc.description.abstracten | Transformers have proven themselves to be a versatile architecture for a wide range of NLP tasks. Thanks to unsupervised pre-training on large text corpora and a drastic increase in the number of parameters, both encoder and decoder architectures generalize well enough to exhibit emergent abilities and achieve SOTA results in many downstream tasks via prompting or fine-tuning. This makes pre-trained language models tempting to use for sentence embedding, as a meaningful fixed-vector representation of text is crucial for good performance on tasks such as text similarity, semantic search, etc. The topic has been extensively explored, and the most successful approaches can be viewed as variations of extracting sentence representations from the hidden states of language models fine-tuned on domain-specific datasets. Other works, on the other hand, try to extract the embedding vector from the pre-trained language model without altering its parameters, which opens the way to turning a pre-trained language model into an embedding model without fine-tuning. This is done by optimising a latent reparameterised sentence space, which is then used as additional context while decoding the original sentence. While showing promising recoverability results, this approach has been shown to suffer from exposure bias, a discrepancy between the distribution of sequences the model observes during training and the distribution of sequences it generates. This work aims to study how to condition a pre-trained language model on a latent variable, as well as to mitigate exposure bias by incorporating the Optimal Completion Distillation loss, an alternative to Maximum Likelihood Estimation that minimises the edit distance between text sampled from the model and the ground-truth sentences. | uk |
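The abstract's Optimal Completion Distillation objective rests on edit distance between a model-sampled sequence and the reference. The following is a minimal Python sketch of that core computation: Levenshtein distance by dynamic programming, plus a simplified rule for the optimal next tokens after a sampled prefix (tokens extending the reference positions whose prefix distance is minimal). Function names and this simplified target rule are illustrative assumptions, not code from the thesis.

```python
def edit_distance(a, b):
    # Levenshtein distance via dynamic programming:
    # dp[i][j] = minimum edits to turn a[:i] into b[:j].
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[m][n]


def ocd_targets(prefix, reference):
    # Simplified OCD-style target set: for a sampled prefix, keep one DP row
    # of edit distances row[j] = edit_distance(prefix, reference[:j]); the
    # optimal next tokens are the reference tokens that extend the positions
    # where that distance is minimal (end-of-sequence handling omitted).
    row = list(range(len(reference) + 1))
    for tok in prefix:
        new = [row[0] + 1]
        for j in range(1, len(reference) + 1):
            cost = 0 if tok == reference[j - 1] else 1
            new.append(min(new[j - 1] + 1, row[j] + 1, row[j - 1] + cost))
        row = new
    best = min(row)
    return {reference[j] for j in range(len(reference)) if row[j] == best}
```

In a full OCD loss, the model's next-token distribution at each prefix would be trained (e.g. via KL divergence) toward a uniform distribution over such optimal-completion tokens, rather than toward the single teacher-forced ground-truth token as in Maximum Likelihood Estimation.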