Extracting Text Representation from Pretrained Language Models via Edit Distance-based Loss Function

dc.contributor.author Antentyk, Yurii
dc.date.accessioned 2024-08-22T09:41:22Z
dc.date.available 2024-08-22T09:41:22Z
dc.date.issued 2024
dc.identifier.citation Antentyk Yurii. Extracting Text Representation from Pretrained Language Models via Edit Distance-based Loss Function. Ukrainian Catholic University, Faculty of Applied Sciences, Department of Computer Sciences. Lviv 2024, 41 p. uk
dc.identifier.uri https://er.ucu.edu.ua/handle/1/4661
dc.language.iso en uk
dc.subject Extracting Text Representation uk
dc.subject Pretrained Language Models uk
dc.subject Distance-based Loss Function uk
dc.title Extracting Text Representation from Pretrained Language Models via Edit Distance-based Loss Function uk
dc.type Preprint uk
dc.status Published for the first time uk
dc.description.abstracten Transformers have proven themselves a versatile architecture for a wide range of NLP tasks. Thanks to unsupervised pre-training on large text corpora and a drastic increase in the number of parameters, both encoder and decoder architectures generalize well enough to show emergent abilities and achieve SOTA results on many downstream tasks via prompting or fine-tuning. This makes pre-trained language models tempting to use for sentence embedding, as a meaningful fixed-vector representation of text is crucial for good performance on tasks such as text similarity, semantic search, etc. The topic has been extensively explored, and the most successful approaches can be viewed as different variations of extracting sentence representations from the hidden states of language models that have been fine-tuned on domain-specific datasets. Other works, in contrast, try to extract the embedding vector from the pre-trained language model without altering its parameters, which opens the way to turning a pre-trained language model into an embedding model without fine-tuning. This is done by optimising a latent reparameterised sentence space, which is then used as additional context while decoding the original sentence. While showing promising recoverability results, this approach has been shown to suffer from exposure bias, a discrepancy between the distributions of sequences observed and generated by the model. This work aims to study how to condition a pre-trained language model on a latent variable, as well as to mitigate exposure bias by incorporating the Optimal Completion Distillation loss, an alternative to Maximum Likelihood Estimation that minimises the edit distance between text sampled from the model and the ground-truth sentence. uk
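The core of the edit distance-based objective mentioned above can be sketched in a few lines: for a prefix sampled from the model, Optimal Completion Distillation trains toward the set of next tokens that keep the achievable edit distance to the ground truth minimal. The sketch below is a minimal illustration on character-level tokens; the function names and the `</s>` end-of-sequence marker are hypothetical, not taken from the thesis.

```python
def edit_distance_row(prefix, target):
    # Levenshtein DP: returns d where d[j] is the edit distance
    # between the whole prefix and target[:j].
    prev = list(range(len(target) + 1))
    for i, p in enumerate(prefix, 1):
        cur = [i]
        for j, t in enumerate(target, 1):
            cur.append(min(prev[j] + 1,             # delete p
                           cur[j - 1] + 1,          # insert t
                           prev[j - 1] + (p != t))) # substitute
        prev = cur
    return prev

def ocd_targets(prefix, target, eos="</s>"):
    # Optimal next tokens: target[j] for every j where the prefix is
    # closest (in edit distance) to target[:j]; EOS if the prefix is
    # already closest to the full target.
    d = edit_distance_row(prefix, target)
    m = min(d)
    tokens = {target[j] for j in range(len(target)) if d[j] == m}
    if d[len(target)] == m:
        tokens.add(eos)
    return tokens
```

For example, after a wrong first character `x` against the target `cat`, both `c` (treating `x` as an insertion to delete) and `a` (treating `x` as a substituted `c`) are optimal continuations, so OCD distributes probability over both instead of forcing the single MLE token.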

