Show simple item record
dc.contributor.author | Antentyk, Yurii | |
dc.date.accessioned | 2024-08-22T09:41:22Z | |
dc.date.available | 2024-08-22T09:41:22Z | |
dc.date.issued | 2024 | |
dc.identifier.citation | Antentyk Yurii. Extracting Text Representation from Pretrained Language Models via Edit Distance-based Loss Function. Ukrainian Catholic University, Faculty of Applied Sciences, Department of Computer Sciences. Lviv 2024, 41 p. | uk |
dc.identifier.uri | https://er.ucu.edu.ua/handle/1/4661 | |
dc.language.iso | en | uk |
dc.subject | Extracting Text Representation | uk |
dc.subject | Pretrained Language Models | uk |
dc.subject | Distance-based Loss Function | uk |
dc.title | Extracting Text Representation from Pretrained Language Models via Edit Distance-based Loss Function | uk |
dc.type | Preprint | uk |
dc.status | Published for the first time | uk |
dc.description.abstracten | Transformers have proven themselves to be a versatile architecture for a wide range of NLP tasks. Thanks to unsupervised pre-training on large text corpora and a drastic increase in the number of parameters, both encoder and decoder architectures generalize well enough to exhibit emergent abilities and achieve SOTA results in many downstream tasks via prompting or fine-tuning. This makes pre-trained language models tempting to use for sentence embedding, as a meaningful fixed-vector representation of text is crucial for good performance on tasks such as text similarity, semantic search, etc. The topic has been extensively explored, and the most successful approaches can be viewed as variations of extracting sentence representations from the hidden states of language models fine-tuned on domain-specific datasets. Other works, on the other hand, try to extract the embedding vector from the pre-trained language model without altering its parameters, which opens the way to turning a pre-trained language model into an embedding model without fine-tuning. This is done by optimising a latent reparameterised sentence space, which is then used as additional context while decoding the original sentence. While showing promising recoverability results, this approach has been shown to suffer from exposure bias, a discrepancy between the distribution of sequences the model observes during training and the distribution of sequences it generates. This work aims to study how to condition a pre-trained language model on a latent variable, as well as to mitigate exposure bias by incorporating the Optimal Completion Distillation loss, an alternative to Maximum Likelihood Estimation that minimises the edit distance between text sampled from the model and the ground-truth sentences. | uk |
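The abstract's Optimal Completion Distillation objective rests on edit distance between a model-sampled sequence and the reference. The following is a minimal Python sketch of that core computation: Levenshtein distance by dynamic programming, plus a simplified rule for the optimal next tokens after a sampled prefix (tokens extending the reference positions whose prefix distance is minimal). Function names and this simplified target rule are illustrative assumptions, not code from the thesis.

```python
def edit_distance(a, b):
    # Levenshtein distance via dynamic programming:
    # dp[i][j] = minimum edits to turn a[:i] into b[:j].
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[m][n]


def ocd_targets(prefix, reference):
    # Simplified OCD-style target set: for a sampled prefix, keep one DP row
    # of edit distances row[j] = edit_distance(prefix, reference[:j]); the
    # optimal next tokens are the reference tokens that extend the positions
    # where that distance is minimal (end-of-sequence handling omitted).
    row = list(range(len(reference) + 1))
    for tok in prefix:
        new = [row[0] + 1]
        for j in range(1, len(reference) + 1):
            cost = 0 if tok == reference[j - 1] else 1
            new.append(min(new[j - 1] + 1, row[j] + 1, row[j - 1] + cost))
        row = new
    best = min(row)
    return {reference[j] for j in range(len(reference)) if row[j] == best}
```

In a full OCD loss, the model's next-token distribution at each prefix would be trained (e.g. via KL divergence) toward a uniform distribution over such optimal-completion tokens, rather than toward the single teacher-forced ground-truth token as in Maximum Likelihood Estimation.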