Unsupervised text simplification using neural style transfer


dc.contributor.author Kariuk, Oleg
dc.date.accessioned 2020-02-25T08:25:27Z
dc.date.available 2020-02-25T08:25:27Z
dc.date.issued 2020
dc.identifier.citation Kariuk, Oleg. Unsupervised text simplification using neural style transfer : Master Thesis : manuscript / Oleg Kariuk ; Supervisor Dima Karamshuk ; Ukrainian Catholic University, Department of Computer Sciences. – Lviv : [s.n.], 2020. – 48 p. : ill. uk
dc.identifier.uri http://er.ucu.edu.ua/handle/1/2045
dc.language.iso uk uk
dc.subject Unsupervised text simplification uk
dc.subject Unsupervised style transfer uk
dc.subject Neural style transfer uk
dc.title Unsupervised text simplification using neural style transfer uk
dc.type Preprint uk
dc.status Published for the first time uk
dc.description.abstracten With the growing interdependence of the world's economies, cultures, and populations, the advantages of learning foreign languages are more apparent than ever. The growing internet and mobile phone user base provides significant opportunities for online language learning, a global market whose size is forecast to grow by almost $17.9 bn during 2019-2023. One of the most effective ways to improve in a foreign language is through reading. Graded readers, books in which the original text is simplified to lower grades of complexity, make the process of reading in a foreign language less daunting. Composing a graded reader is a laborious manual process. There are two possible ways to automate the writing of graded readers for arbitrary input texts. The first is to use a variant of supervised sequence-to-sequence models for text simplification. Such models depend on scarcely available parallel text corpora: datasets in which every text piece is available in both an original and a simplified version. An alternative, unsupervised approach applies neural style transfer techniques, in which an algorithm learns to decompose a given text into vector representations of its content and style and to generate a new version of the same content in a simplified language style. In this work, we demonstrate the feasibility of applying unsupervised learning to the problem of text simplification by using cross-lingual language modeling. This allows us to improve the previous best BLEU score from 88.85 to 96.05 on the Wikilarge dataset in an unsupervised setting, and the SARI score from 30 to 43.18 and FKGL from 4.01 to 3.58 on the Newsela dataset in a semi-supervised one. In addition, we propose new penalties that provide more control during beam search generation. uk
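The abstract mentions new penalties for controlling beam search generation but does not describe them. As an illustration only, the following minimal Python sketch shows the general mechanism such penalties rely on: additive terms folded into the hypothesis score at each expansion step. The names (step_fn, token_complexity) and the specific complexity penalty are hypothetical and not taken from the thesis; the length normalization is the standard GNMT-style formula.

```python
import math
from typing import Callable, List, Tuple

def beam_search(
    step_fn: Callable[[List[int]], List[Tuple[int, float]]],  # seq -> [(token, log-prob)]
    eos_id: int,
    beam_size: int = 4,
    max_len: int = 20,
    length_alpha: float = 0.6,
    complexity_weight: float = 0.5,
    token_complexity: Callable[[int], float] = lambda t: 0.0,  # hypothetical penalty source
) -> List[int]:
    """Toy beam search with penalties applied at scoring time (illustrative only)."""
    # Each hypothesis: (token ids, summed log-probability, finished flag).
    beams: List[Tuple[List[int], float, bool]] = [([], 0.0, False)]
    for _ in range(max_len):
        candidates = []
        for seq, logp, done in beams:
            if done:
                candidates.append((seq, logp, True))
                continue
            for tok, tok_logp in step_fn(seq):
                # Hypothetical penalty: discourage "complex" tokens to steer
                # generation toward simpler output.
                penalty = complexity_weight * token_complexity(tok)
                candidates.append((seq + [tok], logp + tok_logp - penalty, tok == eos_id))

        def score(cand: Tuple[List[int], float, bool]) -> float:
            seq, logp, _ = cand
            # GNMT-style length normalization so longer hypotheses
            # are not unfairly punished by summed log-probabilities.
            lp = ((5.0 + len(seq)) / 6.0) ** length_alpha
            return logp / lp

        beams = sorted(candidates, key=score, reverse=True)[:beam_size]
        if all(done for _, _, done in beams):
            break
    return beams[0][0]

# Usage with a dummy next-token distribution (token 2 is EOS):
if __name__ == "__main__":
    def dummy_step(seq: List[int]) -> List[Tuple[int, float]]:
        return [(0, math.log(0.5)), (1, math.log(0.3)), (2, math.log(0.2))]

    print(beam_search(dummy_step, eos_id=2, token_complexity=lambda t: float(t == 1)))
```

Any penalty expressed as a per-token or per-hypothesis score adjustment, such as the length normalization above, slots into the same ranking step without changing the underlying model.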

