Stable and efficient video segmentation via GAN predicting adjacent frame

Ilnytskyi, Ivan

dc.contributor.author	Ilnytskyi, Ivan
dc.date.accessioned	2019-02-18T15:58:50Z
dc.date.available	2019-02-18T15:58:50Z
dc.date.issued	2019
dc.identifier.citation	Ilnytskyi, Ivan. Stable and efficient video segmentation via GAN predicting adjacent frame : Master Thesis : manuscript / Ivan Ilnytskyi ; Supervisor Pavel Akapian ; Ukrainian Catholic University, Department of Computer Sciences. – Lviv : [s.n.], 2019. – 24 p. : ill.	uk
dc.identifier.uri	http://er.ucu.edu.ua/handle/1/1328
dc.language.iso	en	uk
dc.subject	GAN predicting adjacent frame	uk
dc.subject	neural networks	uk
dc.subject	semantic segmentation problem	uk
dc.title	Stable and efficient video segmentation via GAN predicting adjacent frame	uk
dc.type	Preprint	uk
dc.status	Published for the first time	uk
dc.description.abstracten	Analyzing video streams is challenging not only in terms of accuracy and speed, but also in terms of the consistency of analysis between adjacent frames, since real-world videos are temporally consistent. In video semantic segmentation, jittering of predictions is easily noticed by the human eye. It is usually not taken into account in algorithm design, however, because most algorithms are designed for single-image recognition and classical filters offer no easy solution. This jittering leads to rather negative human assessment of algorithms even when their accuracy is good. In addition, it may lead to unstable or conflicting behavior in control systems that use computer vision. We propose methods for efficient video semantic segmentation that take video consistency into account and can be implemented without an annotated video dataset. Some methods require only an annotated photo dataset; others additionally use a generative adversarial network (GAN) trained without supervision on a relevant video dataset. The solution is relevant when the domain lacks large annotated video datasets but annotated photo datasets and large amounts of unlabeled video are available. We show that using the semantic segmentation mask of the previous frame as a feature for segmenting the current frame improves accuracy and consistency. We achieve the best results with a network trained on features obtained from the GAN and a baseline segmentation network.	uk
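
The abstract's central idea, feeding the previous frame's segmentation mask to the network as an extra feature for the current frame, can be illustrated with a minimal sketch. This record does not describe the thesis architecture or input encoding, so the function name `stack_previous_mask`, the one-hot encoding, and the channel layout below are assumptions made purely for illustration, not the author's implementation.

```python
import numpy as np


def stack_previous_mask(frame, prev_mask, num_classes):
    """Append a one-hot encoding of the previous frame's mask to the current frame.

    frame:     (H, W, 3) float array, the current RGB frame.
    prev_mask: (H, W) integer array of class ids predicted for the previous frame.
    Returns an (H, W, 3 + num_classes) array usable as network input.
    Hypothetical helper: the encoding is an assumption, not taken from the thesis.
    """
    # Index rows of the identity matrix by class id to get a one-hot map.
    one_hot = np.eye(num_classes, dtype=frame.dtype)[prev_mask]
    # Concatenate along the channel axis.
    return np.concatenate([frame, one_hot], axis=-1)


# Toy example: a 4x6 frame whose previous mask has class 1 in the right half.
frame = np.zeros((4, 6, 3), dtype=np.float32)
prev_mask = np.zeros((4, 6), dtype=np.int64)
prev_mask[:, 3:] = 1
features = stack_previous_mask(frame, prev_mask, num_classes=2)
```

In such a setup a segmentation network would take `features` in place of the plain RGB frame. For the first frame of a video, where no previous mask exists, one plausible choice is the prediction of a single-image baseline network.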