dc.description.abstract |
In recent years, deep learning has advanced at a remarkable pace. Growing interest in
the field has brought a growing number of bold, even revolutionary, studies that drive
progress and boost model performance. Although the number of large, high-quality
datasets is expanding rapidly, models in many domains and tasks still need even more
data. This need is not limited to giant models: domains such as autonomous driving,
which rely mainly on lightweight models, also require additional data. Moreover,
labeling more real data is not always a panacea, particularly for autonomous vehicles,
where the data must offer great variety and a low risk of annotation errors. Additional
synthetic data can be an excellent booster for existing approaches, or even a
must-have part of the training set. For example, simulators make it possible to control
a scene's complexity by adjusting the number of objects, their size, and their
interaction with the environment, which is very helpful for tasks such as object
detection. Today, researchers must balance the ratio of natural and generated data
largely by intuition, while accounting for possible gaps between the two domains. This
task is already non-trivial, and constraints such as model size and the number of
classes add further uncertainty. In this paper, we analyze in detail the impact of
synthetic data on the training process, cover possible training strategies, and provide
guidance on choosing the amount of artificial data under existing constraints. |
uk |