TY - GEN
T1 - Variational Autoencoders for Pedestrian Synthetic Data Augmentation of Existing Datasets
T2 - 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISAPP 2024
AU - Nikolov, Ivan Adriyanov
N1 - Conference code: 19
PY - 2024/3/13
Y1 - 2024/3/13
N2 - The demand for ever more training data for deep learning surveillance and object detection models has slowed deployment and increased the costs of dataset gathering, annotation, and testing. One way to mitigate this is synthetic data, which provides more varied scenarios and requires no manual annotation. We present our initial exploratory work on generating synthetic pedestrian augmentations for an existing dataset using variational autoencoders. Our method consists of creating a large number of backgrounds and training a variational autoencoder on a small subset of annotated pedestrians. We then interpolate the latent space of the autoencoder to generate variations of these pedestrians, calculate their positions on the backgrounds, and blend them to create new images. We show that, although the results do not match those achieved by simply adding more real images, training a YOLOv5 model on a mix of real images and small amounts of synthetic ones still boosts its performance and robustness. We also propose next steps for expanding this approach and making it useful for a wider array of datasets.
KW - Dataset Augmentation
KW - Object Detection
KW - Surveillance
KW - Synthetic Data
KW - Variational Autoencoders
UR - http://www.scopus.com/inward/record.url?scp=85192153029&partnerID=8YFLogxK
U2 - 10.5220/0012570700003660
DO - 10.5220/0012570700003660
M3 - Article in proceedings
VL - 2
SP - 829
EP - 836
BT - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
PB - SCITEPRESS Digital Library
Y2 - 27 February 2024 through 29 February 2024
ER -