Abstract

Current state-of-the-art techniques for metagenomic binning only utilize local features for the individual DNA sequences (contigs), neglecting additional information such as the assembly graph, in which the contigs are connected according to overlapping reads, and gene markers identified in the contigs. In this paper, we propose the use of a Variational AutoEncoder (VAE) tailored to leverage auxiliary structural information about contig relations when learning contig representations for subsequent metagenomic binning. Our method, CCVAE, improves on previous work that used VAEs for learning latent representations of the individual contigs, by constraining these representations according to the connectivity information from the assembly graph. Additionally, we incorporate into the model additional information in the form of marker genes to better differentiate contigs from different genomes. Our experiments on both simulated and real-world datasets demonstrate that CCVAE outperforms current state-of-the-art techniques, thus providing a more effective method for metagenomic binning.
Original language: English
Title: Proceedings of the 40th International Conference on Machine Learning
Editors: Andreas Krause, Emma Brunskill
Publication date: 2023
Pages: 18471–18481
Article number: 762
Status: Published - 2023
Event: ICML'23: International Conference on Machine Learning - Honolulu, USA
Duration: 23 Jul 2023 to 29 Jul 2023

Conference

Conference: ICML'23: International Conference on Machine Learning
Country/Territory: USA
City: Honolulu
Period: 23/07/2023 to 29/07/2023
Name: The Proceedings of Machine Learning Research
Volume: 202
ISSN: 2640-3498

Paper title: Metagenomic Binning using Connectivity-constrained Variational Autoencoders