Microbial communities play a vital role in most processes in the biosphere and are essential for solving present and future environmental challenges. Examples include the impact of the human microbiome on health and disease, the discovery of new antibiotics, and the conversion of waste products into valuable resources. In essence, life would rapidly cease to exist in a world without microbes. However, because individual microbes are very difficult to isolate, the majority remain undescribed and are known as the “microbial dark matter”. In the past 10 years, new methods have revolutionized our access to the genomes of microbial dark matter and have sparked an explosion of new fundamental discoveries based on genomic evidence.
Despite these discoveries, we are still far from having a meaningful genomic representation of the tree of life. The microbial species diversity in nature is vast, with estimates ranging from millions to billions or even trillions of species, in stark contrast to the 47,894 prokaryotic species in the genome databases (GTDB v. 202). In our current Villum Synergy Initiator project, we have demonstrated how integrating state-of-the-art long-read DNA sequencing with graph-based deep learning can outperform current methods for bacterial genome recovery. Our new platform has the potential to supercharge genome recovery of microbial dark matter, and in this project we want to realize that potential. Furthermore, we will develop novel tools to handle microbial genome data at an unprecedented scale and to integrate these data with heterogeneous environmental data, enabling an understanding of how biodiversity is shaped by the environment.
Therefore, the objectives of the DarkScience project are to develop new methods and approaches that enable access to and analysis of microbial dark matter and its interaction with the environment.
1: Develop new binning features to separate microbial genomes by species-specific modified nucleotide signatures in the raw Nanopore DNA sequencing signal.
2: Supercharge microbial genome recovery by developing novel graph-based machine learning binning frameworks.
3: Enable new analytical perspectives and make data and results accessible by integrating heterogeneous external data with bacterial genome data at scale.
4: Identify natural drivers of bacterial biodiversity by exploring hidden links between microbial genome diversity and environmental data.