Metagenomics Analysis Pipeline

Written by tdarde



Bioinformatics | Metagenomics



October 16, 2022

Metagenomics – SciLicium Analysis Pipeline

Bioinformatics pipelines are increasingly used to conduct metagenomics analyses. In many cases, you will need bioinformatics skills, computing resources, and an understanding of existing analytical processes. Guidelines for each pipeline include configuration profiles, reference databases, and recommended analytical steps. In the absence of a formal evaluation, selecting a pipeline and an adequate set of parameters and algorithms can quickly become a brainteaser. Our goal in this article is to provide you with information about our metagenomics analysis pipeline so that you can make an informed decision.

What is metagenomics?

In metagenomics, genetic material is recovered directly from environmental samples. This eliminates the need to culture microorganisms before extracting and sequencing their genomes. Instead, metagenomics relies on DNA sequencing to produce genome-wide data sets that can provide insights into microbial communities, including rare or novel species.

There are two main types of metagenomics:

Targeted metagenomics (16S rRNA, 18S rRNA, ITS regions)

16S rRNA gene sequencing targets and reads a region of the 16S rRNA gene, which is found in all Bacteria and Archaea. This type of sequencing can therefore only identify these two groups of microorganisms. Other types of amplicon sequencing can identify other microorganisms, such as ITS sequencing for fungi or 18S sequencing for protists.

Prokaryotic 16S ribosomal RNA (rRNA) genes are often used for metagenomic analysis of microbial populations. These genes contain both conserved and variable regions: the conserved regions serve as primer binding sites, while the variable regions differ between taxa and can be used for phylogenetic classification. 16S rDNA sequencing is particularly useful for identifying bacteria that are difficult, slow, or impossible to identify with other techniques.

Shotgun Metagenomics

Metagenomic shotgun sequencing involves randomly breaking DNA into many small pieces, much like a shotgun would do. These fragmented pieces of DNA are then sequenced and their DNA sequences are stitched back together using bioinformatics to identify the species and genes present in the sample. For microbiome studies, this means that shotgun sequencing can identify and profile bacteria, fungi, viruses and many other types of microorganisms.

Shotgun metagenomics can be used to sample, in principle, all genes from all organisms present in a complex sample. This method is used for assessing bacterial diversity and measuring the abundance of microbes. Using shotgun metagenomics, it is also possible to study microorganisms that cannot be cultured or are otherwise difficult to analyze.
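
The random fragmentation described above can be illustrated with a minimal Python sketch. It is a toy simulation, not a model of a real sequencing run: the genome is a random string, reads have a fixed length, and sequencing errors are ignored. The function names `shotgun_fragments` and `mean_depth` are hypothetical and chosen only for this example.

```python
import random
from typing import List


def shotgun_fragments(genome: str, n_reads: int, read_length: int,
                      seed: int = 0) -> List[str]:
    """Sample fixed-length reads starting at random positions of the genome."""
    last_start = len(genome) - read_length
    if last_start < 0:
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)  # seeded so the toy example is reproducible
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_length] for s in starts]


def mean_depth(genome_length: int, n_reads: int, read_length: int) -> float:
    """Expected average sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_length / genome_length


# Toy genome of 100 random bases (an assumption made for illustration only).
genome = "".join(random.Random(1).choice("ACGT") for _ in range(100))
reads = shotgun_fragments(genome, n_reads=40, read_length=10, seed=2)
depth = mean_depth(len(genome), n_reads=40, read_length=10)
```

Because the start positions are random, reads overlap one another, and it is this overlap that lets downstream tools piece the fragments back together. The average depth gives a rough idea of how much sequencing is needed, which is one reason low-abundance genomes require more reads.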

Targeted Metagenomics vs. Shotgun Metagenomics

As a quick recap, targeted metagenomics amplifies specific target regions (such as the 16S rRNA gene) by PCR, while shotgun metagenomics sequences randomly generated, overlapping fragments from across all the genomes in the sample. Taxonomic classification using shotgun metagenomics is generally more accurate, but it is more costly, requires more extensive data analysis, and may require deeper sequencing to classify low-abundance genomes. Microbiome studies have so far overwhelmingly used 16S rRNA gene sequencing, but shotgun metagenomics is becoming more accessible and popular in microbiome research.

Metagenomics pipeline in SciLicium

In recent years, bioinformatics pipelines have become increasingly popular for metagenomics analysis. Running them usually requires bioinformatics skills, computing resources, and knowledge of existing analytical processes, or the help of a bioinformatics consultant. Guidelines for each pipeline include configuration profiles, reference databases, and recommended analytical steps. In the absence of a formal evaluation, selecting a pipeline along with a suitable set of parameters and algorithms can quickly become a brainteaser.

We have designed a metagenomics pipeline that can be used with both shotgun protocols and selective amplifications (such as 16S sequencing, 18S sequencing, etc.). It allows us to obtain an in-depth understanding of microbiomes from a variety of sources, including humans (gut, skin, etc.), agriculture, and the environment (land, water, organic waste, etc.). Taxonomic classification, including at the highest levels (prokaryotes, eukaryotes, and viruses), is integrated with functional classification.

Our pipeline is composed of three main modules:

Pre-processing (I)

A raw sequencing dataset is the initial input to the SciLicium pipeline. Raw reads are preprocessed to eliminate unreliable and low-quality data before analysis. We also check that the data are consistent: there should be no abnormally overrepresented sequences (or parts of sequences), as this might indicate contamination or problems during DNA preparation. Adapters, which are short pieces of DNA with known sequences, are removed from the reads as well. When working with human microbiota data, we also remove reads of human origin.

These quality control steps help ensure that the metagenomic data are of high quality and as free of technical bias as possible.
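
To make the pre-processing steps above more concrete, here is a minimal Python sketch of adapter trimming and quality filtering on FASTQ records. It is an illustration under stated assumptions, not the SciLicium implementation: it assumes Phred+33 quality encoding, matches the adapter exactly, and uses arbitrary thresholds. The function names, the example reads, and the adapter string are hypothetical. Removal of host (human) reads is not shown, since it normally relies on alignment against a reference genome.

```python
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (identifier, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (identifier, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def preprocess(reads: Iterator[Read], adapter: str,
               min_quality: float = 20.0, min_length: int = 10) -> List[Read]:
    """Trim adapters, then drop reads that are too short or of low quality."""
    kept = []
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) < min_length:
            continue  # too short after trimming
        if mean_quality(qual) < min_quality:
            continue  # unreliable base calls
        kept.append((name, seq, qual))
    return kept


ADAPTER = "AGATCGGAAGAGC"  # example adapter sequence, assumed for this sketch
fastq_lines = [
    "@read1", "ACGTACGTACGTACGT" + ADAPTER, "+", "I" * 29,  # good read with adapter
    "@read2", "ACGTACGTACGTACGTACGT", "+", "#" * 20,        # low-quality read
    "@read3", "ACGT" + ADAPTER, "+", "I" * 17,              # mostly adapter
]
kept = preprocess(parse_fastq(fastq_lines), ADAPTER)
```

In this toy input, only `read1` survives: its adapter is trimmed off, while `read2` fails the quality threshold and `read3` is too short once the adapter is removed. Real data calls for dedicated trimming and quality control tools, which also handle partial adapter matches and per-base quality trimming.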

Taxonomic Classification (II) & Functional Analysis (III)

Following preprocessing, metagenomic sequence data are used for taxonomic classification and functional analysis. Using metagenomic data, we can profile the taxonomic composition of a sample (which species are present and in what proportion) and predict their functions. Reference databases containing genome information about known microorganisms are integrated into the pipeline in order to make these predictions.
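
As a simple illustration of what a taxonomic profile looks like, the following Python sketch turns a list of per-read taxon assignments into relative abundances. The taxon labels and counts are made up for the example, and the function name `relative_abundance` is hypothetical. Note that proportions of reads are only an approximation of proportions of organisms, since factors such as genome size or marker gene copy number also influence read counts.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_abundance(assignments: Iterable[str]) -> Dict[str, float]:
    """Fraction of classified reads assigned to each taxon, most abundant first."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.most_common()}


# Made-up assignments: one taxon label per classified read.
assignments = (["Bacteroides"] * 5
               + ["Escherichia"] * 3
               + ["Lactobacillus"] * 2)
profile = relative_abundance(assignments)

for taxon, fraction in profile.items():
    print(f"{taxon}: {fraction:.0%}")
```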

By comparing sequence similarity with reference sequences, taxonomic classification assigns metagenomic reads to specific microbial taxa (e.g., genus, species). Using functional analysis, metagenomic reads are functionally classified (for example, metabolism or cell structure) based on their sequence similarity to known genes/proteins.
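
The idea of assigning a read by sequence similarity can be sketched in a few lines of Python. This toy example scores a read by the fraction of its k-mers (short subsequences of length k) found in each reference and picks the best match. It is a deliberately simplified stand-in for the alignment and k-mer based tools used in practice, not a description of the tools in our pipeline. The reference sequences, the labels `taxon_A` and `taxon_B`, and the thresholds are all invented for illustration.

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all substrings of length k in the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read: str, references: Dict[str, str], k: int = 4,
             min_shared: float = 0.5) -> Optional[str]:
    """Return the taxon whose reference shares the most k-mers with the read.

    Returns None when no reference reaches the min_shared fraction.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None  # read shorter than k
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq, k)) / len(read_kmers)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_shared else None


# Invented reference sequences; real databases hold full genomes or marker genes.
references = {
    "taxon_A": "ATGGCGTACGTTAGCCTAGGCTA",
    "taxon_B": "TTGACCGGTTAACCGGTTTCAAG",
}

hit_a = classify("GCGTACGTTAGC", references)    # fragment of the taxon_A reference
hit_b = classify("CCGGTTAACCGG", references)    # fragment of the taxon_B reference
no_hit = classify("AAAAAAAAAAAA", references)   # matches neither reference
```

Reads that do not resemble any reference stay unclassified, which is why the choice of reference database matters so much for the final profile.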

In this step, we use a variety of databases and tools, based on the type of data (16S, 18S, or shotgun metagenomic data). Additionally, we take into account the specificities of each project; for example, if the project focuses on a particular taxonomic group, we will use tools that are more sensitive to this particular taxon.

Keep in mind

The classification of metagenomic reads is a complex process that involves the alignment of metagenomic reads to reference genomes. Alignment can be computationally intensive, especially for metagenomic datasets containing millions of reads. For read alignment, a variety of algorithms can be applied, each with its own strengths and weaknesses. In order to analyze metagenomic data sets, we use state-of-the-art algorithms. Let us know if you would like to use our metagenomics pipeline for your next project!
