Tools developed & maintained by members of the lab

MetaPhlAn is a computational tool for species-level microbial profiling (bacteria, archaea, eukaryotes, and viruses) from metagenomic shotgun sequencing data. StrainPhlAn (available within MetaPhlAn) allows strain-level microbial population genomics.

StrainPhlAn is a computational tool for performing strain-level population genomics on large metagenomic datasets. It profiles microbes from known species at strain-level resolution and provides comparative and phylogenetic analyses of the strains retrieved from metagenomic samples.

bioBakery is a suite of software, tutorials, and workflows built on methods developed by the Huttenhower and Segata labs for analyzing microbial communities from metagenomic data.

PanPhlAn is a strain-level metagenomic profiling tool for identifying the gene composition and in-vivo transcriptional activity of individual strains in metagenomic samples. PanPhlAn's ability to track strains and functionally analyze unknown pathogens makes it an efficient tool for culture-free infectious outbreak epidemiology and microbial population studies.

PhyloPhlAn 3.0 is an integrated pipeline for large-scale phylogenetic analysis of microbial isolates and genomes from metagenomes. It can assign both genomes and metagenome-assembled genomes (MAGs) to species-level genome bins (SGBs). It can both reconstruct strain-level phylogenies using clade-specific maximally informative phylogenetic markers and scale to large phylogenies with >17,000 microbial species.

ViromeQC is a computational tool to quantify non-viral contamination in VLP-enriched viromes. ViromeQC provides an enrichment score calculated with respect to the expected abundances of prokaryotic markers in reference metagenomes.
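To make the idea of an enrichment score concrete, here is a minimal sketch. It assumes the score is the ratio between the marker abundance expected in reference metagenomes (summarized here by the median) and the abundance observed in the sample. The function name, the inputs, and the choice of the median are illustrative assumptions, not ViromeQC's actual code or interface.

```python
from statistics import median


def enrichment_score(sample_rate, reference_rates):
    """Hypothetical helper: ratio of expected to observed marker abundance.

    sample_rate     -- fraction of sample reads hitting a prokaryotic marker
    reference_rates -- the same fraction measured in unenriched reference
                       metagenomes (assumed input, summarized by the median)
    """
    expected = median(reference_rates)
    if sample_rate <= 0:
        # No prokaryotic marker reads detected: enrichment is unbounded.
        return float("inf")
    return expected / sample_rate


# Toy numbers: markers are 10x rarer in the sample than in the references.
score = enrichment_score(0.5, [4.0, 5.0, 6.0])
print(score)
```

Under this assumption, a score well above 1 suggests fewer prokaryotic marker reads than in an unenriched metagenome, while a score near 1 suggests that little enrichment took place.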

curatedMetagenomicData is a Bioconductor package providing uniformly processed and manually annotated human microbiome profiles for thousands of people. Microbial taxonomy (from MetaPhlAn2) and metabolic functional potential (from HUMAnN2) can be analyzed with respect to numerous participant characteristics and health outcomes, simply and reproducibly on a normal laptop.

MetaMLST is a software tool that performs in-silico Multi Locus Sequence Typing (MLST) analysis on metagenomic samples. MetaMLST achieves cultivation- and assembly-free strain-level tracking. MetaMLST is able to detect and trace all the species to which the standard MLST protocol is applicable.
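For readers unfamiliar with MLST, the typing step itself can be sketched as a table lookup: each locus is assigned an allele number, and the combination of allele numbers across all loci defines a sequence type (ST). The sketch below illustrates only that general MLST idea. The locus names, the table contents, and the function are hypothetical and do not reflect MetaMLST's implementation.

```python
def assign_sequence_type(observed_alleles, st_table, loci):
    """Hypothetical helper: map a per-locus allele profile to a sequence type.

    observed_alleles -- dict mapping locus name to the allele number detected
    st_table         -- dict mapping a tuple of allele numbers to a known ST
    loci             -- ordered list of the loci in the typing scheme
    """
    profile = tuple(observed_alleles.get(locus) for locus in loci)
    if None in profile:
        # At least one locus was not detected, so no type can be assigned.
        return None
    # A complete profile that is absent from the table is a candidate new type.
    return st_table.get(profile, "novel")


# Toy scheme with three made-up loci and two known sequence types.
loci = ["locusA", "locusB", "locusC"]
st_table = {(1, 1, 1): "ST1", (1, 2, 1): "ST2"}

print(assign_sequence_type({"locusA": 1, "locusB": 2, "locusC": 1}, st_table, loci))
```

Samples that yield the same sequence type over time or across individuals are what makes strain-level tracking possible with this kind of scheme.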

MetAML is a computational tool for metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. It also provides species-level taxonomic profiles, marker presence data, and metadata for 3000+ publicly available metagenomes.

GraPhlAn is a software tool for producing high-quality circular representations of taxonomic and phylogenetic trees. It focuses on concise, integrative, informative, and publication-ready representations of phylogenetically- and taxonomically-driven investigation.

MetaRef is an online resource to comprehensively catalog and characterize clade-specific microbial genes. We identify and provide all core genes associated with all microbial species and genera with available reference genomes (final or draft). A subset of these gene families is consistently present in one or more taxonomic clades, which allows us to further indicate them as marker genes.

The first version of MetaPhlAn focused on species-level profiling for bacteria and archaea and was initially developed to efficiently analyze the large amount of shotgun metagenomic data produced by the Human Microbiome Project.

LDA Effect Size (LEfSe) is an algorithm for high-dimensional biomarker discovery and explanation that identifies genomic features (genes, pathways, or taxa) characterizing the differences between two or more biological conditions (or classes). It emphasizes both statistical significance and biological relevance, allowing researchers to identify differentially abundant features that are also consistent with biologically meaningful categories (subclasses).

Other tools with contributions by members of the lab

ShortBRED is a pipeline to take a set of protein sequences, group them into families, extract a set of distinctive strings ("markers"), and then search for these markers in metagenomic data and determine the presence and abundance of the protein families of interest.

microPITA is a computational tool enabling sample selection in two-stage (tiered) metagenomic studies.

HUMAnN is a pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways and functional modules in a community from metagenomic data. Sequencing a metagenome typically produces millions of short DNA/RNA reads. HUMAnN takes these reads as input and produces gene and pathway summaries as output.