About GutCyc

GutCyc is a publicly available, licensed resource and portal providing pathway annotation data for environmental metagenomic samples derived from metagenomic studies of the human gut. Advances in high-throughput sequencing are enabling researchers to explore the metabolic potential of microbial communities inhabiting the human body with unprecedented resolution. The resulting datasets are reshaping how we perceive ourselves and opening new opportunities for prevention and therapeutic intervention. Several large-scale metagenomic datasets derived from hundreds of human microbiome samples and sourced from multiple studies are now publicly available. However, each of these studies processed its sequence information with a different proprietary functional annotation pipeline, each with its own choice of functional reference databases and cut-off parameters for relevant hits, and these choices introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway/genome databases constructed from metagenome assemblies of 418 human microbiome samples across three large-scale studies, using the open-source MetaPathways pipeline, which enables reproducible functional metagenomic annotation. GutCyc provides consistent annotations and metabolic pathway predictions, enabling comparative community analyses between health and disease states in inflammatory bowel disease, Crohn's disease, and type 2 diabetes.

A preprint detailing the GutCyc compendium can be found at:

Hahn, Altman, Konwar, et al. GutCyc: a Multi-Study Collection of Human Gut Microbiome Metabolic Models. bioRxiv (2016).

Studies

The 2015 release of GutCyc consists of samples from the following studies:

Human Microbiome Project

The Human Microbiome Project (HMP) performed whole metagenomic shotgun sequencing on 1260 samples collected from 15-18 body sites from 300 healthy human subjects. The 749 samples in the Phase 1 subset were assembled using SOAPdenovo (Nature, 2012). GutCyc contains 148 of these assemblies as control assemblies. For more information on the preparation, processing, and assembly of these samples, click the Learn More button below.

Learn More »

MetaHIT

MetaHIT (Metagenomics of the Human Intestinal Tract) is a European consortium of more than 50 researchers across 8 countries and 14 research and industrial institutions, brought together between 2008 and 2012 to study the metagenomics of the human intestinal tract. One of its primary research goals was to establish associations between the genes of the human intestinal microbiota and human health and disease, focusing on two disorders of increasing importance in Europe: inflammatory bowel disease (IBD) and obesity. GutCyc includes 125 metagenomic assemblies from both healthy individuals and individuals with IBD (Crohn’s disease and ulcerative colitis). For more information on the preparation, processing, and assembly of these samples, click the Learn More button below.

Learn More »

Beijing Genomics Institute Diabetes Study

The Beijing Genomics Institute (BGI) Diabetes Study is a metagenome-wide association study aimed at comprehensively understanding the genetic characteristics of the gut microbiota and their relationship to type 2 diabetes (T2D). This study sequenced gut microbial DNA from 345 Chinese individuals, identifying approximately 60,000 T2D-associated markers that could potentially be used as an early diagnostic signal or risk factor for patients with, or with a predisposition for, T2D. GutCyc includes annotations and contigs from 145 assembled samples from both T2D and healthy individuals, providing a potential pathway-level perspective on the T2D signal. For more information on the preparation, processing, and assembly of these samples, click the Learn More button below.

Learn More »

Processing

Assembled contigs from the above human gut studies were given functional and taxonomic annotations using the MetaPathways pipeline (v2.5). These annotations were then formatted for input into Pathway Tools, where metabolic pathways were predicted using the PathoLogic prediction algorithm.

MetaPathways

The above samples were processed by MetaPathways (BMC Bioinformatics, 2012), a modular pipeline for the analysis of environmental sequence information. Processing used standard settings and included sequence quality control, open reading frame (ORF) prediction, functional annotation with the KEGG, COG, MetaCyc, RefSeq, and SEED protein databases, taxonomic annotation (SILVA, Greengenes), and identification of tRNA genes. MetaPathways integrates with Pathway Tools to generate environmental Pathway/Genome Databases (ePGDBs) from environmental samples.

Learn More »

Pathway Tools

Pathway Tools is an integrated software environment for the analysis of genomic sequence information in the context of MetaCyc pathways, developed by SRI International in Menlo Park, California (Karp, 2002). Pathway Tools allows for the interactive query of sequences, annotated genes, and metabolites, conveniently linked with pathway visualizations, literature resources, and biochemical pathway information in a cohesive data structure known as a Pathway/Genome Database (PGDB). Pathway Tools also supports the prediction of metabolic pathways, inference of transport reactions, metabolite pathway tracing, and metabolic flux analyses.

Learn More »

Availability

Currently, the extracted ePGDBs for GutCyc are available via the Data Downloads page.
The extracted ePGDBs pertaining to the GutCyc paper are licensed under the CC0 License.
The website is currently licensed under the CC BY-NC License.
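
As a rough illustration of the comparative analyses the ePGDBs are meant to support, the sketch below compares the predicted pathways of two samples. It assumes each downloaded ePGDB includes a Pathway Tools attribute-value flat file of pathways (commonly named pathways.dat), in which each record is a series of `ATTRIBUTE - value` lines terminated by `//`. The function names and the two sample excerpts are hypothetical and made up for this example; they are not taken from the GutCyc data.

```python
from typing import Dict, List, Set


def parse_attribute_value(text: str) -> List[Dict[str, List[str]]]:
    """Parse attribute-value records: 'ATTR - value' lines, '//' ends a record."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if stripped == "//":
            if current:
                records.append(current)
            current = {}
            continue
        if line.startswith("/"):
            continue  # continuation of a multi-line value; ignored in this sketch
        attr, sep, value = line.partition(" - ")
        if sep:
            current.setdefault(attr.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records


def pathway_ids(text: str) -> Set[str]:
    """Collect the UNIQUE-ID of every record in a pathways flat file."""
    return {
        rec["UNIQUE-ID"][0]
        for rec in parse_attribute_value(text)
        if "UNIQUE-ID" in rec
    }


# Hypothetical excerpts standing in for two samples' pathways files.
SAMPLE_A = """# made-up excerpt for illustration
UNIQUE-ID - GLYCOLYSIS
TYPES - Pathways
COMMON-NAME - glycolysis I
//
UNIQUE-ID - TCA
TYPES - Pathways
//
"""

SAMPLE_B = """# made-up excerpt for illustration
UNIQUE-ID - GLYCOLYSIS
TYPES - Pathways
//
UNIQUE-ID - PWY-5100
TYPES - Pathways
//
"""

ids_a = pathway_ids(SAMPLE_A)
ids_b = pathway_ids(SAMPLE_B)
shared = ids_a & ids_b   # pathways predicted in both samples
only_a = ids_a - ids_b   # pathways predicted only in sample A
only_b = ids_b - ids_a   # pathways predicted only in sample B

print("shared:", sorted(shared))
print("only in A:", sorted(only_a))
print("only in B:", sorted(only_b))
```

With real data, the two strings would be replaced by the contents of each sample's pathways file, read with `open(path).read()`. Set operations keep the comparison simple, but they only capture presence or absence; they say nothing about pathway abundance or the confidence of a prediction.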

References