News & Announcements
July 1, 2011
"Argonne's MG-RAST metagenome analysis server passes 1-terabase milestone"
MG-RAST, the metagenome analysis server developed and operated by researchers at Argonne National Laboratory and the University of Chicago, has reached a new milestone: 1 terabase (10^12 base pairs) of metagenomic DNA analyzed by the server. Metagenomics refers to the sequencing of DNA from the many organisms present in an environmental sample. A rapidly growing field of environmental biology, metagenomics gives researchers insight into the roles played by the vast majority of microorganisms in those environments, which cannot be grown in laboratory settings.
MG-RAST has been used extensively for various applications, including automated analyses of phylogenetic context, identification of genes and protein families, and subsystem and metabolic reconstructions. Over the past four years, the amount of data submitted to the server has increased dramatically (see Figure 1), and today MG-RAST is the world’s leading open metagenomics analysis platform.
The 1-terabase milestone was made possible by innovations in computational scaling and improved analysis standard operating procedures. "The result is a 200-fold increase in processing speed and capacity relative to the fastest previous known such procedure," said Folker Meyer, a computational biologist at Argonne who leads MG-RAST’s multidisciplinary development team.
To support such computations, the team built a software infrastructure that enables users to conduct high-efficiency analysis across geographically distributed resources. “This infrastructure also enables us to make effective use of DOE's Magellan cloud testbed systems located at both Argonne and the National Energy Research Scientific Computing Center,” said Narayan Desai, systems architect for MG-RAST. “This is our first step to generating the world’s most comprehensive metagenomic analysis pipeline on the cloud.”
MG-RAST is an easy-to-use resource that gives researchers free access to data processing and analysis. Results are provided as precomputed analyses comparing the submitted sequences against all leading sequence databases.
Andreas Wilke, the lead software engineer for MG-RAST, stated: “The integration of many sequence databases into one single, searchable database has dramatically increased the usefulness of MG-RAST to many of our users who are used to a specific source of annotations. Users now can keep using the annotations they are familiar with for the new sequences annotated inside MG-RAST.”
Working with the international Genomics Standards Consortium (GSC), the MG-RAST team has also created an interface that allows users to provide contextual information for their sequence data, capturing information such as the sample's origin; its physical, chemical and biological characteristics; and the technology used to extract the nucleic acids, process the material and sequence it. “As the number of metagenomic data sets expanded from a few hundreds to tens of thousands, the community needed digital formats for reporting sample metadata,” said Meyer.
“Having hundreds of publicly available samples with extensive metadata in MG-RAST enables vital ecological and statistical analysis regarding the distribution and response of microbial life to environmental gradients across the world,” said Jack Gilbert, a microbial ecologist at Argonne.
The MG-RAST server has been developed and operated with support from the Alfred P. Sloan Foundation, the U.S. Department of Energy, and the National Institutes of Health. For further information about MG-RAST, see: http://metagenomics.anl.gov