Methanosarcina Project Information

Methanosarcina acetivorans

Methanogenesis, the biological production of methane, plays a pivotal role in the global carbon cycle and contributes significantly to global warming. Each year, an estimated 900 million metric tons of methane are biologically produced, the majority of which is derived from acetate. We present here the first fully-annotated genome sequence of an acetate-utilizing methanogen, Methanosarcina acetivorans C2A.

The Methanosarcineae are metabolically and physiologically the most versatile methanogens. Only Methanosarcina species possess all three known pathways for methanogenesis and are capable of utilizing no fewer than nine methanogenic substrates, including acetate. In contrast, all other orders of methanogens possess a single pathway for methanogenesis, and many utilize no more than two substrates.

Among methanogens, the Methanosarcineae display extensive environmental diversity. Individual species of Methanosarcina have been found in freshwater and marine sediments, decaying leaves and garden soils, oil wells, sewage and animal waste digestors and lagoons, thermophilic digestors, faeces of herbivorous animals, and the rumens of ungulates.

The Methanosarcineae are unique among the Archaea in forming complex multicellular structures during different phases of growth and in response to environmental change. Within the Methanosarcineae, a number of distinct morphological forms have been characterized including single cells with and without a cell envelope, as well as multicellular packets (Figure left) and lamina.

This metabolic and physiological versatility is reflected in the genome of Methanosarcina acetivorans. At 5.71 Mb it is by far the largest known archaeal genome and larger than many sequenced bacterial genomes. An analysis of the 4,524 open reading frames reveals a strikingly wide and unanticipated variety of metabolic and cellular capabilities. The results of these analyses are presented in Galagan et al. (2002), Genome Research 12(4):532-542.

Available Genome Data

We are happy to freely provide the entire Methanosarcina acetivorans fully-annotated genome:

Project Description

The Methanosarcina acetivorans sequencing project reflects a close collaboration with Dr. William Metcalf from the University of Illinois. Principal investigators from the Broad Institute include James Galagan, Bruce Birren and Chad Nusbaum.

Sequencing and Assembly
The genome was sequenced by the whole genome shotgun method.

The Methanosarcina acetivorans strain C2A was grown in single cell morphology [47] at 35°C in HS broth medium containing 125 mM methanol plus 40 mM sodium acetate (HS-MA medium) [48].

Genomic DNA was isolated from M. acetivorans and used to construct M13 (1.5 kb inserts), plasmid (4 kb inserts), and fosmid (40 kb inserts) libraries. Plasmid and fosmid inserts were sequenced from both ends to generate paired reads. We generated sequence coverage of 7X from plasmids, 1X from M13, and 0.076X from fosmids, and assembled it with Phrap.

Initial analysis of the assembly was done with the Mapper software (M.C. Zody, personal communication) to select gap-spanning clones for finishing. A total of 200 gaps spanned by plasmid clones were closed by transposon-based sequencing using the EZ::TN <KAN-2> (tm) transposon from Epicentre. A further 48 gaps spanned only by fosmids were closed by sequencing fosmid-derived PCR products. Sequence from 28 unspanned gaps was obtained from fragments generated by combinatorial PCR using genomic DNA as template and pooled primers [50]. One unspanned gap was closed by sequencing a small-insert library [51] produced from an 8.5 kb PCR product. Regions of low sequence quality were resolved by:

  • use of ABI dGTP Big Dye Terminator sequencing mix,
  • transposon-primed sequencing of plasmid clones, or
  • sequencing PCR products obtained from plasmid or genomic template.

Paired reads within the assembly were visualized with the Mapper software and used in assembly validation. Regions of the assembly spanned by paired reads occurring with the appropriate orientation and spacing were considered valid. In this way, 99.99% of the genome was validated, while only 6270 bases of the finished assembly were spanned solely by sequenced PCR products. Regions of the assembly containing nonsensical paired reads were analyzed further. Eighteen of these regions proved to have been misassembled by Phrap and were resolved manually. The finished genome sequence was manually inspected for quality and edited using Gap4, the Staden package viewer. During annotation, 480 possible sequence errors (based on breaks in open reading frames) were identified. These were manually reviewed; one was shown to be an editing error and was corrected. Five of the possible errors were ambiguous on the basis of sequence quality, but re-sequencing showed the original sequence to be correct.
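
The paired-read check described above can be sketched in a few lines of Python. This is only an illustration and not the Mapper software: the tuple layout, the function names, and the 25% spacing tolerance are assumptions made for the example. A pair counts as consistent when its two end reads face each other and lie roughly one insert length apart, and the validated bases are the union of the intervals spanned by consistent pairs.

```python
def pair_is_consistent(pair, tolerance=0.25):
    """pair = (pos_a, strand_a, pos_b, strand_b, expected_insert_bp).

    The layout and the 25% tolerance are illustrative assumptions.
    """
    pos_a, strand_a, pos_b, strand_b, expected = pair
    # Order the two end reads by assembly coordinate.
    if pos_a > pos_b:
        pos_a, strand_a, pos_b, strand_b = pos_b, strand_b, pos_a, strand_a
    # Appropriate orientation: the left read points right, the right read points left.
    facing = (strand_a, strand_b) == ("+", "-")
    # Appropriate spacing: the span is close to the library insert size.
    spacing_ok = abs((pos_b - pos_a) - expected) <= tolerance * expected
    return facing and spacing_ok


def validated_bases(pairs, tolerance=0.25):
    """Count bases covered by at least one consistent pair (half-open intervals)."""
    spans = sorted(
        (min(p[0], p[2]), max(p[0], p[2]))
        for p in pairs
        if pair_is_consistent(p, tolerance)
    )
    total = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

Dividing the result of `validated_bases` by the genome length gives a validated fraction comparable in spirit to the 99.99% figure quoted above; pairs that fail the check mark regions worth inspecting for misassembly.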

Automated Annotation
The M. acetivorans genome was annotated using the Calhoun annotation system.

  1. Open reading frames (ORFs) likely to encode proteins were predicted using GLIMMER2.
  2. All ORFs were searched against two sets of protein family Hidden Markov Models (HMM), Pfam and TIGRFAM, using the HMMER program.
  3. The entire genome was searched against the public protein databases using BLASTX with the threshold E < 1e-5, and against the public nucleotide databases using BLASTN with the threshold E < 1e-9.
  4. Transfer RNAs were identified using the tRNAScan-SE program.
  5. ORFs longer than 200 bp and all ORFs with similarity to a protein family HMM or known proteins were annotated as genes.
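
The gene-calling rule in step 5 can be written as a small predicate. This is a minimal sketch and not code from the Calhoun system: the function name and arguments are hypothetical, and it assumes that "similarity to known proteins" means a BLASTX hit below the E < 1e-5 threshold given in step 3.

```python
MIN_ORF_LENGTH_BP = 200   # length cutoff from step 5
BLASTX_MAX_E = 1e-5       # protein-database threshold from step 3


def is_gene(orf_length_bp, hmm_hit=False, best_blastx_e=None):
    """Return True if an ORF would be annotated as a gene under the stated rule.

    hmm_hit: whether the ORF matched a Pfam or TIGRFAM model (assumed precomputed).
    best_blastx_e: lowest BLASTX E-value for the ORF, or None if there was no hit.
    """
    has_protein_hit = best_blastx_e is not None and best_blastx_e < BLASTX_MAX_E
    return orf_length_bp > MIN_ORF_LENGTH_BP or hmm_hit or has_protein_hit
```

Under this reading, a short ORF is kept only when it has supporting similarity evidence, while a long ORF is kept regardless.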

Manual Annotation and Naming Conventions
Genes were assigned identities by a team of seven annotators and three annotation reviewers.

  • All ORFs were inspected for alternative start positions
  • ORFs with no similarity to other sequences were named predicted proteins
  • ORFs with similarity to sequences with unknown function were named conserved hypothetical proteins
  • For ORFs with similarity to sequences of known function, we:
    1. Inspected all corresponding BLAST alignments in order to track biological evidence supporting function
    2. Consulted literature to identify those proteins experimentally characterized
    3. Reviewed correspondence to protein families
    4. Determined standard Enzyme Commission designation, if possible
    5. Named gene in accordance with Enzyme Commission designation
    6. Categorized gene by cellular function designation
    7. Marked unusual genes for further review

To supplement and verify the Calhoun manual annotation process, the genome sequence was also run through the annotation pipeline associated with the TIGR Comprehensive Microbial Resource. The results from the two systems displayed no substantial inconsistencies.

Multigene families were constructed by searching each annotated gene against all other genes using BLASTP, requiring matches with E < 1e-5 over 60% of the longer gene length, and subsequently clustering genes with matches.
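
The family construction described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: BLASTP hits are assumed to be already parsed into hypothetical `(query, subject, evalue, aligned_length)` tuples, "over 60% of the longer gene length" is read as an aligned length of at least 60% of the longer sequence, and clustering is taken to be single linkage (any qualifying match joins two genes into one family), implemented here with a union-find structure.

```python
def cluster_families(gene_lengths, hits, max_e=1e-5, min_fraction=0.6):
    """Group genes into families from pairwise BLASTP matches.

    gene_lengths: dict mapping gene id -> sequence length.
    hits: iterable of (query, subject, evalue, aligned_length) tuples
          (a hypothetical, pre-parsed representation).
    """
    parent = {gene: gene for gene in gene_lengths}

    def find(gene):
        # Follow parent links to the family representative, compressing the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for query, subject, evalue, aligned_length in hits:
        if query == subject:
            continue  # ignore self-matches
        longer = max(gene_lengths[query], gene_lengths[subject])
        if evalue < max_e and aligned_length >= min_fraction * longer:
            parent[find(query)] = find(subject)

    families = {}
    for gene in gene_lengths:
        families.setdefault(find(gene), set()).add(gene)
    # Only groups with at least two members count as multigene families.
    return [members for members in families.values() if len(members) > 1]
```

Single linkage is the simplest choice consistent with "clustering genes with matches", but it can chain loosely related genes together; a stricter linkage rule would give smaller families.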

The genome was renumbered with the start at the putative origin of replication, which was identified as the point of maximum cumulative AT skew (defined as the cumulative sum of (A-T)/(A+T) on one strand, where A and T are counts of adenine and thymine).
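
A minimal sketch of the cumulative AT skew calculation is given below. The text does not say how the skew was windowed, so the fixed, non-overlapping window and its default size are assumptions, as are the function names. Each window contributes (A-T)/(A+T), the contributions are summed along the strand, and the putative origin is taken as the end of the window at which the running sum peaks.

```python
def cumulative_at_skew(sequence, window=1000):
    """Return the running sum of (A-T)/(A+T) over non-overlapping windows."""
    seq = sequence.upper()
    running = 0.0
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        a, t = chunk.count("A"), chunk.count("T")
        if a + t:  # skip windows with no A or T to avoid dividing by zero
            running += (a - t) / (a + t)
        skews.append(running)
    return skews


def putative_origin(sequence, window=1000):
    """Return the coordinate at which the cumulative AT skew is maximal."""
    skews = cumulative_at_skew(sequence, window)
    best = max(range(len(skews)), key=skews.__getitem__)
    # Report the end of the peak window, capped at the sequence length.
    return min((best + 1) * window, len(sequence))
```

For example, `putative_origin("A" * 20 + "T" * 20, window=10)` returns 20, the point where the strand switches from A-rich to T-rich. Smaller windows give finer resolution at the cost of a noisier curve.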

Community Annotation
A panel of over two dozen experts was assembled to analyze the genome as part of a Community Annotation Project (CAP). The scientists in this community project could view and submit genome annotations using this website, and they drew together expert analyses of biological pathways. The project culminated in a two-day Genome Analysis Meeting at the Broad Institute. CAP participants included:

    Bill Metcalf
    Robert Barber
    Isaac Caan
    Everly Conway de Macario
    James Ferry
    David Graham
    David Grahame
    Reiner Hedderich
    Cheryl Ingram-Smith
    Ken Jarrell
    Holly Jing
    Joseph Krzycki
    John Leigh
    Alberto Macario
    Biswarup Mukhopadhyay
    Gary Olsen
    Ian Paulsen
    Matt Pritchet
    John Reeve
    Kerry Smith
    Kevin Sowers
    Tim Springer
    Ron Swanson
    Robert Tabita
    Rolf Thaur
    Robert White
    Owen White
    William B Whitman
    Stephen Zinder

See Community Annotation Project for more details.

Some conference participants

[Photos: Methanosarcina cells; Discussion; Excitement; Bruce giving a tour]


What's next?

There are exciting times ahead for the Methanosarcina community as four different species are being fully sequenced and microarray projects are underway. Stay tuned for more exciting results...

Questions about the project should be directed to annotation-info@broad.mit.edu.