Magnaporthe grisea FAQ

Answers

  • Sequencing
    1. What is whole-genome shotgun sequencing?
      Whole-genome shotgun sequencing is a technique for determining the DNA sequence of a genome by randomly shearing the DNA, sequencing many overlapping fragments, and inferring the original sequence from the overlaps between fragments. This method has been used successfully for bacterial genomes and for subclones such as cosmids. See Assembly for details.
    2. What is an assembly?
      An assembly is a representation of the computationally derived relative positions of a set of sequenced fragments. When these individual sequences overlap, a consensus sequence is derived representing the most likely base at each position in the assembly. In this way, increased sequence redundancy improves the quality of the assembly and the confidence in the consensus. See Assembly for details.
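
      As a rough illustration of how a consensus can be derived from overlapping reads, the sketch below takes a simple majority vote at each aligned position. The function name, the gap-padded input format, and the toy reads are assumptions made for this example. It is not the assembler used for this project, and real assemblers also weigh base quality scores.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over equal-length, gap-padded reads.

    '-' marks positions a read does not cover and is ignored when voting.
    Ties are broken in favor of the base seen first in the column.
    """
    result = []
    for column in zip(*aligned_reads):
        votes = Counter(base for base in column if base != "-")
        if not votes:
            result.append("N")  # no read covers this position
        else:
            result.append(votes.most_common(1)[0][0])
    return "".join(result)


# Hypothetical reads, already placed at their relative positions.
reads = [
    "ACGTAC----",
    "--GTACGG--",
    "--GAACGGTT",
    "----ACGGTT",
]
print(consensus(reads))  # ACGTACGGTT
```

      The discordant "A" in the third read is outvoted by the two reads that agree on "T", which is the sense in which added redundancy raises confidence in the consensus.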
    3. What does the name "Contig 2.XXX" mean?
      A contig is a sequence fragment created by assembling whole-genome shotgun reads. See Assembly for details.

      Every assembly contains multiple contigs. Each assembly is numbered sequentially. The number preceding the decimal point indicates the assembly number. Contigs within an assembly are also numbered sequentially. Thus "Contig 2.177" indicates contig #177 within assembly 2. Contig numbers are not conserved between assemblies, and so "Contig 2.177" bears no relationship to "Contig 1.177".
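
      If you handle contig names in your own scripts, a small parser along the following lines may help. It is only a sketch: the accepted name pattern and the names ContigId and parse_contig_name are assumptions for this example, not part of our software. The two numbers are parsed as integers rather than as one decimal number, so that "2.10" and "2.1" stay distinct.

```python
import re
from typing import NamedTuple


class ContigId(NamedTuple):
    assembly: int  # number before the decimal point
    contig: int    # number after the decimal point


# Assumed pattern: an optional "Contig " label, then "<assembly>.<contig>".
_NAME = re.compile(r"^(?:Contig\s+)?(\d+)\.(\d+)$", re.IGNORECASE)


def parse_contig_name(name: str) -> ContigId:
    """Split a name such as 'Contig 2.177' into assembly and contig numbers."""
    match = _NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a contig name: {name!r}")
    return ContigId(int(match.group(1)), int(match.group(2)))


print(parse_contig_name("Contig 2.177"))  # ContigId(assembly=2, contig=177)
```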

    4. What is a sequence contig?
      A sequence contig is the extended contiguous sequence that is produced by the assembly process that joins overlapping sequences. See Assembly for details.
    5. Are the contigs ordered?
      Contigs within the same supercontig are ordered. See Assembly for details.
    6. What is a sequence supercontig?
      A supercontig consists of one or more sequence contigs known to occur in a specific order and orientation. Because we sequence each end of the subclones of plasmids, Fosmids, and BACs, we can recognize that when one end of a clone lies in one sequence contig and the other end of the clone lies in a different sequence contig, these two contigs probably lie close to each other. To create supercontigs we require that two or more such linking clones join two sequence contigs. See Assembly for details.
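
      The linking rule can be sketched as follows. This is an illustration under stated assumptions, not the assembler's actual code: the input format (clone name, contig of one end, contig of the other end), the function names, and the toy data are all hypothetical, and the sketch only groups contigs without working out their order or orientation.

```python
from collections import Counter

MIN_LINKS = 2  # "two or more such linking clones", as described above


def supported_links(clone_ends, min_links=MIN_LINKS):
    """Return the contig pairs joined by at least min_links clones."""
    counts = Counter()
    for _clone, contig_a, contig_b in clone_ends:
        if contig_a != contig_b:  # both ends in one contig link nothing
            counts[frozenset((contig_a, contig_b))] += 1
    return {pair for pair, n in counts.items() if n >= min_links}


def group_contigs(contigs, links):
    """Group contigs connected by supported links (simple union-find)."""
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for pair in links:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical clone end placements.
clone_ends = [
    ("f1", "c1", "c2"),
    ("f2", "c2", "c1"),
    ("p1", "c2", "c3"),  # only one clone links c2 and c3: not enough
    ("b1", "c3", "c4"),
    ("b2", "c3", "c4"),
    ("p2", "c4", "c4"),
]
links = supported_links(clone_ends)
print(group_contigs(["c1", "c2", "c3", "c4"], links))
```

      Requiring at least two clones per join guards against a single chimeric or misplaced clone pulling unrelated contigs together.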
    7. Are the supercontigs ordered?
      No, the supercontigs are not ordered by number. However, we have aligned most of the genome against genetic maps, which help position supercontigs on chromosomes. See Genetic Maps for details.
    8. Are supercontig (contig) numbers preserved between different assemblies?
      No. Supercontig 1.5 (supercontig 5 in assembly 1) bears no relation to supercontig 2.5 (supercontig 5 in assembly 2). The same is true of contig numbers.
    9. How big is the Magnaporthe genome?
      Our current total unique contig length of 38 million base pairs (38 Mb) suggests that the genome is approximately 40 Mb.
    10. What strain was sequenced?
      The rice-infecting strain 70-15 was chosen as the seminal isolate for genome sequencing. This isolate was developed by numerous backcrosses to the wild isolate Guy 11 (Leung et al. 1988; Chao and Ellingboe 1991). Strain 70-15 (Mat1-1) is pathogenic on rice and is fully fertile (acts as both male and female). Strains and clones can be ordered from the FGSC.

    11. What is the current state of the assembly?
      The current assembly contains over 2000 sequence contigs >2 kb. There are no current plans for additional sequencing or finishing. See Assembly for details.
    12. How complete is the current assembly?
      We estimate that the current release represents 97% of the Magnaporthe genome and is covered to a depth of greater than 7X. It excludes very highly conserved repetitive sequence and ribosomal RNA genes.
    13. Are the contigs ordered? For example, is contig 1.5 flanked by contigs 1.4 and 1.6?
      The contigs are numbered sequentially within larger supercontig fragments. Contigs within the same supercontig are positionally ordered. See Magnaporthe Contig Numbering for details.

    14. What contig in assembly 2 corresponds to my contig 1.XXX from assembly 1?
      Unfortunately, there is no automatic way to correlate contig numbers across different assemblies. You can, however, BLAST your region of interest against the new assembly to find the corresponding contig numbers in the latest assembly.

    15. How has the sequence been generated for the Magnaporthe project?
      Our data consist of over 1 million individual sequencing reads obtained by sequencing each end of plasmids and Fosmids from libraries containing randomly sheared fragments of 4 kb and 40 kb average size respectively. See Assembly for details.
    16. Will the genome be finished?
      Unfortunately there are no plans to finish the genome.
    17. How will we know the assembly is correct?
      The quality of the assembly will be assessed in several ways. In addition to requiring that the paired plasmid and Fosmid ends occur in a logical manner, our assembly of the Magnaporthe genome will be verified through: 1) integration of BAC end sequences, 2) comparison with available genomic sequences, and 3) correlation with the genetic map.
    18. What data are available?
      In this version of our data release, all sequence contigs over 2 kb are available. Smaller contigs are sparsely covered and often include poor-quality or contaminated DNA. Sequence contig data can be accessed in two ways: through a BLASTN or TBLASTN search with an option for contig subsequence retrieval, or through FTP download of the entire genome. Contig sequences are subject to change throughout this project, so each contig number is prefixed with the data release version number (e.g. 2.235 denotes assembly version 2, contig #235).

      We also provide precomputed BLAST results against NT and NR. These sequence similarity results can be searched (based on name, GI, species name, etc) and viewed graphically along with the underlying Magnaporthe sequence.

      BAC and Fosmid clones have been integrated into the current assembly, and you can search and view the locations of these clones within the sequence contigs.

      The current assembly has been correlated with the genetic map and over 93% of the assembly is anchored to a linkage group. You can view the physical and genetic maps by using the "Genetic Map" link above. You may also search for particular genetic markers which have been located in the current assembly, using the "Features" search link.

      We have annotated the current sequence with putative genes, based on gene prediction tools and similarity to known genes. These genes are available for download, search by name/locus, and BLASTX and BLASTP searches.

    19. Are the clones being sequenced available to Magnaporthe investigators?
      The BAC clones are available from the FGSC. You can find clones that overlap a region of interest by using the Region search link.

      We do not have the resources necessary to make available the 4 kb plasmid and 40 kb Fosmid clones.

    20. What about Fosmid end sequences?
      As part of this project, we sequenced the ends of Fosmid clones. The BAC ends used in this assembly were sequenced by Ralph Dean and his team at the Fungal Genomics Laboratory of North Carolina State University, funded by the Novartis Foundation. These sequences will be crucial for ordering and orienting the genome, as well as for providing templates for gaps that are not captured by plasmids. The BAC clones are currently available through the Fungal Genetics Stock Center.
    21. There are no clones covering my region of interest. How can this be?
      The contigs are created with sequence reads from small-insert plasmids (around 4,000 bp) along with larger-insert Fosmids and BACs. If your region of interest is made up only of DNA sequenced from plasmid and Fosmid clones, then there may be no BACs containing this region. Unfortunately, we cannot provide clones to order for these regions.

    22. What are the plans for annotation?
      We have provided an automated annotation. See What's New.
    23. Can I access old versions of the assembly?
      Previous versions of the assembly are available for download from the Download page. These older versions are not accessible for BLAST, feature search, or display in the FeatureMap/GenomeBrowser.
  • Chromosomes, Genes, and Regions of Interest
    1. Which chromosome does supercontig XXX reside on?
      Some of our supercontigs have been anchored to one of the seven linkage groups. Use the Magnaporthe Supercontig Table to see if your contig or supercontig has been assigned to a linkage group. You can also use the Region search to look up Linkage Group information for a particular contig.
    2. Has my favorite gene/region XXX been sequenced?
      The whole genome has been shotgun sequenced to greater than 7X depth (see How complete is the current assembly) and therefore we expect 97% of the genome to be represented in our assembly.

      You can use the Features search to search for BLASTN or BLASTX alignments containing the name of your gene. You can also search for all BLAST alignments to a particular species of interest.

      We have annotated the sequence with automatic gene prediction, and you can search for a particular gene using the "Features" link. You can also use the Linkage Group Genetic Maps to look for regions containing markers of interest.

    3. What additional information do you have on my favorite gene/region XXX?
      Use the Region search to view a particular region in one of two graphical viewers. These viewers will display all the genetic markers, blast alignments, and clone ends within the region of interest.

      You can also look for your region in the correlation between the genetic and physical maps.

    4. How do I order clones that contain my gene/region of interest?
      You can find the BAC clones overlapping a particular region of interest by using the Regions link above.

      Type in the contig name, and start/stop position if available, and then click the Clones button.

      This will return a list of clones overlapping the region of interest. There's a link from this search result page to allow you to order clones from the FGSC.
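
      The Clones search amounts to an interval overlap test. The sketch below shows the idea on made-up data. The Clone record, the clone names, and the coordinates are hypothetical, and the website's actual implementation is not shown here.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    contig: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlapping_clones(clones, contig, start=None, stop=None):
    """Names of clones on `contig` that overlap [start, stop].

    Leaving start or stop as None leaves that side of the region open,
    mirroring a search by contig name alone.
    """
    hits = []
    for clone in clones:
        if clone.contig != contig:
            continue
        if start is not None and clone.end < start:
            continue  # clone ends before the region begins
        if stop is not None and clone.start > stop:
            continue  # clone begins after the region ends
        hits.append(clone.name)
    return hits


# Hypothetical clone placements.
clones = [
    Clone("bac_A", "2.77", 1, 90000),
    Clone("bac_B", "2.77", 120000, 210000),
    Clone("bac_C", "2.78", 1, 50000),
]
print(overlapping_clones(clones, "2.77", 80000, 125000))
print(overlapping_clones(clones, "2.77", 95000, 110000))
```

      The second query returns an empty list because the region falls between the two clones, which is the situation in which no clone can be ordered for a region.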

    5. How can I see the features neighboring my gene of interest?
      Using the FeatureMap or GenomeBrowser you can visually see the features in a region neighboring your gene of interest:
      • You can bring up any region of a contig using the Region search.
      • You can also expand any currently viewed region by modifying the Start and Stop coordinates below the display panel and clicking "Redraw".

      You can also search for features in your region of interest by using the Advanced Search link from the Features link above. The Advanced Search lets you narrow your search by entering start and stop positions on a contig.

  • Annotated Genes
    1. Is gene XXX annotated in the sequence?
      Maybe. We have run automated tools for finding putative genes, relying on ab initio gene finders and sequence similarity to known proteins.

      You can search for a gene by name, or by a BLASTX hit to a known gene. However, the gene names are extremely preliminary, and you will find that most genes are named 'predicted protein' (meaning no or weak homology to known genes), 'hypothetical protein' (indicating weak homology to known genes), or 'hypothetical protein (name)' (indicating strong homology).

    2. How were the genes annotated?
      See Automated Gene Calling.

    3. There seem to be two different ways of seeing my gene in the FeatureMap - what's going on here?
      You are right. There are two different ways of looking at a gene in the FeatureMap or GenomeBrowser.
      1. Gene within a contig (e.g. title "Contig 2.77")

        If you bring up the FeatureMap or GenomeBrowser on a region of a contig, then you are seeing the result of DNA-based analyses. You'll know you are in this mode if the title of the FeatureMap gives a contig number.

        This graphical view shows the results of analyses performed on the nucleotide sequence. For example:

        1. De novo gene prediction programs: Fgenesh, Genscan
        2. Blastn searches against NT
        3. Blastn searches against ESTs
        4. Blastx searches of the translated nucleotide sequences against proteins in NR
        5. HMMER searches of the translated DNA against PFAM

        You can get to these FeatureMaps by using the Search Regions page.

      2. A single gene by itself (e.g. title "MG#####")

        You can also bring up the FeatureMap or GenomeBrowser on a protein sequence corresponding to a particular gene. In this view you will see the results of protein-based analyses on the amino-acid sequence. For example:

        1. Blastp searches of the protein against proteins in NR
        2. HMMER searches of the protein against PFAM

        You can get the FeatureMap of a particular gene from the Feature Detail page corresponding to that locus.

  • Downloading
    1. What format is the download file in?
      The genome data are plain text in multi-FASTA format. The text file has been compressed using gzip. To uncompress the file:

        gunzip magnaporthe_1.fasta.gz

    2. Why does gunzip tell me the file is not in gzip format?
      Some browsers (such as newer versions of Netscape) automatically unzip files after download. If this is the case, the file should be 39 MB (rather than the 11 MB of the compressed file). You can simply rename the file to remove the .gz suffix.
    3. The download fails. What should I do?
      Downloading through the browser uses the http protocol. You can also try accessing the ftp site directly via the URL:
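
      Once the file has been downloaded, a script can read it whether or not the browser has already unzipped it. The sketch below is one way to do this in Python, under stated assumptions: it checks for the two-byte gzip signature instead of trusting the .gz suffix, and the helper names open_maybe_gzip and read_fasta are made up for this example.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(path):
    """Open a text file that may or may not be gzip-compressed."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fasta(path):
    """Yield (header, sequence) pairs from a multi-FASTA file."""
    header, chunks = None, []
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

      For example, looping over read_fasta("magnaporthe_1.fasta.gz") and printing each header with len(sequence) lists the contigs and their lengths, and works even if the file carries a .gz suffix but is no longer compressed.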

  • BLASTing
    1. Why is my BLAST job taking so long?
      BLAST jobs are queued and handled with other internal Broad processes in a general Load Sharing Facility. The delay for receiving your BLAST results depends on the current load.
    2. Why are my BLAST results split into multiple email messages?
      Some email programs are configured with a maximum message size and will automatically split large files into smaller pieces. If this is undesirable, you will need to reconfigure your email program.

    3. What sequences can I BLAST against?
      You can BLAST your query sequence against our entire assembly or against special sequence sets excluded from the assembly.

    4. Why do I get the message "ERROR: BLASTSetUpSearch: Unable to calculate Karlin-Altschul params, check query sequence"?
      From the NCBI Blast FAQ:
      This will happen if your entire query sequence has been masked by low complexity filtering. You will need to turn filtering off to get hits. For further information on filtering, please read the sections of the BLAST FAQs on Q: What is low-complexity sequence? and also Q: After running a search why do I see a string of "X"s (or "N"s) in my query sequence that I did not put there?

    5. After running a search why do I see a string of "X"s (or "N"s) in my query sequence that I did not put there?
      From the NCBI Blast FAQ:
      You are seeing the result of automatic filtering of your query for low-complexity sequence that is performed to prevent artifactual hits. The filter substitutes any low-complexity sequence that it finds with the letter "N" in nucleotide sequence (e.g., "NNNNNNNNNNNNN") or the letter "X" in protein sequences (e.g., "XXXXXXXXX"). Low-complexity regions can result in high scores that reflect compositional bias rather than significant position-by-position alignment (Wootton & Federhen, 1996). Filter programs can eliminate these potentially confounding matches from the BLAST reports, leaving regions whose BLAST statistics reflect the specificity of their pairwise alignment. Queries searched with the blastn program are filtered with DUST. The other BLAST programs use SEG.

    6. What is low-complexity sequence?
      From the NCBI Blast FAQ:
      Regions with low-complexity sequence have an unusual composition and this can create problems in sequence similarity searching (Wootton & Federhen, 1996). Low-complexity sequence can often be recognized by visual inspection. For example, the protein sequence PPCDPPPPPKDKKKKDDGPP has low complexity and so does the nucleotide sequence AAATAAAAAAAATAAAAAAT. Filters are used to remove low-complexity sequence because it can cause artifactual hits (please also see Q: After running a search why do I see a string of "X"s (or "N"s) in my query sequence that I did not put there?)

      In BLAST searches performed without a filter, often certain hits will be reported with high scores only because of the presence of a low-complexity region. Most often, this type of match cannot be thought of as the result of homology shared by the sequences. Rather, it is as if the low-complexity region is "sticky" and is pulling out many sequences that are not truly related.
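
      To get a feel for what "low complexity" means numerically, you can compute the Shannon entropy of a sequence's composition. Note that this is a simpler measure swapped in for illustration. It is not the DUST or SEG algorithm, and the function name is an assumption of this example.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy of the residue composition, in bits per residue.

    Lower values mean a more biased, lower-complexity composition.
    The maximum is 2 bits for DNA (4 letters) and about 4.32 bits
    for protein (20 letters).
    """
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# The two low-complexity examples quoted above, plus an even-composition control.
for seq in ("AAATAAAAAAAATAAAAAAT", "PPCDPPPPPKDKKKKDDGPP", "ACGT" * 5):
    print(seq, round(shannon_entropy(seq), 2))
```

      The nucleotide example scores well under 1 bit per base, whereas a sequence with all four bases equally represented scores the full 2 bits. A composition measure like this ignores the order of the residues, which is one reason the real filters work differently.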

  • Genome Browser
    1. Does the Genome Browser Java applet run on Macintosh computers?
      We are pleased to announce that the Genome Browser can now run on both Windows and Macintosh platforms.

      Requirements for Windows:
      Windows 9x & NT platforms or better
      Java 1.4
      Netscape Navigator 4+, Internet Explorer 5+, Mozilla 1.* or other browser that can display Java applets

      Requirements for Macintosh:
      OS X
      Java 1.4 (Software Update)
      Safari

  • Misc
    1. What's NCSU?
      North Carolina State University, http://www.ncsu.edu

    2. What's the Broad Institute?
      The Eli and Edythe L. Broad Institute is a partnership among MIT, Harvard and affiliated hospitals and the Whitehead Institute for Biomedical Research. Its mission is to create the tools for genomic medicine and make them freely available to the world and to pioneer their application to the study and treatment of disease.

    3. How do I cite the sequence for publication?
      Publications should include the following citation:
      Magnaporthe Sequencing Project. Ralph Dean, Fungal Genomics Laboratory at North Carolina State University (http://www.fungalgenomics.ncsu.edu), and Broad Institute of MIT and Harvard (http://www.broad.mit.edu)

    4. Who do I contact with questions about the sequencing?
      For additional help or to send feedback on this website, please contact riceblast-info@ncsu.edu or annotation-webmaster@broad.mit.edu.
    5. Where are the beautiful photos from?
      The top, middle, and bottom images on the front page are micrographs used with permission from the Annual Review of Microbiology, Volume 50 ©1996 by Annual Reviews http://www.AnnualReviews.org.

      The other two images are provided by the Agricultural Research Service of the USDA.