Frequently Asked Questions

Questions

Sequencing

Annotated Genes

Misc

Answers

Sequencing

  1. How big is the Sclerotinia sclerotiorum genome?

    Our current total unique contig length is 38,001,451 bp (base pairs). The estimated genome size of S. sclerotiorum is 38 Mb.
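    The total contig length quoted above is the sum of the lengths of all contigs in the release. The following is a minimal sketch of how that figure could be recomputed, assuming the contigs have been downloaded as a FASTA file. The function name and the record names in the example are hypothetical and are not part of the project's tools.

```python
import io


def total_sequence_length(fasta_handle):
    """Sum the lengths of all sequences in a FASTA stream.

    Header lines (starting with '>') and blank lines are skipped;
    every other line is counted as sequence.
    """
    total = 0
    for line in fasta_handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


# Tiny in-memory example with made-up record names and sequences.
example = io.StringIO(">contig_a\nACGTACGT\nACG\n>contig_b\nTTTT\n")
total = total_sequence_length(example)
print(f"{total:,} bp")
```

    For a downloaded file, the same function can be given an open file handle instead of the in-memory example.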

  2. What strain was sequenced?

    Sclerotinia sclerotiorum strain 1980.

  3. What is the current state of the assembly?

    The current assembly contains 679 sequence contigs in 36 supercontigs (scaffolds). There are no current plans for additional sequencing or finishing. See Assembly for details.

  4. How has the sequence been generated for the Sclerotinia sclerotiorum project?

    Our data consist of over 470,000 individual sequencing reads obtained by sequencing both ends of plasmid and Fosmid clones. The clones come from libraries of randomly sheared fragments with average sizes of 4 kb, 10 kb, and 40 kb. See Assembly for details.

  5. Will the genome be finished?

    Unfortunately there are no plans to finish this genome at this time.

  6. How will we know the assembly is correct?

    The quality of the assembly will be assessed in several ways. In addition to requiring that the paired plasmid and Fosmid ends occur in a logical manner, our assembly of the Sclerotinia sclerotiorum genome will be verified through comparison with available genomic sequences.
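    The answer above does not spell out what "a logical manner" means for paired ends. One common reading, given here only as an illustrative sketch, is that both ends of a clone land on the same supercontig, point toward each other, and span roughly the insert size of their library. The class, function, and tolerance below are assumptions made for the example, not the project's actual checks.

```python
from typing import NamedTuple


class ReadPlacement(NamedTuple):
    """Where one end read was placed in the assembly (hypothetical record)."""
    supercontig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def pair_is_consistent(left, right, insert_size, tolerance=0.5):
    """Return True if two end reads of one clone are placed plausibly.

    Assumed criteria: same supercontig, reads facing each other, and an
    outer span within `tolerance` (as a fraction) of the insert size.
    """
    if left.supercontig != right.supercontig:
        return False
    first, second = sorted((left, right), key=lambda read: read.start)
    if first.strand != "+" or second.strand != "-":
        return False
    span = second.end - first.start + 1
    return abs(span - insert_size) <= tolerance * insert_size


# Made-up placements for a clone from a 4 kb plasmid library.
forward = ReadPlacement("1.1", 1000, 1700, "+")
reverse = ReadPlacement("1.1", 4300, 5000, "-")
print(pair_is_consistent(forward, reverse, insert_size=4000))
```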

  7. What data are available?

    In this data release, all sequence contigs and supercontigs are available. Sequence data can be accessed in several ways: by searching with BLASTN or TBLASTN, by retrieving a specific region of the assembly, or by downloading the entire genome. Supercontig and contig sequences are subject to change throughout this project, so the data release version number is added to the contig number as a prefix (e.g. 1.1 denotes assembly version 1, supercontig #1).
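    As an illustration of the versioned naming described above, the sketch below splits an identifier such as 1.1 into its assembly version and supercontig number. It assumes identifiers have exactly the form version.number; the class and function names are hypothetical.

```python
from typing import NamedTuple


class SupercontigId(NamedTuple):
    assembly_version: int
    supercontig_number: int


def parse_supercontig_id(identifier: str) -> SupercontigId:
    """Split 'VERSION.NUMBER' (e.g. '1.1') into its two integer parts."""
    version_text, separator, number_text = identifier.strip().partition(".")
    if not separator or not version_text.isdigit() or not number_text.isdigit():
        raise ValueError(f"not a versioned identifier: {identifier!r}")
    return SupercontigId(int(version_text), int(number_text))


parsed = parse_supercontig_id("1.1")
print(parsed.assembly_version, parsed.supercontig_number)
```

    Keeping the version as a separate field makes it easy to notice when coordinates from two different releases are being mixed.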

    Fosmid clones have been integrated into the current assembly, and you can search and view the locations of these clones within the sequence supercontigs.

    A FASTA file of raw reads excluded from the assembly is also available for BLAST and download. Also available for download are an AGP file describing the supercontigs and contigs in this assembly and a file listing the coordinates of paired end reads.
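    The AGP file mentioned above is a tab-delimited text file that lists, for each supercontig, the contigs and gaps it is built from. The sketch below reads such a file and counts the contigs per supercontig. It assumes the standard AGP column layout (object, start, end, part number, component type, then either a component ID or a gap length); the object and contig names in the example are invented, and the release's own file may differ in detail.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional

GAP_TYPES = {"N", "U"}  # AGP component types that denote gaps


class AgpRow(NamedTuple):
    object_id: str               # the supercontig being described
    object_start: int
    object_end: int
    part_number: int
    component_type: str
    component_id: Optional[str]  # contig name, None for gap rows
    gap_length: Optional[int]    # gap size, None for contig rows


def parse_agp(lines: Iterable[str]) -> Iterator[AgpRow]:
    """Yield one AgpRow per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        object_id, start, end, part, component_type = fields[:5]
        is_gap = component_type in GAP_TYPES
        yield AgpRow(
            object_id, int(start), int(end), int(part), component_type,
            None if is_gap else fields[5],
            int(fields[5]) if is_gap else None,
        )


def contigs_per_supercontig(rows: Iterable[AgpRow]) -> Dict[str, int]:
    """Count the non-gap components listed under each object."""
    counts: Dict[str, int] = {}
    for row in rows:
        if row.component_id is not None:
            counts[row.object_id] = counts.get(row.object_id, 0) + 1
    return counts


# Invented example: one supercontig made of two contigs and one gap.
example_lines = [
    "# made-up AGP content for illustration",
    "supercontig_1.1\t1\t1000\t1\tW\tcontig_1.1\t1\t1000\t+",
    "supercontig_1.1\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends",
    "supercontig_1.1\t1101\t1600\t3\tW\tcontig_1.2\t1\t500\t+",
]
counts = contigs_per_supercontig(parse_agp(example_lines))
print(counts)
```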

  8. This sequence release looks different from previous releases, like Neurospora crassa. What's different?

    Important information about this release can be found here.

Annotated Genes

  1. Is gene XXX annotated in the sequence?

    Maybe. We have run automated tools for finding putative genes, relying on ab initio gene finders and sequence similarity to known proteins.

    You can search for a gene by name, or by a blastx hit to a known gene. However, the gene names are extremely preliminary, and you will find that most genes are named either 'predicted protein' (meaning no or weak homology to known genes), 'hypothetical protein' (indicating weak homology to known genes), or 'hypothetical protein (name)' (indicating strong homology).
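    When working with a downloaded list of gene names, the naming convention above can be turned into a rough homology label. This is a minimal sketch that assumes the names follow the three patterns exactly as described; the function name, the returned labels, and the example gene name are hypothetical.

```python
def homology_category(gene_name: str) -> str:
    """Map a preliminary gene name to the homology level it implies."""
    name = gene_name.strip().lower()
    if name == "predicted protein":
        return "no or weak homology"
    if name == "hypothetical protein":
        return "weak homology"
    if name.startswith("hypothetical protein ") and len(name) > len("hypothetical protein "):
        # 'hypothetical protein (name)': a similar known gene is named.
        return "strong homology"
    return "unrecognized naming pattern"


for example_name in (
    "predicted protein",
    "hypothetical protein",
    "hypothetical protein (similar to actin)",  # made-up example name
):
    print(example_name, "->", homology_category(example_name))
```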

    We are not yet in a position to curate manual annotations.

  2. Gene XXX is annotated incorrectly in your sequence - can I submit an update to your gene name?

    Unfortunately, we are not yet in a position to curate manual annotations. We are currently still discussing future annotation plans.

  3. How were the genes annotated?

    See Automated Gene Calling.

  4. There seem to be two different ways of seeing my gene in the FeatureMap - what's going on here?

    There are two different ways of looking at a gene in the FeatureMap or GenomeBrowser.

    1. Gene within a supercontig (e.g. title "supercontig 1.1")

      If you bring up the FeatureMap or GenomeBrowser on a region of a supercontig, then you are seeing the result of DNA-based analyses. You'll know you are in this mode if the title of the FeatureMap gives a contig number.

      This graphical view shows the results of analyses performed on the nucleotide sequence. For example:

      1. De novo gene prediction programs: Fgenesh, Genscan
      2. Blastn searches against NT
      3. Blastn searches against ESTs
      4. Blastx searches of the translated nucleotide sequences against proteins in NR
      5. HMMER searches of the translated DNA against PFAM

      You can get to these FeatureMaps by using the Search Regions page.

    2. A single gene by itself (e.g. title "SS1#####")

      You can also bring up the FeatureMap or GenomeBrowser on a protein sequence corresponding to a particular gene. In this view you will see the results of protein-based analyses on the amino-acid sequence. For example:

      1. Blastp searches of the protein against proteins in NR
      2. HMMER searches of the protein against PFAM

      You can get the FeatureMap of a particular gene from the Feature Detail page corresponding to that locus.

    However, the HMMER searches found at the DNA level can be misleading, since they do not take the exon structure of the gene into account.

    In addition to the HMMER searches of the DNA, we also perform HMMER searches against our predicted gene set. These protein-based HMMER searches are likely to be more accurate, so we present the protein-based PFAM results in the "Feature Detail" summary. We also use the protein-based PFAM results when searching for genes by PFAM domain, both in the Advanced Search for Annotated Genes and in the gene index of Genes by PFAM.

    The Feature Search mechanism provides access to the results of the DNA analyses, thus the HMMER Feature Search will show the results of the DNA-based HMMER program.

    [Image: DNA-based HMMER results]

    [Image: Protein-based HMMER results]

Misc

  1. How do I cite the sequence for publication?

    Publications should include the following citation:
    Sclerotinia sclerotiorum Sequencing Project. Broad Institute of Harvard and MIT (http://www.broad.mit.edu)

  2. Who do I contact with questions about the sequencing?

    Our email address is annotation-webmaster@broad.mit.edu.