Description of new sequence release format
In December 2004, we began releasing assembly sequence in the form of supercontigs rather than contigs. Contigs are now represented as features on supercontig sequences, with N's used to represent gaps between contigs. Gap sizes are determined by linking information, which can be found in the AGP file on each assembly's download page. The minimum gap size is 100 base pairs (bp). We have also added features for clones as a complement to endread features, and four new types of assembly features: High Quality Discrepancy, Illogical Read Placement, Missing Partner, and Two Haplotype Inconsistency. These assembly features describe the quality of our draft assemblies in more detail than was previously available. They highlight regions of the assembly for which there is evidence of inconsistent placement of sequence reads. Each describes a different type of error, and the more of these errors a region contains, the lower the confidence in the assembly for that region. Their descriptions are as follows:
High Quality Discrepancy
In a single-haplotype assembly, this refers to a position on a contig where a high-quality discrepancy between two reads is observed. The extent is a single base. The details for this category are "Observed Bases", which is one of AC, AG, AT, CG, CT, GT, CGT, AGT, ACT, ACG or ACGT.
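The eleven "Observed Bases" values are exactly the combinations of two or more distinct bases, written in alphabetical order. The Python sketch below, which is illustrative only and not part of the release, builds that set and collapses the base calls at one position into such a code. The names `VALID_OBSERVED_BASES` and `observed_bases` are hypothetical, and the input is assumed to be upper-case A/C/G/T calls.

```python
from itertools import combinations

# Every combination of two, three or four distinct bases, in alphabetical order.
# This yields the eleven values listed for "Observed Bases".
VALID_OBSERVED_BASES = {
    "".join(combo)
    for size in (2, 3, 4)
    for combo in combinations("ACGT", size)
}


def observed_bases(calls):
    """Collapse the base calls seen at one position into a code such as 'AG'.

    `calls` is any iterable of upper-case base characters (assumed input shape).
    Raises ValueError if the calls agree (no discrepancy) or contain a
    character other than A, C, G or T.
    """
    code = "".join(sorted(set(calls)))
    if code not in VALID_OBSERVED_BASES:
        raise ValueError(f"not a valid Observed Bases value: {code!r}")
    return code
```

For example, `observed_bases("GAAG")` gives `"AG"`, while a position where every read agrees raises an error because there is no discrepancy to report.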
Illogical Read Placement
This refers to a read whose partner is placed on the same contig, but with an inconsistent orientation or such that both reads point outward. The extent is a base range. The details for this category are null.
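As a rough illustration of the two cases, the Python sketch below classifies a pair of reads placed on the same contig. It assumes, beyond what is stated here, that a logically placed pair has one read on each strand with the forward read upstream of the reverse read. The `PlacedRead` type and `classify_pair` function are hypothetical names, not part of any released tool.

```python
from dataclasses import dataclass


@dataclass
class PlacedRead:
    start: int   # leftmost base of the read on the contig (1-based)
    stop: int    # rightmost base of the read on the contig (inclusive)
    strand: str  # "+" for forward, "-" for reverse


def classify_pair(a, b):
    """Classify two partner reads placed on the same contig.

    Returns "inconsistent orientation", "pointing out" or "logical".
    """
    if a.strand == b.strand:
        # Partners are expected on opposite strands (assumption).
        return "inconsistent orientation"
    fwd, rev = (a, b) if a.strand == "+" else (b, a)
    if rev.stop < fwd.start:
        # The reverse read lies entirely upstream of the forward read,
        # so the two reads face away from each other.
        return "pointing out"
    return "logical"
```

Under these assumptions, only the first two outcomes would correspond to an Illogical Read Placement feature.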
Missing Partner
This refers to a paired read placed on a contig, whose partner has room to land on the contig but is not placed there. The extent is a base range. The details for this category are "Partner Start" and "Partner Stop", two positions on the contig that describe the range of bases where the partner would be expected to fall, allowing for a stretch of the link of up to 4 standard deviations.
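One way such a range could be derived is sketched below in Python. This is not the assembler's actual calculation: it assumes the link is described by a mean insert size and a standard deviation, that the insert is measured from the outer end of one read to the outer end of its partner, and that the partner "has room" only if the whole range fits on the contig. The function name and all of its parameters are hypothetical.

```python
def expected_partner_range(read_start, read_stop, strand, mean_insert,
                           sd_insert, partner_len, contig_len,
                           max_stretch_sd=4):
    """Return (partner_start, partner_stop) in 1-based contig coordinates,
    or None if the partner would not have room to land on the contig.

    The insert size is allowed to vary by +/- max_stretch_sd standard
    deviations around the mean (4 by default, as in the description above).
    """
    shortest = mean_insert - max_stretch_sd * sd_insert
    longest = mean_insert + max_stretch_sd * sd_insert

    if strand == "+":
        # Insert extends downstream from the first base of the read.
        partner_start = read_start + shortest - partner_len
        partner_stop = read_start + longest - 1
    else:
        # Insert extends upstream from the last base of the read.
        partner_start = read_stop - longest + 1
        partner_stop = read_stop - shortest + partner_len

    if partner_start < 1 or partner_stop > contig_len:
        return None  # expected range runs off the contig
    return partner_start, partner_stop
```

With a forward read at bases 1000 to 1699, a 4000 bp mean insert, a 400 bp standard deviation, and a 700 bp partner on a 10,000 bp contig, this sketch returns `(2700, 6599)`.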
Two Haplotype Inconsistency
In a two-haplotype assembly, this refers to a position on a contig where a high-quality discrepancy is observed and which, in combination with other such discrepancies, implies an inconsistency with the presence of only two haplotypes. The extent is a single base. The details for this category are "Observed Bases", which is one of AC, AG, AT, CG, CT, GT, CGT, AGT, ACT, ACG or ACGT.
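Returning to the supercontig format itself: because contigs are separated by runs of at least 100 N's, approximate contig coordinates can be recovered from a supercontig sequence alone. The Python sketch below does this under the assumption that no contig itself contains a run of 100 or more N's. The AGP file remains the authoritative source for contig and gap positions, and the function name `contig_ranges` is hypothetical.

```python
import re

MIN_GAP = 100  # minimum gap size in bp, as stated in the release description


def contig_ranges(supercontig, min_gap=MIN_GAP):
    """Return 1-based inclusive (start, stop) ranges of the contigs in a
    supercontig sequence, treating every run of at least `min_gap` N's
    as a gap between contigs."""
    ranges = []
    pos = 0  # 0-based index where the current contig begins
    for gap in re.finditer(r"[Nn]{%d,}" % min_gap, supercontig):
        if gap.start() > pos:
            ranges.append((pos + 1, gap.start()))
        pos = gap.end()
    if pos < len(supercontig):
        ranges.append((pos + 1, len(supercontig)))
    return ranges


if __name__ == "__main__":
    # Toy supercontig: three contigs separated by gaps of 100 and 250 N's.
    seq = "ACGT" * 5 + "N" * 100 + "GGCC" * 3 + "N" * 250 + "TTAA"
    print(contig_ranges(seq))
```

Shorter runs of N's are left inside the surrounding contig, since they fall below the minimum gap size.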
If you have any questions about this change or these new features, please feel free to contact us.
