BLAST Help: main search parameters
- PROGRAM
You can use BLAST to search for similarity in either nucleotide or
protein sequences.
- blastn: nucleotide to nucleotide search
Search your DNA sequence against a nucleotide sequence database
- tblastn: protein to nucleotide search
Search your amino-acid sequence against a nucleotide sequence
database. The query sequence is compared with the database
sequences translated in all six reading frames.
- blastx: nucleotide to protein search
Search your DNA sequence, translated in all six reading frames,
against a protein sequence database
(only available for our genomes annotated with predicted genes)
- blastp: protein to protein search
Search your amino-acid sequence against a protein sequence database
(only available for our genomes annotated with predicted genes)
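To make "all six reading frames" concrete, here is a minimal Python sketch of six-frame translation using the standard genetic code (NCBI translation table 1). It only illustrates the idea, because BLAST performs this translation internally. The frame labels (+1 to +3 on the given strand, -1 to -3 on the reverse complement, starting at offsets 0, 1 and 2) and the helper names `translate` and `six_frame_translation` are assumptions made for this example.

```python
from __future__ import annotations

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate complete codons; any unknown codon (e.g. containing N) becomes X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate the forward strand and its reverse complement at offsets 0, 1, 2."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCC")` yields "MA" in frame +1 and "GH" in frame -1. Trailing bases that do not complete a codon are simply dropped.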
- EXPECT (E value)
The statistical significance threshold for reporting
matches against database sequences. The default value
is 10, meaning that 10 matches are expected to be found
merely by chance, according to the stochastic model
of Karlin and Altschul (1990). If the E value
ascribed to a match is greater than the EXPECT
threshold, the match will not be reported.
Lower EXPECT thresholds are more stringent, leading
to fewer chance matches being reported. Fractional
values are acceptable. (See parameter E in the BLAST
Manual.)
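In the Karlin-Altschul model, the E value of a match with raw score S between a query of length m and a database of total length n is E = K·m·n·e^(−λS). The Python sketch below computes that value and applies the EXPECT test. K and λ depend on the scoring system. The values used here (K = 0.1, λ = 0.3), the lengths, and the hits are placeholders rather than real BLAST statistics, and real BLAST also corrects m and n for edge effects, which this sketch omits.

```python
import math


def expect_value(score, query_len, db_len, K, lam):
    """Karlin-Altschul expected number of chance HSPs scoring at least `score`:
    E = K * m * n * exp(-lambda * S), with m, n the query and database lengths."""
    return K * query_len * db_len * math.exp(-lam * score)


def passes_expect(evalue, expect=10.0):
    """A match is reported only if its E value does not exceed EXPECT."""
    return evalue <= expect


# Placeholder statistics and made-up hits (name, raw score) for illustration.
K, LAMBDA = 0.1, 0.3
QUERY_LEN, DB_LEN = 100, 1000
hits = [("seqA", 50), ("seqB", 20)]

# seqA: E is about 0.0031; seqB: E is about 24.8, above the default EXPECT of 10.
reported = [name for name, score in hits
            if passes_expect(expect_value(score, QUERY_LEN, DB_LEN, K, LAMBDA))]
```

Lowering `expect` to 0.001 would exclude seqA as well, illustrating why lower thresholds are more stringent.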
- CUTOFF
Cutoff score for reporting high-scoring segment pairs
(HSPs). The default value is calculated from the EXPECT
value (see above). HSPs are reported for a database
sequence only if the statistical significance ascribed
to them is at least as high as would be ascribed to a
lone HSP having a score equal to the CUTOFF value.
Higher CUTOFF values are more stringent, leading to
fewer chance matches being reported. (See parameter S
in the BLAST Manual.) Typically, significance
thresholds are more intuitively managed using EXPECT.
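Because the E value falls as the score rises, an EXPECT threshold can be converted into a score cutoff by solving E = K·m·n·e^(−λS) for S. This Python sketch shows one plausible way a default cutoff could be derived from EXPECT. The server's exact procedure is not described here, and K, λ and the lengths below are placeholder values.

```python
import math


def cutoff_from_expect(expect, query_len, db_len, K, lam):
    """Smallest integer raw score S with K * m * n * exp(-lambda * S) <= expect.

    Solving E = K*m*n*exp(-lambda*S) for S gives S = ln(K*m*n / E) / lambda;
    raw scores are integers, so round up.
    """
    return math.ceil(math.log(K * query_len * db_len / expect) / lam)


# Placeholder K and lambda (not real BLAST statistics); m = 100, n = 1000.
lenient = cutoff_from_expect(10, 100, 1000, K=0.1, lam=0.3)
strict = cutoff_from_expect(0.001, 100, 1000, K=0.1, lam=0.3)
```

With these placeholders, EXPECT = 10 gives a cutoff of 24 and EXPECT = 0.001 gives 54. The more stringent EXPECT therefore produces the higher CUTOFF.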
- MATRIX
Specify an alternate amino-acid scoring matrix for searches that
compare protein sequences (BLASTP, BLASTX and TBLASTN); the matrix
assigns a score to each pair of aligned amino acids. The default
matrix is BLOSUM62 (Henikoff & Henikoff, 1992). The matrix parameter
is ignored for BLASTN nucleotide searches.
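To show what the scoring matrix does, the Python sketch below scores an ungapped pair of aligned protein segments by summing the matrix entry for each aligned residue pair. It uses only a four-residue excerpt of BLOSUM62 (A, I, L, V), transcribed for illustration, so check it against the published matrix before relying on it. If Biopython is installed, `Bio.Align.substitution_matrices.load("BLOSUM62")` provides the full matrix. The function name `ungapped_score` is hypothetical.

```python
# Excerpt of BLOSUM62 covering only A, I, L and V (symmetric).
BLOSUM62_EXCERPT = {
    "A": {"A": 4, "I": -1, "L": -1, "V": 0},
    "I": {"A": -1, "I": 4, "L": 2, "V": 3},
    "L": {"A": -1, "I": 2, "L": 4, "V": 1},
    "V": {"A": 0, "I": 3, "L": 1, "V": 4},
}


def ungapped_score(seq1, seq2, matrix):
    """Sum matrix scores over aligned residue pairs (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs equal-length segments")
    return sum(matrix[a][b] for a, b in zip(seq1, seq2))
```

Conservative substitutions still score positively: aligning "IL" with "LI" scores 2 + 2 = 4, whereas a nucleotide search with BLASTN uses match/mismatch scores instead of such a matrix.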
- FILTER
Mask off segments of the query sequence that have
low compositional complexity, as determined by the
SEG program of Wootton & Federhen (Computers and
Chemistry, 1993), or segments consisting of
short-periodicity internal repeats, as determined
by the XNU program of Claverie & States (Computers
and Chemistry, 1993), or, for BLASTN, by the DUST
program of Tatusov and Lipman (in preparation).
Filtering can eliminate statistically significant but
biologically uninteresting reports from the BLAST
output (e.g., hits against common acidic-, basic- or
proline-rich regions), leaving the more biologically
interesting regions of the query sequence available
for specific matching against database sequences.
Low-complexity sequence found by a filter program is
replaced by the letter "N" in nucleotide sequences
(e.g., "NNNNNNNNNNNNN") and by the letter "X" in protein
sequences (e.g., "XXXXXXXXX"). Users may turn off
filtering by using the "Filter" option on the "Advanced
options for the BLAST server" page.
Filtering is only applied to the query sequence (or
its translation products), not to database sequences.
Default filtering is DUST for BLASTN, SEG for other
programs.
It is not unusual for nothing at all to be masked
by SEG, XNU, or both, when applied to sequences
in SWISS-PROT, so filtering should not be expected to
always yield an effect. Furthermore, in some cases,
sequences are masked in their entirety, indicating that
the statistical significance of any matches reported
against the unfiltered query sequence should be suspect.
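SEG, XNU and DUST each use their own algorithms, which are not reproduced here. To illustrate the general idea of masking, the Python sketch below uses a deliberately simpler stand-in. It slides a fixed-length window along the query, computes the Shannon entropy of each window's residue composition, and replaces residues in low-entropy windows with a mask letter ("X" for protein, "N" for nucleotide). The window length (12), the threshold (2.2 bits) and the function names are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of `segment`."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def mask_low_complexity(seq, window=12, min_entropy=2.2, mask_char="X"):
    """Replace every residue that lies in a low-entropy window with `mask_char`.

    Illustrative stand-in only: this is NOT the SEG, XNU or DUST algorithm.
    """
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else c for c, m in zip(seq, masked))
```

A run such as 15 consecutive Q residues between two diverse stretches is replaced by X's, together with a few flanking residues from windows that overlap the run, while the diverse stretches away from it are left intact. Use `mask_char="N"` for nucleotide queries.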
- ALIGNMENTS FORMAT
Gapped alignment allows gaps to be introduced within regions
of sequence similarity.
- DESCRIPTIONS
Restricts the number of short descriptions of matching
sequences reported to the number specified; the default
limit is 100 descriptions. (See parameter V in the
manual page.) See also EXPECT and CUTOFF.
- ALIGNMENTS
Restricts the number of database sequences for which
high-scoring segment pairs (HSPs) are reported to the
number specified; the default limit is 50. If more
database sequences than this satisfy the statistical
significance threshold for reporting (see EXPECT and
CUTOFF above), only the matches ascribed the greatest
statistical significance are reported.
(See parameter B in the BLAST Manual.)
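The DESCRIPTIONS and ALIGNMENTS limits both truncate the same ranked list of significant matches. This is a minimal Python sketch, assuming that "greatest statistical significance" means "smallest E value". The names `Hit` and `select_reported` are hypothetical, and the defaults mirror the limits stated above (EXPECT 10, 100 descriptions, 50 alignments).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one matching database sequence."""
    name: str
    evalue: float


def select_reported(hits, expect=10.0, descriptions=100, alignments=50):
    """Return (hits listed as short descriptions, hits shown with HSP alignments).

    Matches with an E value above EXPECT are dropped; the rest are ranked
    from most to least significant (smallest E value first) and truncated.
    """
    significant = sorted((h for h in hits if h.evalue <= expect),
                         key=lambda h: h.evalue)
    return significant[:descriptions], significant[:alignments]
```

Because both lists are cut from the same ranking, every sequence shown with an alignment also appears among the descriptions whenever the ALIGNMENTS limit is no larger than the DESCRIPTIONS limit.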
For further information about BLAST, refer to the documentation at
NCBI.