SpliceNest is a web-based graphical tool to explore gene structure, including
alternative splicing, based on a mapping of the EST consensus
sequences (contigs) from GeneNest
to the complete human genome. SpliceNest is integrated with
GeneNest and the SYSTERS
protein sequence cluster set in one framework, permitting an
overall exploration of the whole sequence space covering protein,
mRNA and EST sequences, as well as genomic DNA.
How to find the alignments
An alignment between an EST cluster and a genomic sequence
can be found in several different ways:
- Browsing: From the home page, click
on a chromosome to see a list of matches sorted by cluster
number or by genomic location. From the chromosome menu page
you can choose lists of all matches, only good matches, or only
matches with alternative splice candidates.
- Graphical chromosome display:
This is also available for each chromosome as links from the
home page. A vertical axis in the left frame represents the
chromosome, with coordinates from the
Golden Path chromosome sequences.
Matches are displayed as bars to the right of the axis.
Different colours are used for
bad matches (blue),
good matches (red), and
alt. splice candidates (orange).
Click on a match to display it. Click on the axis to zoom
in. When the direction of a gene is known (inferred from
splice signals), the match bar is equipped with an arrowhead
(but only if there is room for it; one may have to zoom in to
see it).
- Cluster search: If you are interested in a specific
UniGene cluster, just type in the cluster name (e.g. "Hs635")
in the search box on the home page, the chromosome menus, or
the alignment displays. If there is more than one match,
a list of matches sorted by quality is shown.
- Keyword or Blast search: You can use the GeneNest
page to search the clusters by EST or gene accession number, clone id,
clone library, or sequence annotation. You can also do a Blast
search with a DNA or protein sequence.
The alignment display
The graphics show the alignment with the genomic sequence
and the exon/intron structure for each contig. Possible
alternative splice sites are highlighted with yellow bands.
Moving the mouse over most items displays more
details; clicking links to further information, such as
GeneNest assemblies, detailed alignments, or EMBL
A panel below the graphics lets you zoom into the alignment and
switch certain features on and off.
Methods and presentation details
The EST contigs are from the GeneNest assembly
of the Mar 2001 version of the NCBI UniGene
clustering of human genes.
The chromosomes are the Apr 1, 2001 freeze of the
HUGO Golden Path assembly
of the complete human genome.
Mapping and alignment
The matching pairs of EST contigs and chromosome fragments were
found by searching for all matches of length 100 with at most 3 errors
(mismatches or indels) between the contigs of all clusters and the complete genome.
This was done using the fast search program vmatch by S. Kurtz.
The algorithm exploits a modified suffix tree data structure.
Before searching, repeat elements were filtered out using the program
RepeatMasker by A.F.A. Smit and P. Green.
For each matching cluster a refined search against the matching
chromosome was made, in order to approximately determine a gene region
containing all exons. In the refined search, all matches of length
30 with at most 2 errors were found.
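The search criterion above (a match of a given length with at most a given number of mismatches or indels) can be illustrated with a small sketch. This is not how vmatch works: vmatch relies on a suffix-tree-based index, whereas the code below is a plain dynamic-programming scan (Sellers' algorithm) that is practical only for short sequences. The function name `approx_match_ends` and its parameters are hypothetical and chosen for illustration.

```python
def approx_match_ends(pattern, text, max_errors):
    """Return the end positions (exclusive, 0-based) in `text` where some
    substring matches `pattern` with at most `max_errors` edits
    (mismatches, insertions or deletions)."""
    m = len(pattern)
    # prev[i] = fewest edits needed to align pattern[:i] with a substring
    # of the text ending at the previous text position.
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start anywhere in text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or mismatch
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # pattern character left unmatched
            )
        if cur[m] <= max_errors:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    genome = "TTTACGTTCGTAAA"
    window = "ACGTACGT"
    print(approx_match_ends(window, genome, max_errors=1))
```

With a 100 bp window of a contig as `pattern` and `max_errors=3`, this corresponds to the criterion of the initial search; a 30 bp window with `max_errors=2` corresponds to the refined search.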
Finally, for each cluster a spliced alignment of all contigs
against the matching chromosome region(s) was determined using
sim4 by Florea et al.
The exon positions, percentage identity and splice signals
from this alignment are shown in the graphics.
The main criterion for including a match is that it contains a 100
bp region with at most 3 errors. Most such matches are aligned,
but the following exceptional cases are skipped:
- Very short matches: matches whose total length (all exons) is less than 200 bp.
- Repeated matches: when a cluster has more than 20
matches in a single chromosome, only the 20 best (according to
the E-value from vmatch) are aligned.
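The two exceptions above can be expressed as a short filter. The following is a minimal sketch, not the SpliceNest code: the `Match` record and its field names are hypothetical, and the order of the two steps (first the per-chromosome cut by E-value, then the length check on the aligned exons) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List

MIN_TOTAL_EXON_BP = 200          # shorter matches are skipped
MAX_MATCHES_PER_CHROMOSOME = 20  # per cluster and chromosome


@dataclass
class Match:
    # Hypothetical record; the field names are illustrative only.
    cluster: str
    chromosome: str
    evalue: float            # E-value reported by the search; lower is better
    exon_lengths: List[int]  # aligned exon lengths in bp


def select_matches(matches):
    """Keep at most 20 matches per (cluster, chromosome), best E-value
    first, then drop matches whose exons total less than 200 bp."""
    groups = defaultdict(list)
    for m in matches:
        groups[(m.cluster, m.chromosome)].append(m)

    kept = []
    for group in groups.values():
        best = sorted(group, key=lambda m: m.evalue)[:MAX_MATCHES_PER_CHROMOSOME]
        kept.extend(m for m in best if sum(m.exon_lengths) >= MIN_TOTAL_EXON_BP)
    return kept
```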
Automatic analysis and classification
The alignments are automatically parsed and analysed to help
classify matches and find candidates for alternative splicing.
In an alignment, a contig is classified as good (or OK) if
the following criteria are satisfied:
- the average alignment identity is at least 95%.
- it spans at least one intron, where each of the flanking exons
is at least 20 bp long, with at least 80% alignment identity.
In the graphics, good contigs are labeled with black numbers
and bad contigs with grey numbers. When "Skip bad contigs" is checked,
only good contigs are displayed.
A match between a cluster and a genomic region
is classified as good if its alignment contains
at least one good contig.
When browsing matches in a chromosome,
it is possible to select only the good matches.
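The classification rules for contigs and matches can be sketched as follows. This is an illustration under stated assumptions, not the SpliceNest implementation: the `Exon` and `Contig` records are hypothetical, the average identity is taken as a given input (the text does not say how it is averaged), exons are assumed to be listed in genomic order so that neighbouring exons flank one intron, and a match is taken to be good when at least one of its contigs is good.

```python
from dataclasses import dataclass
from typing import List

MIN_AVERAGE_IDENTITY = 95.0        # percent, whole contig
MIN_FLANKING_EXON_BP = 20          # each exon flanking the intron
MIN_FLANKING_EXON_IDENTITY = 80.0  # percent, each flanking exon


@dataclass
class Exon:
    length: int      # aligned length in bp
    identity: float  # percent identity of this exon's alignment


@dataclass
class Contig:
    average_identity: float  # percent identity over the whole alignment
    exons: List[Exon]        # assumed to be in genomic order


def exon_is_reliable(exon):
    return (exon.length >= MIN_FLANKING_EXON_BP
            and exon.identity >= MIN_FLANKING_EXON_IDENTITY)


def contig_is_good(contig):
    """Good: average identity >= 95% and at least one intron whose two
    flanking exons are both long enough and similar enough."""
    if contig.average_identity < MIN_AVERAGE_IDENTITY:
        return False
    pairs = zip(contig.exons, contig.exons[1:])  # neighbours flank an intron
    return any(exon_is_reliable(a) and exon_is_reliable(b) for a, b in pairs)


def match_is_good(contigs):
    """Assumption: one good contig is enough to make the match good."""
    return any(contig_is_good(c) for c in contigs)
```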
When a cluster has more than one match, a table of all matches is
shown when the cluster is searched. (When browsing matches, use
the link containing the number of matches, just above the
graphics, to display this table.) The matches are sorted by quality,
starting with the best, according to the following criteria:
- good matches before bad matches
- highest number of exons
- highest effective alignment length
(#bp aligned in contig multiplied by % identity)
The alignment length and % identity are given for the "best" contig
(i.e., highest number of exons).
Matches with possible alternative splicing are displayed as
orange table rows, other good matches as pink, and bad matches
as light blue.
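The ordering of matches in the table amounts to a sort on three keys. A minimal sketch, assuming a hypothetical `MatchSummary` record that holds the values of the "best" contig of each match; dividing by 100 to turn the percentage into a fraction is an assumption and does not affect the order.

```python
from dataclasses import dataclass


@dataclass
class MatchSummary:
    # Hypothetical record describing the "best" contig of a match.
    name: str
    good: bool
    n_exons: int
    aligned_bp: int
    percent_identity: float


def effective_length(m):
    """#bp aligned multiplied by % identity, expressed as a fraction."""
    return m.aligned_bp * m.percent_identity / 100.0


def sort_by_quality(matches):
    """Good before bad, then most exons, then longest effective alignment."""
    return sorted(
        matches,
        key=lambda m: (not m.good, -m.n_exons, -effective_length(m)),
    )
```

Negating the numeric keys keeps everything in one ascending sort, and `not m.good` places good matches (False) ahead of bad ones (True).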
Alternative splice candidates
Possible sites of alternative splicing are marked by yellow bands
in the graphics (when "Highlight inconsistent regions" is
checked). The detailed analysis in each case is available by clicking on
the yellow band, or on the "view" links near the display options.
Further indication is also available on the graphics by checking
"Indicate possible splice variants" (then missing exons are marked
by green, putative introns by blue, and alternative donor/acceptor
sites by red).
When browsing matches in a chromosome,
it is possible to select only matches where alternative splice
candidates are found (that is, there are yellow bands).
Only good contigs are considered when detecting splice variants.
If a match contains at least two good contigs, overlapping
by at least 100 bp in the genomic sequence, the splice sites
according to the alignments are compared and inconsistencies
reported as possible splice variants. They are classified as
missing exons, putative introns, or
alternative donor/acceptor sites
(must differ by at least 9 bp to be marked), or as combinations of these.
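The comparison of two contigs can be sketched for the simplest case, alternative donor/acceptor sites. This is a simplified illustration, not the SpliceNest analysis: exons are assumed to be given as half-open genomic intervals `(start, end)` in genomic order, the 100 bp overlap is measured between the genomic spans of the two contigs (an assumption), and missing exons and putative introns are not detected. The function names are hypothetical.

```python
MIN_OVERLAP_BP = 100   # required genomic overlap between the two contigs
MIN_SITE_SHIFT_BP = 9  # smaller boundary differences are not marked


def span_overlap(a, b):
    """Overlap in bp between the genomic spans of two exon lists."""
    start = max(a[0][0], b[0][0])
    end = min(a[-1][1], b[-1][1])
    return max(0, end - start)


def alt_site_candidates(a, b):
    """Report exon boundaries that differ by at least 9 bp between two
    contigs. Only boundaries next to an intron in both contigs are
    compared, since the outer ends of a contig are not splice sites."""
    if span_overlap(a, b) < MIN_OVERLAP_BP:
        return []
    found = []
    for i, (s1, e1) in enumerate(a):
        for j, (s2, e2) in enumerate(b):
            if min(e1, e2) <= max(s1, s2):
                continue  # these two exons do not overlap
            if i > 0 and j > 0 and abs(s1 - s2) >= MIN_SITE_SHIFT_BP:
                found.append(("exon start", s1, s2))
            if (i < len(a) - 1 and j < len(b) - 1
                    and abs(e1 - e2) >= MIN_SITE_SHIFT_BP):
                found.append(("exon end", e1, e2))
    return found
```

Whether a differing exon start or end is a donor or an acceptor site depends on the strand of the gene, which this sketch leaves to the caller.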
References
Coward,E., Haas,S.A., and Vingron,M. (2002).
SpliceNest: visualization of gene structure and alternative
splicing based on EST clusters.
Trends Genet. 18 (1), 53-55.
Haas,S.A., Beissbarth,T., Rivals,E., Krause,A.,
and Vingron,M. (2000).
GeneNest: automated generation and visualization of gene indices.
Trends Genet. 16 (11), 521-523.
Krause,A., Stoye,J., and Vingron,M. (2000).
The SYSTERS Protein Sequence Cluster Set.
Nucleic Acids Res. 28 (1), 270-272.
Krause,A., Haas,S.A., Coward,E., and Vingron,M. (2002).
SYSTERS, GeneNest, SpliceNest: Exploring sequence space from
genome to protein.
Nucleic Acids Res. 30 (1), 299-300.
Schuler,G.D. (1997).
Pieces of the puzzle: expressed sequence tags and the
catalog of human genes.
J. Mol. Med. 75, 694-698.
International Human Genome Sequencing Consortium (2001).
Initial sequencing and analysis of the human genome.
Nature 409, 860-921.
Kurtz,S., Choudhuri,J.V., Ohlebusch,E., Schleiermacher,C., Stoye,J.,
and Giegerich,R. (2001).
REPuter: the manifold applications of repeat analysis
on a genomic scale.
Nucleic Acids Res. 29, 4633-4642.
Florea,L., Hartzell,G., Zhang,Z., Rubin,G.M., and Miller,W. (1998).
A computer program for aligning a cDNA sequence with a genomic
DNA sequence.
Genome Res. 8, 967-974.
Last modified: Tue Aug 20 16:35:32 MET DST 2002