Most of the D. erecta
mRNA sequences are actually gene predictions that have not been experimentally confirmed. In particular, the RefSeq D. erecta
mRNA records typically have the "XM" prefix, and the corresponding GenBank records show that these are predictions from GLEAN-R (see the features section of a sample GenBank record here). The GLEAN pipeline combines evidence from sequence alignments and multiple gene predictors to create the final gene models (see the manuscript by Elsik et al.
for details on GLEAN and the paper Evolution of genes and genomes on the Drosophila phylogeny
by the Drosophila 12 Genomes Consortium for details on the GLEAN-R gene models).
Because most of the D. erecta
mRNA records have not been experimentally confirmed (as denoted by the XM/XP prefixes in the accession numbers), incorporating the D. erecta
RNA data into the genome browser could potentially propagate errors in the GLEAN-R models into our final gene annotations. Consequently, if we decide to include the predicted D. erecta
mRNA sequences as an evidence track, we would need to label them as gene predictions and not as real mRNAs.
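As a rough illustration of the prefix convention mentioned above, the following Python sketch separates predicted RefSeq records (XM, XR, XP) from the other common RefSeq record types (NM, NR, NP) by looking only at the accession prefix. The function name and the accession strings in the usage note are hypothetical placeholders, not real D. erecta records, and the prefix check is not a substitute for reading the GenBank record itself.

```python
# Minimal sketch: classify a RefSeq accession by its two-letter prefix.
# Assumption: only the common RNA/protein prefixes are handled here.
PREDICTED_PREFIXES = {
    "XM": "mRNA",
    "XR": "non-coding RNA",
    "XP": "protein",
}
KNOWN_PREFIXES = {
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
}


def classify_refseq_accession(accession):
    """Return (status, molecule) for a RefSeq accession string.

    status is "predicted" for X-prefixed model records, "known" for
    N-prefixed records, and "unknown" for anything else.
    """
    prefix, separator, _ = accession.strip().partition("_")
    prefix = prefix.upper()
    if not separator:
        # RefSeq accessions have the form PREFIX_NUMBER[.VERSION].
        return ("unknown", None)
    if prefix in PREDICTED_PREFIXES:
        return ("predicted", PREDICTED_PREFIXES[prefix])
    if prefix in KNOWN_PREFIXES:
        return ("known", KNOWN_PREFIXES[prefix])
    return ("unknown", None)
```

For example, classify_refseq_accession("XM_000000000.1") returns ("predicted", "mRNA"), so a track built from such records would be labeled as gene predictions rather than as mRNAs.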
In contrast, the D. yakuba
tracks are actually based on RNA-Seq data from D. yakuba. The D. yakuba
transcripts are assembled using Cufflinks and Oases and then mapped against the D. erecta
fosmids. Consequently, even though the predictions are not as good as full-length cDNAs, these predicted transcripts from D. yakuba
are supported by experimental evidence. Note that the modENCODE project did not generate RNA-Seq data for D. erecta,
which is why we have to do the cross-species mapping.