Posted: Tue Oct 21, 2014 4:35 pm
by jkennell
My student is annotating the M6 ortholog in D. biarmipes (project: 3L control, contig62). He ran into trouble when he started looking for the first coding exon shared by the M6-PE and M6-PF isoforms. In the BLASTX results, this single exon has matches in two different frames. When we align the transcript for this exon with BLASTN, we get a fairly good match, but with signs of some indels between the two sequences, which likely explain the different frames. In addition, there is no RNA-Seq data to support expression of this exon. How should we report and document this?

Posted: Tue Oct 21, 2014 11:08 pm
by wleung
The available evidence suggests that the initial CDS of the E and F isoforms of the M6 gene (CDS 2_7735_0) might not exist in D. biarmipes. However, based on parsimony (i.e., minimizing the number of changes relative to D. melanogaster), I would still annotate CDS 2_7735_0 in D. biarmipes, using the weak evidence from the blastx alignment and the CDS predicted by Genscan. Because the next CDS (3_7735_2) begins at 36,112 and has a phase 2 splice acceptor site relative to frame -3, CDS 2_7735_0 must end with a phase 1 donor site (relative to frame -1). Based on the blastx alignment, the closest phase 1 donor site (relative to frame -1) is located at 37,035-37,034. Therefore, I would annotate the CDS so that it spans from 37,243 to 37,036.
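The phase argument above can be double-checked with a little arithmetic. The sketch below assumes the convention implied by this post, where a donor phase and the following acceptor phase must add up to a multiple of 3 (a phase 1 donor pairs with a phase 2 acceptor), and that the initial CDS begins with a complete codon (phase 0). The function names are hypothetical helpers, not part of any GEP tool.

```python
def cds_length(start: int, end: int) -> int:
    """Length of a CDS given inclusive coordinates on either strand."""
    return abs(end - start) + 1


def donor_phase(start: int, end: int, acceptor_phase: int = 0) -> int:
    """Bases of an incomplete codon left at the 3' end of the CDS.

    acceptor_phase is the number of bases at the 5' end of the CDS that
    complete a codon begun in the previous exon (0 for an initial CDS).
    """
    return (cds_length(start, end) - acceptor_phase) % 3


def required_acceptor_phase(donor: int) -> int:
    """Acceptor phase the next CDS needs so that the split codon is complete."""
    return (3 - donor) % 3


def donor_site(end: int, strand: str) -> tuple:
    """Coordinates of the GT dinucleotide immediately downstream of the CDS end."""
    if strand == "+":
        return (end + 1, end + 2)
    # Minus strand: downstream means decreasing coordinates.
    return (end - 1, end - 2)


# Proposed CDS 2_7735_0 on the minus strand of contig62.
start, end = 37_243, 37_036
phase = donor_phase(start, end)
print(cds_length(start, end), phase, required_acceptor_phase(phase), donor_site(end, "-"))
```

This prints `208 1 2 (37035, 37034)`: the 208 bp CDS ends with one leftover base, i.e. a phase 1 donor at 37,035-37,034, which is compatible with the phase 2 acceptor of CDS 3_7735_2.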

To gather additional evidence for this hypothesis, we can perform a tblastn search of the D. melanogaster CDS 2_7735_0 against the "Genome Assembly (NT)" databases of multiple Drosophila species using the FlyBase BLAST service.
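If you would rather repeat this search locally than through the FlyBase web service, the NCBI BLAST+ `tblastn` program can write tabular output that is easy to post-process. The sketch below only builds (and optionally runs) the command line. It assumes BLAST+ is installed and that you have already built a nucleotide database for each genome assembly with `makeblastdb`; the query file and database names are hypothetical placeholders.

```python
import shlex
import subprocess

# Tabular columns requested from tblastn (standard BLAST+ -outfmt 6 specifiers).
COLUMNS = "qseqid sseqid qlen qstart qend sstart send sframe evalue bitscore sseq"


def tblastn_command(query_faa: str, db: str, evalue: float = 1e-2) -> list:
    """Build a tblastn command that searches a protein query against a genome database."""
    return [
        "tblastn",
        "-query", query_faa,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", f"6 {COLUMNS}",
    ]


def run_tblastn(query_faa: str, db: str, evalue: float = 1e-2) -> str:
    """Run tblastn (BLAST+ must be on PATH) and return its tabular output."""
    result = subprocess.run(
        tblastn_command(query_faa, db, evalue),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical query file and database names; replace with your own.
    for db in ["dere_genome", "dtak_genome", "dpse_genome"]:
        print(shlex.join(tblastn_command("CDS_2_7735_0.faa", db)))
```

Requesting `sframe` and `sseq` makes frame changes and translated stop codons visible directly in the tabular output.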

The tblastn results show that the entire CDS 2_7735_0 is conserved in D. erecta:
[Attachment: Dere_CDS_2_7735_0_tblastn.png]
However, there is an in-frame stop codon within the alignment to CDS 2_7735_0 in D. eugracilis, and the tblastn alignment in D. ficusphila covers only the first 39 aa of the D. melanogaster CDS:
[Attachment: Deug_Dfic_tblastn_stop_partial_alignments.png]
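Both problems seen here, an in-frame stop codon and a partial alignment, can be read off tblastn tabular output that includes the `qlen`, `qstart`, `qend`, and `sseq` columns, because tblastn shows translated stop codons as `*` in the subject sequence. The sketch below assumes one HSP per tab-separated line in the column order listed in `FIELDS`; the `Hsp` class is a hypothetical helper, not part of BLAST+.

```python
from dataclasses import dataclass

# Assumed order: -outfmt "6 qseqid sseqid qlen qstart qend sstart send sframe evalue bitscore sseq"
FIELDS = ["qseqid", "sseqid", "qlen", "qstart", "qend",
          "sstart", "send", "sframe", "evalue", "bitscore", "sseq"]


@dataclass
class Hsp:
    qseqid: str
    sseqid: str
    qlen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    sframe: int
    evalue: float
    bitscore: float
    sseq: str

    @classmethod
    def from_line(cls, line: str) -> "Hsp":
        """Parse one tab-separated line of tblastn tabular output."""
        v = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        return cls(
            v["qseqid"], v["sseqid"], int(v["qlen"]), int(v["qstart"]), int(v["qend"]),
            int(v["sstart"]), int(v["send"]), int(v["sframe"]),
            float(v["evalue"]), float(v["bitscore"]), v["sseq"],
        )

    def has_stop(self) -> bool:
        """True if the aligned subject translation contains a stop codon ('*')."""
        return "*" in self.sseq

    def query_coverage(self) -> float:
        """Fraction of the query protein spanned by this HSP."""
        return (self.qend - self.qstart + 1) / self.qlen
```

An HSP where `has_stop()` is true matches the D. eugracilis pattern, while a `query_coverage()` well below 1 flags a partial alignment like the one in D. ficusphila.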
In D. takahashii (i.e., the species most closely related to D. biarmipes), tblastn produced two separate alignment blocks in two different reading frames. Hence the frameshift is present in both the D. biarmipes and D. takahashii orthologs of the M6 gene, so the discrepancy is unlikely to be caused by an error in the D. biarmipes consensus sequence.
[Attachment: Dtak_split_CDS_2_7735_0_tblastn.png]
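The D. takahashii pattern (two alignment blocks that tile the query but sit in different frames on the same strand) is the typical signature of a frameshift. The sketch below flags it from `(qstart, qend, sframe)` tuples taken from tblastn tabular output. The `max_gap` threshold is an arbitrary, hypothetical choice, and a positive result is only a hint that should be confirmed by inspecting the alignment.

```python
def looks_like_frameshift(hsps, max_gap=10):
    """Return True if consecutive HSPs switch frame on the same strand.

    hsps: iterable of (qstart, qend, sframe) tuples for one query/subject pair.
    Two HSPs count as adjacent when at most max_gap query residues separate
    them (overlapping HSPs also count as adjacent).
    """
    ordered = sorted(hsps)  # sort by query start, then query end
    for (s1, e1, f1), (s2, e2, f2) in zip(ordered, ordered[1:]):
        same_strand = (f1 > 0) == (f2 > 0)
        gap = s2 - e1 - 1  # negative when the query ranges overlap
        if same_strand and f1 != f2 and gap <= max_gap:
            return True
    return False
```

For example, `looks_like_frameshift([(1, 60, -1), (58, 120, -3)])` returns `True`, while a single HSP covering the whole query returns `False`.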
There are no significant matches (E-value < 1e-2) in Drosophila species that are more distantly related to D. melanogaster (e.g., D. elegans, D. pseudoobscura, D. mojavensis, or D. grimshawi).
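When the same search is run against many assemblies, the E-value < 1e-2 cutoff can be applied the same way to every species. The sketch below assumes you have collected `(species, evalue)` pairs from all tblastn HSPs; both helper functions are hypothetical and simply keep the best E-value per species before applying the cutoff.

```python
def best_evalue_per_species(hits):
    """Keep the smallest E-value seen for each species.

    hits: iterable of (species, evalue) pairs from all tblastn HSPs.
    """
    best = {}
    for species, evalue in hits:
        if species not in best or evalue < best[species]:
            best[species] = evalue
    return best


def species_with_significant_hits(best_evalues, cutoff=1e-2):
    """Sorted list of species whose best E-value is below the cutoff.

    A value of None means tblastn reported no hit for that species.
    """
    return sorted(
        species
        for species, evalue in best_evalues.items()
        if evalue is not None and evalue < cutoff
    )
```

Keeping species with no hits as `None`, instead of dropping them, makes it easy to report which assemblies were searched but returned nothing.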

Collectively, the available evidence suggests that this CDS might exist only in the melanogaster subgroup. However, following the GEP annotation protocol, I would annotate the CDS in D. biarmipes at contig62:37243-37036 so that both the E and F isoforms are preserved.
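If this CDS also needs to be recorded as a GFF3 feature (for example, to load it into a genome browser), note that GFF3 always lists start <= end and records orientation in the strand column. The minus-strand span 37243-37036 therefore becomes start 37036, end 37243, strand `-`, with phase 0 because the initial CDS begins with a complete codon. The helper below and the `ID` value are hypothetical placeholders, not GEP naming conventions.

```python
def gff3_cds_line(seqid, first, last, phase, attributes, source="."):
    """Format one GFF3 CDS feature from coordinates given in transcription order.

    first/last follow the direction of transcription (first > last on the
    minus strand). GFF3 requires start <= end with the strand in column 7;
    a single-base feature (first == last) defaults to the plus strand.
    """
    strand = "-" if first > last else "+"
    start, end = sorted((first, last))
    attrs = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, "CDS", str(start), str(end), ".", strand, str(phase), attrs]
    return "\t".join(columns)


print(gff3_cds_line("contig62", 37243, 37036, 0, {"ID": "CDS_2_7735_0_example"}))
```

On the minus strand, the GFF3 phase refers to the end coordinate (37243 here), which is where translation of this CDS begins.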

Posted: Fri Oct 24, 2014 4:35 pm
by jkennell
Thanks, Wilson! Your response was very helpful.