Discrepancies for some isoforms

jkennell
Posts: 3
Joined: Sat Aug 09, 2014 1:29 am

Discrepancies for some isoforms

Post by jkennell » Tue Oct 21, 2014 4:35 pm

My student is annotating the M6 ortholog in D. biarmipes (project: 3L control contig 62). He ran into trouble while looking for the first coding exon shared by the M6-PE and M6-PF isoforms. In the BLASTX alignment, this single exon has matches in two different frames. When we run BLASTN with the transcript sequence for this exon, we find a fairly good match, but there are signs of indels between the two sequences, which likely explains the different frames. In addition, there is no RNA-Seq evidence that this exon is expressed. How should we report and document this?

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Discrepancies for some isoforms

Post by wleung » Tue Oct 21, 2014 11:08 pm

The available evidence suggests that the initial CDS of the E and F isoforms of the M6 gene (CDS 2_7735_0) might not exist in D. biarmipes. However, based on parsimony (i.e., minimizing the number of changes compared to D. melanogaster), I would still annotate CDS 2_7735_0 in D. biarmipes, relying on the weak evidence from the blastx alignment and the CDS predicted by Genscan. The next CDS (3_7735_2) begins at 36,112 and has a phase 2 splice acceptor site relative to frame -3, so CDS 2_7735_0 must end with a phase 1 donor site (relative to frame -1). Based on the blastx alignment, the closest phase 1 donor site (relative to frame -1) is located at 37,035-37,034. Therefore, I would annotate the CDS so that it spans from 37,243 to 37,036.
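As a quick sanity check on the phase arithmetic above, here is a minimal Python sketch. The function names are made up for illustration and are not part of any GEP tool. It assumes inclusive 1-based coordinates and the usual convention that a donor phase and the following acceptor phase must sum to 0 or 3.

```python
def cds_length(start: int, end: int) -> int:
    """Length in bp of a CDS given inclusive 1-based coordinates (either strand)."""
    return abs(start - end) + 1


def donor_phase(length: int, acceptor_phase: int = 0) -> int:
    """Number of bases left over after the last complete codon.

    acceptor_phase is the number of bases at the start of the CDS that
    complete a codon begun in the previous CDS (0 for an initial CDS,
    which begins with the start codon).
    """
    return (length - acceptor_phase) % 3


def phases_compatible(donor: int, acceptor: int) -> bool:
    """A donor and the next acceptor are compatible when their phases sum to 0 or 3."""
    return (donor + acceptor) % 3 == 0


# Proposed initial CDS on the minus strand: 37,243 down to 37,036
length = cds_length(37243, 37036)
phase = donor_phase(length)
print(length, phase, phases_compatible(phase, 2))  # 208 1 True
```

The proposed span is 208 bp, which is 69 complete codons plus one extra base, so it ends in phase 1 and pairs correctly with the phase 2 acceptor of the next CDS.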

To gather additional evidence to support this hypothesis, we will perform a tblastn search of the D. melanogaster CDS 2_7735_0 against the "Genome Assembly (NT)" databases of multiple Drosophila species using the FlyBase BLAST service.

The tblastn search results show that the entire CDS 2_7735_0 is conserved in D. erecta:
Dere_CDS_2_7735_0_tblastn.png (80.64 KiB)
However, in D. eugracilis there is an in-frame stop codon within the alignment to CDS 2_7735_0, and in D. ficusphila the tblastn alignment covers only the first 39 aa of the D. melanogaster CDS:
Deug_Dfic_tblastn_stop_partial_alignments.png (130.61 KiB)
In D. takahashii (the species most closely related to D. biarmipes), tblastn produced two separate alignment blocks in two different reading frames. Hence, the frame shift exists in both the D. biarmipes and D. takahashii orthologs of the M6 gene, so the discrepancy is unlikely to be caused by an error in the D. biarmipes consensus sequence.
Dtak_split_CDS_2_7735_0_tblastn.png (109.58 KiB)
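If you want to document this kind of split alignment in a report, one option is to list the frame of each alignment block in query order. The sketch below is only an illustration: the `Hsp` record and both function names are hypothetical, and the coordinates and frames in the example are placeholders, not the values from the actual D. takahashii alignment.

```python
from typing import List, NamedTuple


class Hsp(NamedTuple):
    """One tblastn alignment block (hypothetical record for this sketch)."""
    query_start: int  # first aligned residue in the query protein
    query_end: int    # last aligned residue in the query protein
    frame: int        # subject reading frame reported by tblastn (-3..-1 or +1..+3)


def frames_in_query_order(hsps: List[Hsp]) -> List[int]:
    """Frames of the alignment blocks, ordered along the query CDS."""
    return [h.frame for h in sorted(hsps, key=lambda h: h.query_start)]


def suggests_frame_shift(hsps: List[Hsp]) -> bool:
    """True when one CDS is covered by blocks in more than one frame."""
    return len({h.frame for h in hsps}) > 1


# Placeholder values for illustration only
blocks = [Hsp(38, 69, -2), Hsp(1, 40, -1)]
print(frames_in_query_order(blocks))  # [-1, -2]
print(suggests_frame_shift(blocks))   # True
```

A check like this only flags that the blocks fall in different frames. Whether that reflects a real indel or a sequencing error still has to be judged from comparisons with related species, as described above.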
There are no significant matches (E-value < 1e-2) in other Drosophila species that are more distantly related to D. melanogaster (e.g. D. elegans, D. pseudoobscura, D. mojavensis, or D. grimshawi).

Collectively, the available evidence suggests that this CDS might only exist in the melanogaster subgroup. However, based on the GEP annotation protocol, I would annotate the CDS in D. biarmipes at contig62:37243-37036 in order to preserve both the E and F isoforms.

jkennell
Posts: 3
Joined: Sat Aug 09, 2014 1:29 am

Re: Discrepancies for some isoforms

Post by jkennell » Fri Oct 24, 2014 4:35 pm

Thanks, Wilson! Your response was very helpful.
