I am working with a student on the D. eugracilis dot chromosome, contig 50. We are annotating the RhoGAP102A gene, and she has successfully annotated the C and E isoforms. The D and F isoforms have additional exons that are giving us trouble. There are two issues:
1) In D. melanogaster, exons 13 (in isoform D) and 14 (in isoform F) overlap, with 47 amino acids in common. Exon 13 has additional amino acids at the 5' end of the exon. BLAST searches with both exons return the coordinates 40255-40338 (frame +1), which covers 27 of the 47 shared amino acids. There is a splice acceptor at 40254 that is in phase with the splice donor in Exon 12. Upstream of this splice site, there are stop codons in all three frames, which suggests that the conserved amino acids found upstream in D. melanogaster do not exist in D. eugracilis. However, the RNA-Seq results show expression of the region of contig 50 that is 5' to the 40254 coordinate. A BLAST search with just the 5' half of the sequence of Exon 13 (the longer of the two overlapping exons) does not generate any significant alignment, suggesting that part of this exon has been lost in D. eugracilis. That would make Exon 13 and Exon 14 identical, and thus make the CDS of isoforms D and F identical. What evidence should we include to support the conclusion that part of these exons is not present in D. eugracilis? Or is there evidence we should consider that would point to a sequencing error making it difficult to find the remainder of this exon?
2) Exon 15 (also present in isoforms D and F) does not seem to be present in D. eugracilis. At first, we thought it might be on the next contig, but a BLAST search in FlyBase does not find any significant matches to Exon 15 in D. eugracilis (even with a low-stringency search). Also, in D. melanogaster, exons 12 through 15 all lie within a region of 606 nucleotides in the genomic sequence. A similar distance in D. eugracilis is still well within contig 50. I tried to perform the BLAT protocol described in the "Not all of the gene in the contig?" post from March 2015, but the BLAT server responded with an error message. The lack of Exon 15 means there is no stop codon present in these two isoforms, though the Exon 13/14 described above could be read through to a stop codon at 40368-71. This would be the second significant deviation from the D. melanogaster gene structure, and so it concerns me. What other evidence should we examine to determine whether Exon 15 is present in D. eugracilis?
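In case it helps anyone reproduce the two coordinate checks described above, here is a minimal Python sketch of how they could be scripted. It is only an illustration: the function names `upstream_stops` and `first_inframe_stop` are made up for this post, the contig is assumed to be available as a plain forward-strand string, and coordinates are assumed to be 1-based with frames numbered so that a codon starting at position 1 is in frame +1. It does not replace inspecting the tracks in the genome browser.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_stops(contig, acceptor, window=150):
    """List stop codons in each forward frame within the `window` bases
    ending at `acceptor` (1-based, inclusive).

    Returns {frame: [1-based start position of each stop codon]}, where
    frame 1 contains codons starting at positions 1, 4, 7, ...
    """
    seq = contig.upper()
    start = max(1, acceptor - window + 1)
    hits = {1: [], 2: [], 3: []}
    # A codon starting at `pos` spans pos..pos+2, so it must start
    # no later than acceptor - 2 to lie fully inside the window.
    for pos in range(start, acceptor - 1):
        codon = seq[pos - 1:pos + 2]
        if codon in STOP_CODONS:
            frame = (pos - 1) % 3 + 1
            hits[frame].append(pos)
    return hits


def first_inframe_stop(contig, codon_start):
    """Read codons from `codon_start` (1-based start of a complete codon)
    and return (start, end) of the first in-frame stop codon,
    or None if the end of the sequence is reached first."""
    seq = contig.upper()
    for pos in range(codon_start, len(seq) - 1, 3):
        if seq[pos - 1:pos + 2] in STOP_CODONS:
            return (pos, pos + 2)
    return None
```

With the real contig 50 sequence loaded into a string, calling `upstream_stops(contig, 40254)` would list any stops in the region 5' of the acceptor, and `first_inframe_stop(contig, 40255)` would report where a read-through from the start of the BLAST hit first terminates. The second call assumes the codon frame begins exactly at 40255; if the exon starts with a partial codon, the start coordinate would need to be shifted accordingly.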