siz gene in fosmid 20 question

Post by cmackinnon » Sun Apr 01, 2012 8:37 pm

I'm trying to help my students annotate the siz gene [2008 3L/extended, fosmid 20]. I used the Gene Record Finder's aa sequence for exon 1 of isoform siz-RD as the blastx query, and the blastx report shows a very poor match [Expect = 8.2; identities = 50%]. Some of the gene models [Twinscan, Geneid, Augustus] on the genome browser don't show an exon 1. The FlyBase aa sequence for exon 1 isn't much different from the Gene Record Finder sequence, and the blastx report using the FlyBase aa sequence isn't much better. When I looked for the end of the exon/beginning of the intron, I couldn't find a GT anywhere near the end of the Gene Record Finder sequence. How do I know if isoform RD is "real"? What other data, besides a poor blastx match and a lack of splice sites, are needed before concluding that "exon 1" isn't "real"? [No April Fool joke!]

Re: siz gene in fosmid 20 question

Post by cshaffer » Mon Apr 02, 2012 12:34 pm

It is certainly the case that we occasionally run across genes where one particular isoform from D. melanogaster is just not possible in the new species, even in species as close as D. erecta. The fact that you are looking for the first exon makes this more complicated: we find that small first exons are often more difficult to find, both because of their small size and because they seem to show more variation in sequence.

However, I am guessing that in this case there was a trivial error (like a typo somewhere), as I was able to find the most likely sequence for the first exon of isoform D here with 100% identity:

 Score = 54.7 bits (113), Expect = 0., P = 0.
 Identities = 22/22 (100%)
 Frame = +1

Also, given the map of siz (see the gene in GBrowse), I would not be surprised if you do not find the first exon of the B isoform in fosmid20. If the size of that intron is the same in Dmel and Dere, that B isoform exon will be in the adjacent fosmid19. We do not require any of our students to annotate an isoform if some of it is located outside the fosmid, but feel free to follow your own policy here; it's your class!

Re: siz gene in fosmid 20 question

Post by wleung » Mon Apr 02, 2012 12:46 pm

While the first coding exon of siz-RD is not in any of the gene predictions (e.g. from Genscan, Twinscan, etc.), this first exon (2_302_0) actually has a high degree of sequence similarity to its D. melanogaster counterpart. As Chris mentioned, you should be able to detect the sequence similarity with bl2seq (using the BLASTX program and searching the fosmid 20 sequence against the CDS). Based on the BLAST alignment, the exon spans 8896-8961 on fosmid20. In addition to the sequence similarity, there are cross-species RNA-seq data (from D. yakuba) and a TopHat splice junction prediction in this region, both consistent with the hypothesis that exon 2_302_0 is in this region of fosmid20.
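
As a quick sanity check on those coordinates, here is a minimal Python sketch (not part of the GEP tools). It assumes the coordinates are 1-based and inclusive, and the helper names `exon_length` and `donor_candidates` are made up for illustration. The first function checks that the span is a whole number of codons, which lines up with the 22-residue alignment Chris posted; the second lists GT dinucleotides near the expected donor position just past the exon end. Keep in mind that a BLAST alignment span need not match the true exon boundaries exactly, so treat the result as a hint for where to look in the browser, not as proof.

```python
def exon_length(start, end):
    """Length in nt of a feature with 1-based, inclusive coordinates."""
    return end - start + 1


def donor_candidates(seq, exon_end, window=10):
    """Return 1-based positions of 'GT' dinucleotides near the expected donor.

    The donor G is expected at exon_end + 1; every GT whose G lies within
    `window` nt of that position (on either side) is reported.
    """
    seq = seq.upper()
    expected = exon_end + 1
    lo = max(1, expected - window)
    hi = min(len(seq) - 1, expected + window)
    return [p for p in range(lo, hi + 1) if seq[p - 1:p + 1] == "GT"]


# Coordinates from the BLAST alignment reported above.
length = exon_length(8896, 8961)
print(length, length // 3, length % 3)  # 66 nt, 22 codons, remainder 0

# Toy sequence (NOT the real fosmid): a 9-nt "exon" followed by an intron.
toy = "ATGAAACCCGTAAGT"
print(donor_candidates(toy, exon_end=9, window=3))  # [10]
```

To try this on the real data, load the fosmid20 sequence into a string and call `donor_candidates(fosmid_seq, 8961)`.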

A BLAT search of the first three exons of siz (1_302_0, 2_302_0, 4_302_0) indicates that the first CDS of siz-RB is found only in the adjacent fosmid (fosmid21). CDS 2_302_0 is found in both fosmid21 and fosmid20, while CDS 4_302_0 is found in both fosmid20 and fosmid19.

Re: siz gene in fosmid 20 question

Post by cmackinnon » Mon Apr 02, 2012 2:26 pm

Thank you! I forgot about the sequence overlaps in fosmids [duh]--guess this counts as April Fool's after all! :-). Another teachable moment!

I found the first exon in siz RB. The annotations for siz RB and RA pass Gene Record Finder; I haven't finished RC, but will let you know if I have problems!

