Mutation or sequencing error?

Posts: 1
Joined: Tue Jun 19, 2012 9:26 pm

Post by chowell » Fri Nov 09, 2012 6:03 am

One of my students is annotating the ey gene on contig40 in D. biarmipes and is having trouble with the PC isoform. There is no start codon in the first exon of this isoform (exon 3: 17_1548_0). There appears to be a C instead of an A at position 38619, and she could not find another ATG upstream of this site without running into a stop codon. Do you think this is a mutation in D. biarmipes, so that the PC isoform is not expressed, or a sequencing error in the D. biarmipes genome, or is there an alternative explanation? Thanks for your help.

Exon 3: >ey:17_1548_0
Score = 82.4 bits (202), Expect = 3e-22, Method: Compositional matrix adjust.
Identities = 52/55 (95%), Positives = 53/55 (96%), Gaps = 0/55 (0%)
Frame = +3

Screen Shot 2012-11-09 at 1.13.37 AM.png

Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Mutation or sequencing error?

Post by wleung » Fri Nov 09, 2012 4:12 pm

The change in the genomic sequence is unlikely to be an error in the consensus because there are multiple RNA-seq reads aligned to this region and they agree with the consensus.

A FlyBase TBLASTN search of the first CDS of the C isoform (17_1548_0) against the genomic assemblies of different Drosophila species shows that this change from M to L is found only in [i]D. biarmipes[/i]; it is not present in the other newly sequenced species or in D. ananassae. Because the only difference between the C isoform and the other isoforms is this single CDS, the analysis suggests that this isoform might not exist in D. biarmipes.

However, based on the GEP annotation guidelines, we will nonetheless try to construct a gene model for the C isoform. As you have mentioned, there are no other start codons upstream of this region before you reach the stop codon in frame +3. Consequently, we will rely on the Geneid prediction and use the first start codon in frame +3 at 38694-38696.
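For anyone who wants to repeat this kind of check on their own contig, here is a minimal Python sketch of the two scans described above: walking upstream codon by codon until a stop codon is reached, and finding the first in-frame ATG downstream. The function names and the toy sequence are hypothetical and only for illustration; this is not a GEP tool. It assumes plus-strand, 1-based coordinates counted from the start of the contig, so a codon starting at position p lies in frame (p - 1) mod 3 + 1. Under that assumption both 38619 and 38694 fall in frame +3, and they are 75 nt (25 codons) apart.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_of(pos_1based):
    """Forward reading frame (1, 2 or 3) of a codon starting at a 1-based position."""
    return (pos_1based - 1) % 3 + 1


def scan_upstream_for_start(seq, codon_start):
    """Walk upstream from codon_start (1-based), one codon at a time, in the same frame.

    Returns the 1-based position of the nearest upstream ATG, or None if a
    stop codon (or the start of the sequence) is reached first.
    """
    i = codon_start - 1 - 3  # 0-based index of the codon just upstream
    while i >= 0:
        codon = seq[i:i + 3].upper()
        if codon in STOP_CODONS:
            return None
        if codon == "ATG":
            return i + 1
        i -= 3
    return None


def first_downstream_start(seq, codon_start):
    """Return the 1-based position of the first in-frame ATG at or after codon_start.

    Returns None if a stop codon (or the end of the sequence) is reached first.
    """
    i = codon_start - 1
    while i + 3 <= len(seq):
        codon = seq[i:i + 3].upper()
        if codon in STOP_CODONS:
            return None
        if codon == "ATG":
            return i + 1
        i += 3
    return None


# Toy sequence (hypothetical, NOT the contig40 sequence): two padding bases,
# then in-frame codons TAA GCT CTG GCA ATG AAA. The CTG codon starts at 9.
toy = "GG" + "TAA" + "GCT" + "CTG" + "GCA" + "ATG" + "AAA"

print(frame_of(9))                      # 3
print(scan_upstream_for_start(toy, 9))  # None (stop codon reached first)
print(first_downstream_start(toy, 9))   # 15
```

To apply it to real data, load the contig sequence as a plain string and pass the position of the expected start codon (38619 in this thread) as codon_start.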

The rationale is that removing an isoform is a more substantial change than removing conserved residues. We can explain the conserved residues upstream of the proposed translation start site in the C isoform because the second CDS from the D isoform (26_1548_2) overlaps and extends beyond the beginning of 17_1548_0.

