Sequencing error?

Ask questions about annotation of D. erecta, D. mojavensis, and D. grimshawi projects here.
drevie
Posts: 67
Joined: Sun Feb 04, 2007 10:23 pm
Location: California Lutheran University, Thousand Oaks, CA

Sequencing error?

Post by drevie » Thu May 01, 2014 11:00 pm

In annotating a gene, we have a case where there are clear donor/acceptor sites for two exons, but the resulting peptide would have a frameshift. At the end of the first exon there are 5 As, while other Drosophila species have 4 As. Is this a sequencing error? (Pictures below.)
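The comparison in this post can be sketched in a few lines of Python. This is only an illustration: the two sequences are hypothetical stand-ins for the exon ends shown in the screenshots, and `trailing_run_length` is a helper name invented here. A length difference that is not a multiple of three would shift the reading frame, but on its own it does not show that the base is a sequencing error.

```python
def trailing_run_length(seq: str, base: str = "A") -> int:
    """Count how many copies of `base` end the sequence."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip(base.upper()))


# Hypothetical 3' ends of the first exon (not taken from the real contig).
target_exon_end = "GCTGGCAAAAA"    # 5 As, as in the species being annotated
ortholog_exon_end = "GCTGGCAAAA"   # 4 As, as in the other species

extra_bases = (trailing_run_length(target_exon_end)
               - trailing_run_length(ortholog_exon_end))

# An insertion or deletion shifts the frame unless its length is a multiple of 3.
shifts_frame = extra_bases % 3 != 0
print(extra_bases, shifts_frame)
```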

Re: Sequencing error?

Post by drevie » Thu May 01, 2014 11:02 pm

3' end of first exon

Re: Sequencing error?

Post by drevie » Thu May 01, 2014 11:13 pm

3' end of first exon
Attachment: 3end.jpg (Conservation of 3' end of first exon)

Re: Sequencing error?

Post by drevie » Thu May 01, 2014 11:32 pm

3' end conservation
Attachment: 3endConservation.jpg (Conservation)

Re: Sequencing error?

Post by drevie » Thu May 01, 2014 11:33 pm

5' end of second exon
Attachment: New Picture.jpg (5' end of 2nd exon)

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Sequencing error?

Post by wleung » Fri May 02, 2014 12:14 am

I am not sure I understand the question. The blastx alignment for CDS 5_1762_0 ends at 20929 in frame -1. As shown in the screenshot below, there is a phase 0 splice donor site immediately upstream at 20928-20927.
Attachment: splice_donor_CDS_5_1762_0.png
The blastx alignment for CDS 4_1762_0 begins at 20869 in frame -3, and there is a phase 0 splice acceptor site immediately upstream at 20871-20870. Hence the splice donor and acceptor sites are both in phase 0 and are compatible with each other. The proposed splice junction is also supported by RNA-Seq and two gene predictors.
Attachment: splice_acceptor_CDS_4_1762_0.png
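The phase check described above can be written as a small Python sketch. It assumes the usual convention that the donor phase is the number of bases left over after the last complete codon of the upstream exon, and the acceptor phase is the number of bases before the first complete codon of the downstream exon. The function name `phases_compatible` is made up for this illustration.

```python
def phases_compatible(donor_phase: int, acceptor_phase: int) -> bool:
    """Return True if the two splice-site phases keep the reading frame.

    Under the assumed convention, the leftover bases on each side of the
    intron must add up to a whole number of codons (0 or 3 bases).
    """
    for phase in (donor_phase, acceptor_phase):
        if phase not in (0, 1, 2):
            raise ValueError(f"phase must be 0, 1, or 2, got {phase}")
    return (donor_phase + acceptor_phase) % 3 == 0


# The junction discussed in this post: donor phase 0, acceptor phase 0.
print(phases_compatible(0, 0))
```

A phase 1 donor would pair with a phase 2 acceptor in the same way, while a phase 0 donor joined to a phase 1 acceptor would shift the frame.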

Re: Sequencing error?

Post by drevie » Fri May 02, 2014 4:14 am

Sorry. Isoforms PA and PB are missing the middle exon, and this is the problem: the reading phase changes. Isoform PC uses three consecutive exons, including the two in your reply.
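Why dropping a middle exon can change the reading phase is easy to see with a rough Python sketch. The exon coordinates below are hypothetical and chosen only to illustrate the arithmetic; they are not the coordinates of this gene, and the helper names are invented for the example.

```python
def cds_length(exons):
    """Total coding length for a list of (start, end) pairs, 1-based inclusive."""
    return sum(end - start + 1 for start, end in exons)


def skipping_preserves_frame(exons, skip_index):
    """Skipping an exon keeps the frame only if its length is a multiple of 3."""
    start, end = exons[skip_index]
    return (end - start + 1) % 3 == 0


# Hypothetical three-exon model: lengths 91, 59, and 120 bases.
all_exons = [(100, 190), (300, 358), (500, 619)]
without_middle = [all_exons[0], all_exons[2]]

print(cds_length(all_exons) % 3)        # remainder for the full model
print(cds_length(without_middle) % 3)   # remainder with the middle exon skipped
print(skipping_preserves_frame(all_exons, 1))
```

In this made-up model the full set of exons is a whole number of codons, but leaving out the 59-base middle exon is not, which is the kind of frameshift described in this post.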

Re: Sequencing error?

Post by wleung » Fri May 02, 2014 5:09 am

CDS 6_1774_0 (the initial coding exon of the B isoform of CG2219) likely does not exist in D. biarmipes. In D. melanogaster, CDS 6_1774_0 and 7_1762_0 overlap with each other, and CDS 6_1774_0 has one extra amino acid near the splice donor site. Examination of the Conservation track shows that the alternate splice donor site used by CDS 6_1774_0 no longer exists in D. biarmipes and D. takahashii (the species most closely related to D. biarmipes). There are no additional splice donor sites available before the stop codon at 21095-21093 in frame -2.
Attachment: CG2219_conservation_donor_CDS_6.png
Examination of the RNA-Seq reads that aligned to this region does not show any systematic differences between the aligned RNA-Seq reads and the consensus sequence. Since isoforms A and B differ from each other only in the initial CDS 6_1774_0, I would omit the annotation of isoform B of CG2219 in D. biarmipes because CDS 6_1774_0 does not exist.

Re: Sequencing error?

Post by drevie » Fri May 02, 2014 8:49 pm

I guess the confusion is that the genome browser shows 4 exons, while the GeneModelChecker believes there are five for isoform A. The difference is apparently the third exon, which is in PC but not PA or PB (as seen in the browser). When all five exons are entered for PA, it passes the tests. However, when the exons shown in the browser are entered, the problem described above happens.

Re: Sequencing error?

Post by wleung » Fri May 02, 2014 9:05 pm

Per our previous discussions, the blastx track shows regions of sequence similarity between the contig and a D. melanogaster protein. Consequently, your students should not use the blastx track to infer the gene structure. This is also one of the main reasons for performing the exon-by-exon searches. In addition, because blastx does not take the splice sites into account when it generates the alignment, we need to examine the regions near the ends of the alignments to determine the correct splice sites.
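Examining the region near the end of an alignment for splice sites can be sketched roughly as follows. This is an illustration only, with several assumptions: the sequence is already oriented on the coding strand (a minus-strand feature like the ones in this thread would need to be reverse-complemented first), positions are 0-based, only canonical GT donors are considered, and `candidate_donor_sites` is a name invented for the example.

```python
def candidate_donor_sites(seq: str, align_end: int, window: int = 10):
    """List 0-based positions of GT dinucleotides near an alignment end.

    `align_end` is the index of the last aligned base; the search covers
    `window` bases on either side of it.
    """
    seq = seq.upper()
    lo = max(0, align_end - window)
    hi = min(len(seq) - 1, align_end + window + 1)  # a GT needs two bases
    return [i for i in range(lo, hi) if seq[i:i + 2] == "GT"]


# Hypothetical coding-strand sequence; the alignment is assumed to end at index 8.
example = "ATGGCCAAAGTAAGTTTT"
print(candidate_donor_sites(example, align_end=8, window=3))
```

Each candidate would still need to be checked for its phase and for support from RNA-Seq and the conservation track before it is used in a gene model.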

Re: Sequencing error?

Post by drevie » Fri May 02, 2014 9:43 pm

I think the confusion arose when they entered PA: the browser showed their model with an exon that wasn't in the browser's PA model but was instead in the browser's PC model. They therefore retried it without that exon. For this gene, the browser is quite a bit off.

Thanks.
