Problem in annotating D.ananassae fosmid

Posted: Thu Feb 14, 2013 8:57 pm
by pgupta

We have started working on one of the D. ananassae projects for the lab and keep encountering one common problem while annotating the different isoforms of the same gene. A FlyBase BLAST search using the predicted protein sequence gives us 6 isoforms of the gene, each with different exon positions.
These are the problems that we are currently facing:

1. The blastx alignment predicts 2 exons, while the D. melanogaster ortholog in the Gene Record Finder shows more than 2 for each isoform.

2. We can find the start codon for the first exon of all isoforms in the frame reported by NCBI blastx. However, for the 2nd exon (according to the blastx alignment), which starts at the predicted position, no stop codon is found at the end of the exon. The alignment spans positions 842-2 in frame -2, and there are no stop codons in that frame.

3. Also, in some isoforms some of the exons are in + frames and some are in - frames. Although blastx shows only 2 exons and NCBI blastx aligns the 2nd exon to the end of the fosmid, the 4th exon in the D. melanogaster ortholog is the one that has the stop codon. In the blastx alignment, however, both the 3rd and the 4th exons align to a position that is far away from the predicted gene site.

We are also trying to annotate the other genes in the fosmid. We are hoping for a quick response that might help us solve this problem and annotate this fosmid.



Re: Problem in annotating D.ananassae fosmid

Posted: Thu Feb 14, 2013 11:04 pm
by wleung
Based on your description of the problem, I assume you are trying to annotate the SPoCk gene at the beginning of fosmid_1145J01 in D. ananassae.
[Figure: Beginning of fosmid_1145J01]
Genes that are near the beginning or the end of the fosmids are often incomplete. Because the fosmid insert corresponds to a random region of the genome, it could begin or end in the middle of a gene. In these cases, you will only be able to map a subset of the coding exons (CDS) in your project. The blastx alignments to the other CDS's would be spurious because they are actually located outside the span of your fosmid project.

In this case, blastx detects strong sequence similarity between fosmid_1145J01 and both CDS 1_531_0 and CDS 8_531_0. A closer examination of the blastx alignment between fosmid_1145J01 and CDS 8_531_0 shows that the alignment only contains the first half of the CDS (280/571, highlighted in orange in the figure below). The last position of the alignment in the query is base position 2 (highlighted in red), which indicates that the alignment terminates at the beginning of fosmid_1145J01.
[Figure: Last CDS of SPoCk in fosmid_1145J01]
Collectively, the blastx alignment to CDS 8_531_0 suggests that our fosmid only contains part of the SPoCk gene and that the CDS's downstream of 8_531_0 are located beyond the beginning of the fosmid. Consequently, we would not be able to map the CDS's 9_630_2, 10_531_2, and 11_531_1. Note that the CDS's 8_629_0 and 8_531_0 overlap with each other and end at the same position in D. melanogaster, hence you will also be able to map only part of the CDS 8_629_0 in your project.
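
The reasoning above (the alignment stops at the edge of the fosmid before it reaches the end of the CDS) can be written as a small check. This is only a sketch in Python and not part of any GEP tool: the `BlastxHit` class, the `truncated_by_fosmid_edge` function, the subject start of 1, and the fosmid length of 40000 are assumptions made for illustration. The values 842, 2, 280, and 571 come from the alignment discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class BlastxHit:
    """Hypothetical container for the coordinates of one blastx alignment."""
    query_start: int     # fosmid base where the alignment begins (1-based)
    query_end: int       # fosmid base where the alignment ends
    subject_start: int   # first aligned residue of the CDS
    subject_end: int     # last aligned residue of the CDS
    subject_length: int  # total length of the CDS in residues


def truncated_by_fosmid_edge(hit, fosmid_length, edge_tolerance=3):
    """Return True if the alignment runs into either end of the fosmid
    before it has covered the last residue of the CDS."""
    reaches_cds_end = hit.subject_end >= hit.subject_length
    # query_end is the last base in the direction of translation, so it
    # is near 1 for a minus-strand hit and near fosmid_length for a
    # plus-strand hit that runs off the edge.
    at_fosmid_edge = (hit.query_end <= edge_tolerance
                      or hit.query_end > fosmid_length - edge_tolerance)
    return at_fosmid_edge and not reaches_cds_end


# Alignment to CDS 8_531_0: bases 842-2 (frame -2), 280 of 571 residues.
# subject_start=1 and fosmid_length=40000 are assumed values.
hit = BlastxHit(query_start=842, query_end=2,
                subject_start=1, subject_end=280, subject_length=571)

is_partial = truncated_by_fosmid_edge(hit, fosmid_length=40000)
residues_missing = hit.subject_length - hit.subject_end
print(is_partial, residues_missing)
```

A hit flagged this way is a candidate for a partial gene model: the missing residues are expected to lie outside the fosmid, so there is no stop codon to find within the project.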

Consequently, we expect to be able to find all the CDS's from 1_531_0 through 8_629_0 (the last one only partially). Note that some of the CDS's are quite small, so you might need to increase the Expect Threshold in your BLAST searches and rely on the RNA-seq data and TopHat junction predictions to help you find these smaller CDS's.
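
To see why small CDS's call for a larger Expect Threshold, it may help to look at the simplified Karlin-Altschul relationship E = m * n * 2^(-S'), where S' is the bit score and m and n are the lengths of the query and the database. The Python sketch below uses that simplified form, ignores the effective-length corrections that BLAST applies, and uses made-up lengths and bit scores; the `expect_value` function is a hypothetical helper, not part of BLAST.

```python
def expect_value(bit_score, query_length, db_length):
    """Simplified E-value: E = m * n * 2^(-S'), without the
    effective-length corrections used by BLAST itself."""
    return query_length * db_length * 2.0 ** (-bit_score)


# Assumed values for illustration only.
fosmid_length = 40000  # query length (m)
database_length = 600  # total residues in the CDS's searched (n)

# A short CDS can only accumulate a low bit score, even when every
# residue matches, so its E-value stays high.
e_short = expect_value(bit_score=18, query_length=fosmid_length,
                       db_length=database_length)
e_long = expect_value(bit_score=200, query_length=fosmid_length,
                      db_length=database_length)

print(f"short CDS: E = {e_short:.3g}")
print(f"long CDS:  E = {e_long:.3g}")
```

Under these assumptions the short CDS has an E-value well above that of the long CDS, so a strict Expect Threshold would hide a genuine match to it.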

Re: Problem in annotating D.ananassae fosmid

Posted: Sat Feb 16, 2013 4:18 am
by pgupta

Thank you for the quick and very detailed response :). We were indeed trying to annotate the SPoCk gene.

I just have one more question. As only half of the gene seems to be present on the fosmid, can we use the Gene Model Checker for this gene (as we do not have the stop coordinates)? If not, then how do we include this gene in the report?


Re: Problem in annotating D.ananassae fosmid

Posted: Sat Feb 16, 2013 6:04 am
by wleung
Yes, you could use the Gene Model Checker to verify partial genes. In this case, you could select the Partial option under the "Completeness of Gene Model Translation" field and then select Missing 3' end of translated region under the "Region Missing" field.
[Figure: Checking a partial gene using the Gene Model Checker]

Re: Problem in annotating D.ananassae fosmid

Posted: Tue Feb 26, 2013 12:57 am
