Gene aligned using BlastX not present on desired chromosome?

Ask questions about annotation of D. erecta, D. mojavensis, and D. grimshawi projects here.
pgupta
Posts: 13
Joined: Sat Aug 18, 2012 9:59 pm

Post by pgupta » Thu Apr 18, 2013 8:54 pm

Hi

One of the fosmids we are working on (fosmid_1779L23-D.ana) has a number of genes aligned in the blastX alignment track. A protein BLAST of one of the predicted genes in that region against FlyBase gives a number of hits with good E-values. However, most of the genes with good E-values are on the X or 2R chromosome, and the one that is on 3L, Muc68D-RB, is not actually aligned on the blastX track. According to FlyBase, the gene Muc26B, which is present in the blastX alignment track, is actually on chromosome 2.
In this case, should we just ignore the name and annotate Muc68D-RB as part of the fosmid?
The only other gene that comes up with a good E-value is CG34220, but it is on 2R.

Thanks

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Gene aligned using BlastX not present on desired chromos

Post by wleung » Thu Apr 18, 2013 10:10 pm

Previous analysis by Bhutkar et al. shows that ~95% of D. melanogaster genes stay on the same Muller element across the 12 Drosophila species. This means that while gene movement between Muller elements is rare, it does occur in ~5% of cases.

When trying to identify the putative ortholog at a given genomic region of D. ananassae, the question we should address is whether that region shows the highest degree of sequence similarity to the putative D. melanogaster ortholog compared to the rest of the D. ananassae whole genome assembly. We can confirm the ortholog assignment by doing the reciprocal search (i.e., perform a tblastn search of the D. ananassae gene model against the entire D. melanogaster assembly to verify that its best match is the region containing the putative D. melanogaster ortholog).
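The reciprocal check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for looking at the alignments: the `Hit` record, the function names, and the toy hit lists are all hypothetical, and "best" is assumed here to mean the lowest E-value with bit score as a tie-breaker.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """One search hit (hypothetical record, not a BLAST data structure)."""
    query: str       # what was searched with
    subject: str     # gene or genomic region that was matched
    evalue: float    # lower is more significant
    bitscore: float  # higher is better


def best_hit(hits: List[Hit]) -> Optional[Hit]:
    """Return the most significant hit, or None if there are no hits."""
    if not hits:
        return None
    # Assumption: rank by E-value first, then by bit score.
    return min(hits, key=lambda h: (h.evalue, -h.bitscore))


def is_reciprocal_best(forward: List[Hit], reverse: List[Hit],
                       region: str, gene: str) -> bool:
    """True if the forward search's best hit is `region` and the
    reverse search's best hit is `gene`."""
    fwd, rev = best_hit(forward), best_hit(reverse)
    return (fwd is not None and rev is not None
            and fwd.subject == region and rev.subject == gene)


# Toy values made up for illustration only.
forward = [Hit("geneX-PA", "region_of_interest", 1e-80, 300.0),
           Hit("geneX-PA", "elsewhere", 1e-20, 90.0)]
reverse = [Hit("model_1", "geneX", 1e-75, 280.0),
           Hit("model_1", "geneY", 1e-10, 50.0)]
print(is_reciprocal_best(forward, reverse, "region_of_interest", "geneX"))
```

In practice the two hit lists would come from the forward search (D. melanogaster protein against the D. ananassae assembly) and the reverse search (D. ananassae gene model against D. melanogaster).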

In this case, among all the hits that mapped to the end of fosmid_1779L23, CG34220-PA has a much more significant E-value (7e-100) and a higher percent identity than the rest of the hits. Most of the other hits only show similarity to the low-complexity portion of the protein (i.e., the regions with a large number of threonines).
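To get a rough feel for whether a hit falls in a threonine-rich stretch, one could compute the highest threonine fraction in any fixed-size window of the protein. The sketch below is a crude composition check and not a proper low-complexity filter such as SEG; the function name, the 20-residue window, and the toy sequence are assumptions made for illustration.

```python
def max_residue_fraction(seq: str, residue: str, window: int = 20) -> float:
    """Highest fraction of `residue` found in any window of `window` residues.

    Sequences shorter than the window are treated as a single window.
    """
    seq = seq.upper()
    residue = residue.upper()
    if not seq:
        return 0.0
    if len(seq) <= window:
        return seq.count(residue) / len(seq)
    best = 0.0
    for start in range(len(seq) - window + 1):
        fraction = seq[start:start + window].count(residue) / window
        best = max(best, fraction)
    return best


# Made-up sequence with a threonine-rich repeat in the middle.
toy = "MKLVAG" + "TTSTTTPTTT" * 3 + "DEQRWY"
print(max_residue_fraction(toy, "T"))
```

A high value only suggests that the match may be driven by composition rather than by shared ancestry; the alignment itself still needs to be inspected.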

A tblastn search of CG34220-PA against the entire D. ananassae assembly shows that the best match to this protein is in the same region as our fosmid (at around 21Mb of scaffold_13337). Hence this region of fosmid_1779L23 likely contains the putative ortholog of CG34220.

Note that the match to Muc68D-PB is much less significant than the rest of the hits shown on the genome browser (E-value of 5.4e-29 and a percent identity of 41%). A tblastn search of Muc68D-PB against the D. ananassae whole genome assembly shows that it matches best to a different part of the Muller D element. Hence fosmid_1779L23 does not contain a putative ortholog of Muc68D-PB.
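Because E-values span many orders of magnitude, it can help to compare them on a log10 scale. The short sketch below uses the two E-values quoted in this thread (7e-100 for CG34220-PA and 5.4e-29 for Muc68D-PB); the function name is hypothetical, and the comparison assumes both hits come from the same search against the same database.

```python
import math


def log10_evalue_gap(e_strong: float, e_weak: float) -> float:
    """Orders of magnitude separating two non-zero E-values."""
    # Note: an E-value reported as 0 would need special handling.
    return math.log10(e_weak) - math.log10(e_strong)


gap = log10_evalue_gap(7e-100, 5.4e-29)
print(f"The two hits differ by about {gap:.0f} orders of magnitude")
```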
