Query regarding D ananassae projects

Ask questions about annotation of D. erecta, D. mojavensis, and D. grimshawi projects here.
Posts: 13
Joined: Sat Aug 18, 2012 9:59 pm

Post by pgupta » Mon Jan 21, 2013 6:48 am


I have a question about the D. ananassae projects for January 2013. In a number of projects, Genscan predicts, say, four genes and the RNA-Seq data show a high signal in particular regions, but when the predicted protein sequence is BLASTed no significant hits are obtained.
Also, although Genscan predicts genes for the project, BLASTX does not report any genes, and BLASTing the predicted protein sequences for the individual predicted exons does not give any significant hits anywhere in the fosmid.

Also, suppose that for a project BLASTX does not report any genes (Genscan predicts two genes), but BLASTing the predicted protein sequence of the exons (located using the RNA-Seq data) gives 4-5 genes, all of which have E-values very close to each other (e.g. 1.85469e-37, 1.87023e-37, 1.9499e-37, 1.10594e-36). Do we select the first two genes and use the Gene Record Finder for these genes only?

Does this mean that there are actually no genes in the fosmid, or am I making a mistake somewhere? I just want to make sure about this, as I encountered this problem in a number of projects (while selecting projects for students to take up in the lab).

Waiting for your response.



Posts: 211
Joined: Sun Feb 04, 2007 10:29 pm
Location: Washington University in St Louis

Re: Query regarding D ananassae projects

Post by cshaffer » Mon Jan 21, 2013 6:31 pm

All very good questions.

Without seeing the exact evidence I cannot make any specific call, but here are some general things to consider:

First, there will be some clones for which, at the end of the analysis, one will have decided that there are no real genes in the clone. Unlike D. melanogaster, many of these species have what look like large "gene deserts"; that's my term for large regions of 50-100+ kb with little evidence for protein-coding genes. It's quite possible that any given D. ananassae clone lies in one of these regions and in fact has no genes. Here, of course, I mean no evidence that we can come up with to indicate the presence of a gene. It is possible, and would be very cool, that there is something in these regions that we just cannot recognize. But given the evidence we CAN collect, it really looks like there are no real genes.

If you believe your clone is from one of these regions and there are in fact no genes in your clone, the first section of the project report form tells you what evidence you should collect to convince yourself of this and what to put in the report form to support your conclusion. I will post it here:
Instructions for project with no genes
If you believe that the project does not contain any genes, please provide the following evidence to support your conclusions:

1. Perform a BLASTX search of the entire contig sequence against the non-redundant (nr) protein database. Provide an explanation for any significant (E-value < 1e-5) hits to known genes in the nr database as to why they do not correspond to real genes in the project.

2. For each Genscan prediction, perform a BLASTP search using the predicted amino acid sequence against the protein database (nr) using the strategy described above.

3. Examine the gene expression tracks (e.g. cDNA/EST/RNA-Seq) for evidence of transcribed regions that do not correspond to alignments to known D. melanogaster proteins. Perform a BLASTX search against the nr database using these genomic regions to determine if the region is similar to any known or predicted proteins in the nr database.
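
The E-value check in steps 1-3 can be scripted once the BLAST results are saved as tab-separated text. The following is only a minimal sketch, assuming the results were exported in the standard 12-column BLAST+ tabular layout (`-outfmt 6`); the function name `significant_hits` and the example rows are hypothetical and not part of the report form.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

E_VALUE_CUTOFF = 1e-5  # significance threshold quoted in the instructions


def significant_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return hits with E-value below the cutoff, lowest E-value first."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    hits = []
    for row in reader:
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] < cutoff:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


# Made-up rows for illustration only: one strong hit, one weak hit.
example = (
    "contig1\tprotA\t62.1\t210\t75\t2\t1500\t2129\t12\t221\t3e-40\t160\n"
    "contig1\tprotB\t31.0\t58\t38\t1\t9000\t9173\t40\t97\t0.82\t30.1\n"
)

for hit in significant_hits(example):
    print(hit["sseqid"], hit["evalue"])
```

Every hit that survives the filter still needs a written explanation in the report form; the script only narrows down which hits have to be examined by hand.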

As for Genscan, it is very aggressive at predicting genes and has a high false-positive rate. The idea is to keep the false-negative rate as small as possible so as not to miss any putative genes, but it does mean that there will be some false positives. I would use BLAST similarity as a first line of evidence when asking whether a given prediction is likely to be a real gene, as it's pretty rare to run across a totally novel protein these days in higher eukaryotes. If you do have a Genscan prediction that is supported by RNA-Seq but has no BLAST hits, it's possible that you have found a novel gene. There are really no guidelines here; use your best judgement. If at the end of your analysis you believe it is a gene, go ahead and make the call, then support your conclusion with as much evidence as you can as you fill out the report form.

The issue of having a number of BLAST hits that all have very nearly the same score when trying to assign orthology is very tricky. The only other evidence that can help is synteny: genes tend to move around the chromosome only very, very slowly in these species, so if you are lucky you will find neighboring genes that all cluster together in D. melanogaster, which will convince you of which ortholog you have found in your clone. On the practical side, this means leaving this gene until the end so that you have an idea of what other genes are in your clone; you may need to look at adjacent clones to find the neighboring genes. If you find no evidence from synteny to help you, then stick with parsimony and go with the gene that requires the fewest changes (i.e. mutational events). For this I would probably compare the genomic regions of the two putative orthologs using ClustalW rather than the protein alignment, so as to avoid any issues with the degeneracy of the genetic code.
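
As a rough illustration of the parsimony fallback, the sketch below first flags which hits are effectively tied and then counts differences between pre-aligned nucleotide sequences. It is a simplification under stated assumptions: the one-order-of-magnitude tie threshold is an arbitrary choice, all E-values are assumed to be positive, the sequences are assumed to be already aligned (for example, copied from a ClustalW alignment with `-` for gaps), and each substitution or contiguous gap run is counted as one event. The names `near_tied`, `count_changes`, `geneX` and `geneY` are hypothetical.

```python
import math


def near_tied(evalues, max_log10_gap=1.0):
    """Indices of hits within max_log10_gap orders of magnitude of the best E-value."""
    best = min(evalues)
    return [i for i, e in enumerate(evalues)
            if math.log10(e) - math.log10(best) <= max_log10_gap]


def count_changes(aligned_a, aligned_b):
    """Count substitutions plus gap runs between two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    changes = 0
    prev_gap_side = None  # which sequence was gapped in the previous column
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column gapped in both sequences carries no information
        gap_side = "a" if a == "-" else "b" if b == "-" else None
        if gap_side is not None:
            # A contiguous gap run on one side counts as a single event.
            if gap_side != prev_gap_side:
                changes += 1
        elif a != b:
            changes += 1  # substitution
        prev_gap_side = gap_side
    return changes


# E-values quoted in the question: all four fall within one order of magnitude.
tied = near_tied([1.85469e-37, 1.87023e-37, 1.9499e-37, 1.10594e-36])

# Made-up aligned pairs (clone region, candidate ortholog region).
candidates = {
    "geneX": ("ATGC--TTA", "ATGAGGTTC"),
    "geneY": ("ATGCTTA", "ATGCTTC"),
}
best = min(candidates, key=lambda name: count_changes(*candidates[name]))
print(tied, best)
```

A count like this is only a tie-breaker of last resort; synteny with neighboring genes, where it exists, is the stronger evidence.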
