New genes

Ask questions about annotation of D. erecta, D. mojavensis, and D. grimshawi projects here.
drevie
Posts: 67
Joined: Sun Feb 04, 2007 10:23 pm
Location: California Lutheran University, Thousand Oaks, CA

New genes

Post by drevie » Wed Apr 23, 2014 1:56 am

How do we use the Gene Model Checker to check new genes (i.e., genes that are not present in D. melanogaster)? When my students tried, the Gene Model Checker did not accept the name they used.

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: New genes

Post by wleung » Wed Apr 23, 2014 4:54 am

For novel genes that are not present in D. melanogaster, your students should enter the species and the ortholog name that was used to construct the gene model in the "Ortholog in D. melanogaster" field of the Gene Model Checker configuration form. For example, the GEP has previously identified a novel gene in D. virilis called GEP001 that is not present in D. melanogaster. To verify this gene model, you would enter Dvir-GEP001-PA in the "Ortholog in D. melanogaster" field.
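The naming convention in the example above can be captured in a short Python sketch. The pattern is inferred only from the example "Dvir-GEP001-PA" (a four-letter species abbreviation, the gene name, and an isoform suffix such as "-PA"); it is an assumption, not a documented Gene Model Checker rule, and the helper `novel_gene_ortholog_name` is hypothetical. It only checks the shape of the name, not whether the species abbreviation refers to a real project.

```python
import re

# Assumed pattern, inferred from the GEP example "Dvir-GEP001-PA":
# <species abbreviation>-<gene name>-P<isoform letter>
ORTHOLOG_NAME = re.compile(
    r"^(?P<species>[A-Z][a-z]{3})-(?P<gene>[A-Za-z0-9]+)-P(?P<isoform>[A-Z])$"
)


def novel_gene_ortholog_name(species: str, gene: str, isoform: str = "A") -> str:
    """Build the text to type into the "Ortholog in D. melanogaster" field
    for a novel gene (hypothetical helper; format assumed, not official)."""
    name = f"{species}-{gene}-P{isoform}"
    # Reject names that do not match the assumed shape, e.g. "D.vir-GEP001-PA".
    if not ORTHOLOG_NAME.match(name):
        raise ValueError(f"unexpected ortholog name format: {name!r}")
    return name


print(novel_gene_ortholog_name("Dvir", "GEP001"))  # Dvir-GEP001-PA
print(novel_gene_ortholog_name("Dbia", "GEP001"))  # Dbia-GEP001-PA
```

The regular expression deliberately restricts the gene name to letters and digits, which fits GEP-style identifiers such as GEP001; other naming schemes would need a looser pattern.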

The Gene Model Checker will issue a warning indicating that the ortholog cannot be found in D. melanogaster, and it will not be able to produce the dot plot or the protein alignment. However, the Gene Model Checker will still validate the proposed gene model using the checklist and produce the transcript and peptide sequences for the proposed gene model. Hence, you can use the "Align two or more sequences" functionality in NCBI BLAST to compare the proposed model against the putative ortholog in the reference species.

Please include a description of the novel gene in the GEP Annotation Report and the evidence used to justify the presence of a novel gene (e.g. RNA-Seq coverage, BLASTP results against the RefSeq protein database).

Re: New genes

Post by drevie » Thu Apr 24, 2014 3:26 pm

I don't believe there is any ortholog, although I told them to run BLAST searches to check. The gene was predicted by some of the gene prediction programs, such as N-SCAN. I assume that they can therefore just enter GEP001-PA and it will be accepted? Or should we enter Dbii-GEP001-PA?

Re: New genes

Post by drevie » Thu Apr 24, 2014 3:28 pm

I believe there was also RNA-Seq evidence for it.

Re: New genes

Post by wleung » Thu Apr 24, 2014 4:46 pm

Yes, you can use "Dbia-GEP001-PA" as the gene name when verifying the gene model with the Gene Model Checker.

However, in general, I would not annotate a feature as a gene unless the feature shows significant sequence similarity to another known gene in the NCBI nr protein database or contains a conserved domain. This is because the accuracy of most gene predictors is only between 30% and 50%. RNA-Seq coverage in a region could also correspond to features other than protein-coding genes (e.g., transposon fragments, non-coding RNA genes). In addition, we cannot reliably construct the alternative splicing pattern of a gene using just the RNA-Seq data and the results from the gene predictors. Tools such as Cufflinks can use the RNA-Seq read coverage and TopHat junctions to predict different isoforms, but accurate reconstruction of transcripts from RNA-Seq data remains an active area of research. Please refer to the following manuscript by Steijger et al. for more information:

Steijger T, et al. Assessment of transcript reconstruction methods for RNA-seq. Nat Methods. 2013 Dec;10(12):1177-84.
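The minimum-evidence rule for annotating a novel gene can be written as a small predicate. This is only a sketch: the `FeatureEvidence` fields and the function name are hypothetical, each kind of evidence is reduced to a yes/no flag, and what counts as "significant" similarity is left to the annotator rather than encoded as a threshold. Passing the check is a necessary condition for calling the feature a gene, not proof that the gene model is correct.

```python
from dataclasses import dataclass


@dataclass
class FeatureEvidence:
    """Evidence gathered for a candidate novel gene (hypothetical fields)."""
    significant_nr_hit: bool          # significant BLASTP similarity to a known protein in NCBI nr
    conserved_domain: bool            # feature contains a conserved domain
    rna_seq_coverage: bool = False    # RNA-Seq reads cover the region
    predictor_support: bool = False   # predicted by gene finders such as N-SCAN


def meets_similarity_criterion(ev: FeatureEvidence) -> bool:
    """Return True only if the feature has sequence similarity or domain support.

    RNA-Seq coverage and gene predictions are intentionally ignored here:
    predictors are often wrong, and coverage may come from transposon
    fragments or non-coding RNA genes rather than a protein-coding gene.
    """
    return ev.significant_nr_hit or ev.conserved_domain


# A feature supported only by RNA-Seq and a gene predictor does not qualify.
candidate = FeatureEvidence(significant_nr_hit=False, conserved_domain=False,
                            rna_seq_coverage=True, predictor_support=True)
print(meets_similarity_criterion(candidate))  # False
```

Keeping the RNA-Seq and predictor flags in the record, even though the predicate ignores them, makes it explicit that this evidence was considered and judged insufficient on its own.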
