Not all of the gene in the contig?

Ask questions about annotation of D. erecta, D. mojavensis, and D. grimshawi projects here.
Posts: 11
Joined: Sun Feb 04, 2007 10:26 pm

Post by mshaw » Wed Mar 25, 2015 3:36 pm

I have a student who is working on annotating Caps in D. elegans. Exons 3, 4, 5, and 6 are present with very high sequence homology, but even with less stringent parameters, homology with any of the other exons is very low. RNA-Seq data indicate expression of exons 3-6. It is a huge gene in D. melanogaster, with an intron of 1,600 bp between exons 3 and 4. It seems possible that we only have exons 3-6 in our contig. What other kinds of evidence do you suggest that we look at?

Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Not all of the gene in the contig?

Post by wleung » Wed Mar 25, 2015 4:37 pm

One approach that you can use to support the hypothesis that the project region contains only part of a gene is to show that the best matches to the other coding exons (CDS) are located in the adjacent contigs. Because D. elegans is closely related to D. melanogaster, you can typically perform a BLAT search to map most of the coding exons of a gene against the entire collection of D. elegans projects.

To set up this search, navigate to the gene record for "Caps" using the Gene Record Finder and then click on the "Export All Unique CDS to FASTA" button to retrieve all the coding exon sequences. Select the sequences and copy them to the clipboard. Open a new tab, navigate to the GEP UCSC Genome Browser, and click on the "BLAT" link. Select "D. elegans" under the "Genome" field and "Jan. 2015 (GEP/Dot)" under the "Assembly" field. Then paste the sequences into the text area and click on "submit". This searches all the CDS of Caps against all the D. elegans Muller F element projects.
Attachment: configure_blat_CDS_search.png (115.25 KiB)
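Before pasting the exported sequences into BLAT, it can help to note which CDS are very short, since very short queries may not produce a hit. The following is only a minimal sketch: it assumes the export is ordinary FASTA whose header lines begin with the CDS identifier, the `min_len` cutoff is a hypothetical value chosen for illustration, and the sequences in `example` are made up.

```python
def read_fasta(text):
    """Parse FASTA text into a {identifier: sequence} dict, keeping record order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def short_records(records, min_len):
    """Return the identifiers of records shorter than min_len residues."""
    return [name for name, seq in records.items() if len(seq) < min_len]


# Made-up sequences; only the identifier style follows the thread.
example = """>1_11471_0
MKTLLVAGGSAAPLLQWERTY
>7_11471_1
MKRQ
"""

if __name__ == "__main__":
    records = read_fasta(example)
    # min_len=10 is an arbitrary illustrative cutoff, not a BLAT setting.
    print(short_records(records, min_len=10))
```

Any CDS flagged this way would need to be placed using other evidence, such as splice junction predictions, rather than the BLAT alignment.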
The results show that the first two CDS of Caps (1_11471_0 and 2_11471_1) are located on contig63 (red boxes). The next four CDS (3_11471_0 to 6_11471_0) are located on contig64 (blue boxes), and the rest of the CDS are found on contig65. Note that CDS 7_11471_1 is missing from the BLAT results because it consists of only four amino acids. Examination of the TopHat splice junctions for CDS 6_11471_0 suggests that this CDS is likely located in contig64.
Attachment: blat_search_Caps_CDS_Delegans.png (243.09 KiB)
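If the BLAT results are saved in PSL format instead of being read from the results page, the same per-contig summary can be produced with a short script. This is a rough sketch under stated assumptions: it relies on the standard 21-column tab-separated PSL layout (column 1 is the match count, column 10 the query name, column 14 the target name), it ranks hits by match count alone, which is a simplification and not the score shown on the results page, and the rows built by the hypothetical `make_row` helper are invented for illustration.

```python
from collections import defaultdict


def best_hits_by_contig(psl_text):
    """Group each query CDS under the contig that holds its best PSL hit."""
    best = {}  # query name -> (match count, target name)
    for line in psl_text.splitlines():
        fields = line.split("\t")
        # Skip PSL header lines and anything that is not a full data row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        query, target = fields[9], fields[13]
        if query not in best or matches > best[query][0]:
            best[query] = (matches, target)

    grouped = defaultdict(list)
    for query, (_, target) in best.items():
        grouped[target].append(query)
    return dict(grouped)


def make_row(matches, query, target):
    """Build a placeholder PSL row (hypothetical helper, dummy values)."""
    fields = ["0"] * 21
    fields[0] = str(matches)
    fields[8] = "+"
    fields[9] = query
    fields[13] = target
    return "\t".join(fields)


# Invented hits: one CDS has a strong hit and a weaker secondary hit.
sample_psl = "\n".join([
    make_row(300, "1_11471_0", "contig63"),
    make_row(40, "1_11471_0", "contig65"),
    make_row(250, "3_11471_0", "contig64"),
])

if __name__ == "__main__":
    for contig, cds_list in best_hits_by_contig(sample_psl).items():
        print(contig, cds_list)
```

A CDS with no row in the PSL file, such as a very short exon, will simply be absent from the summary and needs to be placed by other evidence.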
