The GEP Web Framework has been reset for the 2017 Spring semester. The projects with the highest priority are the annotation projects from the D. elegans and D. ficusphila Muller D elements, and the TSS projects from the D. biarmipes Muller F element. If you would like to incorporate a wet-lab component into your course, then the D. ficusphila D element and the D. eugracilis F element sequence improvement projects have the highest priority.
The key changes to the GEP Web Framework are summarized below:
1. Annotation projects for Spring 2017
- 8 D. elegans and 52 D. ficusphila D element projects remaining from Fall 2016
- The 27 D. ficusphila D element projects with no submissions have the highest priority
- 50 new annotation projects from the D. eugracilis F element
- Recording of the Fall 2016 GEP webinar on updates to the GEP Web Framework, including the protocols for identifying and documenting consensus errors
2. TSS projects for Spring 2017
- 69 D. biarmipes F element projects remaining from Fall 2016
- Recording of the Fall 2016 GEP webinar on the annotation of Transcription Start Sites
3. Sequence improvement projects for Spring 2017
- 74 D. biarmipes, D. elegans, and D. ficusphila projects remaining from Fall 2016
- Gaps within all the D. biarmipes and D. elegans projects, and within the D. ficusphila F element projects, were examined during Summer 2016 and resolved where possible
- 12 new sequence improvement projects from the D. eugracilis F element
- Updates to Consed installation instructions for macOS
4. Updates to the GEP Web Framework
- New isoform column in the CDS workbook produced by the Gene Record Finder
- New "Gnomon Transcripts" and "Drosophila Gnomon Transcripts" tracks for the Drosophila species sequenced by modENCODE (BCM-HGSC assemblies) on the GEP UCSC Genome Browser (see description below)
5. Synchronize GEP annotation resources to FlyBase release 6.13
- Updated GEP web framework tools (e.g., Gene Model Checker, Gene Record Finder)
- Updated FlyBase genes, exons, and CDS tracks for the D. melanogaster genome browser
- Updated protein and transcript alignment tracks for GEP projects and for the Drosophila species sequenced by modENCODE (BCM-HGSC assemblies)
6. Updates to curriculum materials
- Revised curriculum based on changes to NCBI sequence identifiers (removal of GI numbers) and the migration to HTTPS
- Revised curriculum based on FlyBase release 6.13 and NCBI BLAST+ 2.6.0
Below is a more detailed description of the changes that we have made for Spring 2017:
1. Annotation projects for Spring 2017
There are 8 D. elegans and 52 D. ficusphila Muller D element projects remaining from Fall 2016. The 27 D. ficusphila D element projects with no submissions have the highest priority in Spring 2017. If your students completed annotation projects in Fall 2016, please submit these projects at your earliest convenience.
Because all of the D. elegans and D. ficusphila projects have already been claimed at least twice, we have also created a new set of 50 annotation projects from the D. eugracilis Muller F element [Jan. 2017 (GEP/Dot) assembly] for Spring 2017. These projects are derived from three putative F element scaffolds from version 2.0 of the D. eugracilis whole genome assembly.
As a reminder, the D. eugracilis annotation projects were derived from a draft assembly that has not been manually improved. Preliminary analyses suggest that the D. eugracilis Muller F element scaffolds contain many consensus errors that could interfere with the annotation of the coding regions (e.g., by causing frame shifts within coding exons or incompatible splice sites). Using an automated analysis pipeline, we have identified 22 D. eugracilis F element genes (26 coding exons) that contain at least one consensus error. While we corrected these errors prior to creating the D. eugracilis annotation projects, there could be additional consensus errors in the D. eugracilis F element that affect the gene annotations.
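To illustrate why a single consensus error can disrupt the annotation of a coding exon, the minimal sketch below translates a short sequence before and after one base is deleted. The sequence is made up for illustration and the function name is hypothetical; this is not part of the GEP analysis pipeline.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 0; a trailing partial codon is ignored."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons containing characters other than A/C/G/T are reported as 'X'.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Hypothetical coding sequence: ATG GCT GAA CTG AAA TAA
correct = "ATGGCTGAACTGAAATAA"

# Mimic a consensus error by deleting the base at 0-based index 4.
with_error = correct[:4] + correct[5:]

print(translate(correct))     # reading frame intact
print(translate(with_error))  # frame shifted after the first codon
```

In this toy example the deletion changes every codon downstream of the error and introduces a premature stop codon, which is the kind of signal that can point to a putative consensus error in a real project.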
As part of the Fall 2016 GEP webinar on the updates to the GEP Web Framework, we described strategies that students could use to identify and report consensus errors. The presentation and the webinar recordings are available on the "September 2016 GEP Webinars" page on the GEP Private Wiki. You can also access one of the webinar recordings directly at https://wustl.adobeconnect.com/p1inbxc3osn/.
Instructions on how to identify and document consensus errors are also available through the "Sequence Updater User Guide" (available under "Help" -> "Documentations" -> "Web Framework"). Please include the evidence used to support the putative consensus error correction in the "Consensus sequence errors report form" section of the GEP Annotation Report.
The TSS section of the GEP Annotation Report is optional, so you can submit a project without TSS annotations. However, if time permits, we encourage your students to annotate the TSS after they have completed the annotation of the coding regions.
2. TSS projects for Spring 2017
TSS annotations remain an important part of the current GEP research project. Of the 70 TSS projects from the D. biarmipes F element [Aug. 2013 (GEP/Dot) assembly], 69 projects are still available to be claimed.
We would like to encourage you and your students to contribute to the TSS annotation projects. These TSS annotations are essential to the phylogenetic footprinting analyses we have planned to identify conserved regulatory motifs surrounding the core promoters of F element genes. We currently plan to begin the phylogenetic footprinting analysis in Summer 2017.
The PowerPoint presentation and the recordings of the GEP webinars on the annotation of Transcription Start Sites are available on the "September 2016 GEP Webinars" page on the GEP Private Wiki. You can also access one of the webinar recordings directly at https://wustl.adobeconnect.com/p7cuk86mya8/.
3. Sequence improvement projects for Spring 2017
There are 74 sequence improvement projects from D. biarmipes, D. elegans, and D. ficusphila remaining from Fall 2016. During Summer 2016, three Washington University students examined the gaps within the D. biarmipes and D. elegans F and D element projects, and within the D. ficusphila F element projects. Gaps within these projects have either been resolved or tagged as "doNotFinish" (when multiple attempts at PCR and sequencing have failed). Hence the primary focus for these sequence improvement projects is to resolve errors within mononucleotide runs.
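As a rough illustration of what to look for, the sketch below scans a sequence for mononucleotide runs (stretches of a single repeated base). The function name, the example sequence, and the minimum run length of 5 are assumptions made for this example; they are not thresholds used by Consed or by the GEP.

```python
import re


def find_mononucleotide_runs(seq, min_len=5):
    """Return (start, end, base, length) for each run of one base that is at
    least min_len long. Coordinates are 1-based and inclusive."""
    seq = seq.upper()
    # ([ACGT]) captures one base; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    runs = []
    for match in pattern.finditer(seq):
        length = match.end() - match.start()
        runs.append((match.start() + 1, match.end(), match.group(1), length))
    return runs


# Made-up sequence containing a run of 7 T's and a run of 5 A's.
example = "ACG" + "T" * 7 + "GC" + "A" * 5 + "CG"
for start, end, base, length in find_mononucleotide_runs(example):
    print(f"{base} x {length} at {start}-{end}")
```

A list like this only flags candidate regions; whether the number of bases in a run is correct still has to be judged from the underlying reads.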
Gaps within the 19 D. ficusphila Muller D element sequence improvement projects (with the project prefixes "DFIC7313", "DFIC7408", and "DFIC7314") still need to be resolved. In addition, we have created 12 sequence improvement projects from the D. eugracilis F element in order to assess the quality of the D. eugracilis assembly prior to creating the annotation projects. If you would like to incorporate a wet-lab component into your course, please work on resolving the gaps within the projects from the D. ficusphila D element and the D. eugracilis F element.
If you plan to install Consed on macOS, please note that Consed is incompatible with recent versions of X11 (XQuartz 2.7.10 and 2.7.11). We have updated the GEP Installation Package page on the GEP Wiki with instructions on how to install an older version of XQuartz.
In addition, we have submitted a patch to the Consed developers at the University of Washington in order to address this compatibility issue in a future release of Consed. If you are in a computer lab where a recent version of XQuartz has already been installed, please contact us and we can provide you with an unofficial patched version of Consed that is compatible with XQuartz 2.7.10 and above.
4. Updates to the GEP Web Framework
We have added a new "Dmel_isoforms" column to the "unique_exon" worksheet in the CDS workbook produced by the Gene Record Finder in order to help students keep track of the usage of each unique CDS in the different isoforms. Thanks to Michael Muneses (TA for Dr. Nate Mortimer at Illinois State University) for suggesting this feature for the Gene Record Finder.
NCBI has produced a new set of gene predictions for the Drosophila species sequenced by the modENCODE project using the NCBI Eukaryotic Genome Annotation Pipeline. These gene models were based on evidence from RNA-Seq, protein sequence similarity, and results from the Gnomon gene predictor. These gene predictions include multiple isoforms and untranslated regions, and they could be useful during TSS annotations when there is only weak evidence from D. melanogaster.
We have added these gene predictions to the corresponding whole genome (BCM-HGSC) assemblies in the GEP UCSC Genome Browser. We have also performed cross-species mapping of the Gnomon predicted transcripts from eight Drosophila species against each genome assembly. These results are available through the "Gnomon Transcripts" and the "Drosophila Gnomon Transcripts" tracks (under the "Gene and Gene Prediction Tracks" section) on the GEP UCSC Genome Browser, respectively.
5. Synchronize GEP annotation resources to FlyBase release 6.13
The databases for the Gene Record Finder, the Gene Model Checker, the Annotation Files Merger, and the blastx reports in the annotation packages have been updated to FlyBase release 6.13. We have also updated the protein alignments and gene prediction tracks (i.e., blastx protein alignments, SPALN transcript alignments, and genBlastG gene predictions) on the GEP UCSC Genome Browser for the D. biarmipes, D. elegans, and D. ficusphila projects. Similarly, the "D. mel Proteins", "CDS Mapping", and "D. mel Transcripts" evidence tracks for the whole genome (BCM-HGSC) assemblies of nine Drosophila species have been updated to release 6.13.
6. Updates to curriculum materials
During Fall 2016, NCBI removed the GenInfo Identifier (GI number) from the GenBank, GenPept, and FASTA records, and from the BLAST output. We have updated the GEP curriculum materials to account for these changes. We have also made minor revisions to the curriculum materials so that they are compatible with the most recent versions of the GEP UCSC Genome Browser, the Gene Record Finder, and the database records at FlyBase, NCBI, and UniProt.
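For anyone maintaining their own scripts or handouts that still expect GI numbers, the sketch below shows one way to extract the accession.version from either the older "gi|...|" style of FASTA definition line or the newer style that begins with the accession. The function name and the identifiers in the example are made up for illustration, and the parsing covers only these simple cases.

```python
def accession_from_defline(defline):
    """Return the accession.version from a FASTA definition line.

    Handles the older 'gi|<number>|<db>|<accession>|' form, the
    '<db>|<accession>|' form, and the newer bare '<accession>' form.
    Assumes the definition line is not empty.
    """
    # The identifier is the first whitespace-delimited token after '>'.
    identifier = defline.lstrip(">").split(None, 1)[0]
    fields = identifier.split("|")
    if fields[0] == "gi" and len(fields) >= 4:
        return fields[3]      # gi|number|db|accession|
    if len(fields) >= 2:
        return fields[1]      # db|accession|
    return fields[0]          # accession only


# Hypothetical definition lines in the old and new styles.
old_style = ">gi|12345|ref|NP_000000.1| hypothetical protein [Drosophila melanogaster]"
new_style = ">NP_000000.1 hypothetical protein [Drosophila melanogaster]"

print(accession_from_defline(old_style))
print(accession_from_defline(new_style))
```

Both calls yield the same accession.version, so downstream steps that key on the accession do not need to care which style a record uses.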
The following curriculum materials have undergone minor revisions for Spring 2017:
- An Introduction to NCBI BLAST
- Annotation Strategy Guide
- Annotation for D. virilis
- Annotation of Conserved Motifs in Drosophila
- Annotation of Drosophila (workshop presentation)
- Annotation of Drosophila Primer
- Annotation of Transcription Start Sites in Drosophila
- Annotation of a Drosophila Gene
- Behavior and Limitations of Motif Finding
- Chimp BAC Analysis
- Detecting and Interpreting Genetic Homology
- Genbank Accession Number Reference Sheet
- Introduction to ab initio and Evidence-based Gene Finding
- Introduction to web databases
- List of Common Bioinformatics Programs
- Motif discovery in Drosophila
- Multiple sequence alignments with Clustal Omega
- Quick check of student annotations
- Searching for Transcription Start Sites in Drosophila
- Simple Annotation Problem
- Using mRNA and EST Evidence in Annotation