The GEP Web Framework has been reset for the Fall 2013 semester.
Summary of key changes
- New annotation projects from the D. biarmipes Muller F element and the D. ananassae Muller D element
- New protocol for identifying and documenting potential consensus errors in the D. biarmipes assembly
- Synchronize annotation resources to FlyBase release 5.52
- New and updated curriculum materials
- GEP Web Framework updates
1. New D. biarmipes and D. ananassae projects
The Aug. 2012 D. biarmipes Muller F element projects were derived from the first D. biarmipes genome assembly produced by the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC). This original assembly is based solely on 454 reads and has not been manually improved. The 454 sequencing technology is known to have difficulty resolving the correct number of bases in long mononucleotide runs, so the consensus sequence of the original assembly contains many errors. Last year, GEP students identified many potential consensus errors in the D. biarmipes projects. These errors either cause frame shifts or eliminate the optimal splice site candidates (because the phases of the splice donor and acceptor sites become incompatible).
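Because 454 errors concentrate in long mononucleotide runs, it can help to list those runs before inspecting a project sequence for potential consensus errors. The following is a minimal sketch, not part of the GEP Web Framework. The function name `find_homopolymer_runs` is hypothetical, and the default threshold of 6 bases is an arbitrary illustrative value, not a GEP recommendation.

```python
from __future__ import annotations

import re

# A base followed by any number of copies of itself (backreference \1).
RUN = re.compile(r"([ACGT])\1*")


def find_homopolymer_runs(seq: str, min_length: int = 6) -> list[tuple[int, int, str]]:
    """Return (start, end, base) for every mononucleotide run of at least
    min_length bases. start and end are 1-based and inclusive.
    Characters other than A/C/G/T (e.g. N) never belong to a run."""
    runs = []
    for match in RUN.finditer(seq.upper()):
        if match.end() - match.start() >= min_length:
            runs.append((match.start() + 1, match.end(), match.group(1)))
    return runs


example = "ATG" + "A" * 7 + "CG" + "T" * 6 + "G"
print(find_homopolymer_runs(example))  # [(4, 10, 'A'), (13, 18, 'T')]
```

Runs reported this way are only candidates. Whether a run actually contains a consensus error still has to be decided from the evidence, such as RNA-Seq reads or the alignment to the D. melanogaster ortholog.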
Earlier this year, BCM-HGSC released a new D. biarmipes assembly. This assembly uses Illumina genomic reads to identify and fix many of the consensus errors in the original 454 assembly. We have applied an additional analysis pipeline to identify and fix further consensus errors in the new assembly. The original and new assemblies differ in many places, and these differences could affect the gene models. We will therefore re-annotate the D. biarmipes Muller F element this fall using the revised assembly.
You can access the new D. biarmipes projects through the Project Management System and the GEP Data Repository. These projects have the project prefix “dbiarmipes_dot_Aug2013”. To view them in the GEP UCSC Genome Browser, go to the Genome Gateway page and select “D. biarmipes” in the “genome” field and “Aug. 2013 (GEP/Dot)” in the “assembly” field.
In addition, there are 10 D. ananassae Muller D element projects from Spring 2013 that require additional work.
2. New protocol for identifying and documenting potential consensus errors
Because of the high rate of consensus errors in the D. biarmipes projects, we have developed a new protocol to identify and document potential errors in the consensus sequence. Changes to the consensus sequence will be documented using the Variant Call Format (VCF), which was originally developed by the 1000 Genomes Project.
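A VCF data line records each proposed change to the consensus sequence as a 1-based position, a reference allele (REF) and an alternate allele (ALT). For insertions and deletions, the VCF specification requires REF and ALT to start with the unchanged base that precedes the event. The sketch below writes a minimal VCF 4.1 file with only the eight fixed columns. It is an illustration of the format, not the Sequence Updater's output, which may include additional header lines or INFO fields. The contig name `contig1` and the helper names are hypothetical.

```python
from __future__ import annotations

# Minimal VCF 4.1 writer for documenting consensus errors.
# Only the eight fixed columns are emitted; "." marks missing ID/QUAL/FILTER/INFO.
COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_header(contig: str) -> list[str]:
    """Meta-information lines plus the tab-delimited column header line."""
    return [
        "##fileformat=VCFv4.1",
        f"##contig=<ID={contig}>",
        "\t".join(COLUMNS),
    ]


def vcf_record(contig: str, pos: int, ref: str, alt: str) -> str:
    """Return one data line. pos is the 1-based position of the first REF base.

    This sketch handles substitutions (len(ref) == len(alt)) and simple
    insertions/deletions, where REF and ALT must begin with the same
    unchanged padding base that precedes the event.
    """
    if pos < 1 or not ref or not alt:
        raise ValueError("POS must be >= 1 and REF/ALT must be non-empty")
    if len(ref) != len(alt) and ref[0] != alt[0]:
        raise ValueError("indel REF and ALT must share the leading padding base")
    return "\t".join([contig, str(pos), ".", ref.upper(), alt.upper(), ".", ".", "."])


# Hypothetical example: the assembly shows a run of 6 A's at positions 3-8,
# but the evidence supports 7 A's, so one A is inserted after position 8
# (REF = padding base A, ALT = AA).
lines = vcf_header("contig1") + [vcf_record("contig1", 8, "A", "AA")]
print("\n".join(lines))
```

A deletion of one base from the same run would be written the other way round (REF `AA`, ALT `A`). A single-base substitution needs no padding base.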
We have developed a new tool called the Sequence Updater (available under “Projects” -> “Annotation Resources”) to help students create these VCF files. We have also updated the Gene Model Checker, the Annotation Files Merger, and the Project Management System to support the VCF files generated by the Sequence Updater. For example, you can provide the Gene Model Checker with a VCF file to validate a gene model in a region that contains consensus errors.
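To check a gene model against a VCF file, a tool has to apply the documented changes to the project sequence before it examines start codons, splice sites and the reading frame. The Gene Model Checker's actual implementation is not described here. The following is only a hypothetical sketch of that idea. It assumes a single contig, one ALT allele per record, and non-overlapping records, and the function names are illustrative.

```python
from __future__ import annotations


def parse_vcf(lines: list[str]) -> list[tuple[int, str, str]]:
    """Extract (POS, REF, ALT) from VCF data lines, skipping header lines.

    Assumes a single contig and a single ALT allele per record."""
    variants = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        variants.append((int(fields[1]), fields[3], fields[4]))
    return variants


def apply_variants(seq: str, variants: list[tuple[int, str, str]]) -> str:
    """Return seq with each (pos, ref, alt) edit applied (pos is 1-based).

    Edits are applied from right to left so that a length change in one
    edit does not shift the coordinates of edits further left. Records are
    assumed not to overlap."""
    out = seq
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1
        if out[start:start + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match the sequence at position {pos}")
        out = out[:start] + alt + out[start + len(ref):]
    return out


# Hypothetical coding sequence with a 454-style error: 6 A's instead of 7.
assembled = "ATGAAAAAAGCTAA"
vcf = ["##fileformat=VCFv4.1", "contig1\t9\t.\tA\tAA\t.\t.\t."]
corrected = apply_variants(assembled, parse_vcf(vcf))
print(len(assembled) % 3, len(corrected) % 3)  # 2 0: the frame shift is removed
```

The REF check catches VCF records that were created against a different version of the project sequence, such as the deprecated Aug. 2012 assembly.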
In addition, the GEP Annotation Report form has been revised to include instructions on how to document regions with potential consensus errors. The Project Management System has also been updated to accept an optional VCF file as part of the annotation project submission.
Please refer to the Sequence Updater User Guide (available under “Help” -> “Documentations” -> “Web Framework”) for more information. The user guide illustrates three tasks: using the GEP UCSC Genome Browser to identify regions with potential consensus errors, documenting these errors with the Sequence Updater, and verifying a gene model that contains consensus errors with the Gene Model Checker. The documentation for the other GEP Web Framework tools has also been updated to describe how to use the VCF file with each tool.
3. Synchronize annotation resources to FlyBase release 5.52
The following resources have all been updated to FlyBase release 5.52: the BLASTX reports in the annotation packages, the databases for the Gene Record Finder, Gene Model Checker, and Annotation Files Merger, and the BLASTX protein alignment track on the Genome Browser.
Because some faculty members are using projects from the sandbox Project Management System for training, all the annotation packages and genome browsers (including the deprecated Aug. 2012 D. biarmipes assembly) have been updated to release 5.52 irrespective of whether they are available for claiming in the production Project Management System.
4. New and updated curriculum materials
=== New curriculum materials ===
Contributions from GEP partners:
- Sequencing Workshop (by Dr. Justin DiAngelo at Hofstra University)
- A Simple Annotation Exercise (by Dr. Justin DiAngelo at Hofstra University)
- Introduction to BLAST using Human Leptin (by Dr. Justin DiAngelo at Hofstra University and Dr. Alexis Nagengast at Widener University)
- An Introduction to Hidden Markov Models (by Dr. Anton E. Weisstein at Truman State University and Zane Goodwin, TA for Bio 4342)
- Searching for Transcription Start Sites in Drosophila
- Identifying Conserved Motifs in Drosophila
=== Updated curriculum materials ===
We have also updated most of the annotation curriculum materials to maintain compatibility with the most recent records in the various public databases (e.g., NCBI, FlyBase, and UniProt). Note that all FlyBase exon identifiers (e.g., those used by the Gene Record Finder) have changed between release 5.48 and release 5.52.
The following documents have been revised for Fall 2013:
- Annotation of Drosophila
- Annotation Strategy Guide
- Chimp BAC analysis
- Detecting and Interpreting Genetic Homology
- GEP Annotation Report
- Introduction to ab initio and evidence-based gene finding
- Introduction to NCBI BLAST
- Simple Annotation Problem
5. GEP Web Framework updates
- Sequence Updater
  - New tool to generate VCF files that document consensus errors in the project sequence
- GEP UCSC Genome Browser Mirror
  - Improved the accuracy and performance of the RNA-Seq Alignment Summary tracks
  - Used the Consed shallower depth functionality to reduce the number of RNA-Seq reads in the D. biarmipes RNA-Seq track
  - Added support for VCF custom tracks
- Gene Model Checker
  - Added VCF support to check gene models with consensus errors
  - Constrained the genome browser and alignment windows to the viewport
- Annotation Files Merger
  - Added the ability to merge multiple VCF files
  - Added the ability to view the merged VCF file as a custom track on the GEP UCSC Genome Browser
- Project Management System
  - Added the ability to submit only the project report for projects with no genes
  - Added the ability to include a VCF file as part of the annotation project submission
- GEP Live CD
  - Updated the Consed installation to Consed 25
  - Created a new VirtualBox virtual appliance to simplify the configuration process
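The Annotation Files Merger can now combine several VCF files, for example one per annotated region, into a single file for a project. Its exact merging rules are not described in these notes. The sketch below is a hypothetical illustration of one reasonable approach:

- keep each `##` meta-information line once;
- keep the first `#CHROM` column header;
- drop identical data lines that appear in more than one file;
- sort the remaining records by contig name and position.

It does not resolve conflicting records at the same position. It also sorts contig names lexicographically, which is a simplification.

```python
from __future__ import annotations


def _sort_key(record: str) -> tuple[str, int]:
    """Sort data lines by contig name, then by 1-based POS."""
    chrom, pos = record.split("\t")[:2]
    return chrom, int(pos)


def merge_vcfs(files: list[list[str]]) -> list[str]:
    """Merge the lines of several VCF files into one VCF.

    Meta-information lines ("##") are kept once each in first-seen order,
    the first "#CHROM" column header is kept, and identical data lines
    reported in more than one file are kept only once."""
    meta: list[str] = []
    column_header = None
    records: list[str] = []
    seen: set[str] = set()
    for lines in files:
        for raw in lines:
            line = raw.rstrip("\n")
            if not line:
                continue
            if line.startswith("##"):
                if line not in meta:
                    meta.append(line)
            elif line.startswith("#"):
                if column_header is None:
                    column_header = line
            elif line not in seen:
                seen.add(line)
                records.append(line)
    records.sort(key=_sort_key)
    header = [column_header] if column_header is not None else []
    return meta + header + records


# Two hypothetical VCF files that both document the insertion at position 9.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
a = ["##fileformat=VCFv4.1", HEADER,
     "contig1\t20\t.\tC\tT\t.\t.\t.", "contig1\t9\t.\tA\tAA\t.\t.\t."]
b = ["##fileformat=VCFv4.1", HEADER,
     "contig1\t9\t.\tA\tAA\t.\t.\t.", "contig1\t15\t.\tGG\tG\t.\t.\t."]
merged = merge_vcfs([a, b])
print("\n".join(merged))
```

Keeping the first file's `##fileformat` line in first position matters because the VCF specification requires it to be the first line of the file.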