If you worked on annotation projects this past Fall, please submit them as soon as possible so that we can minimize duplicated student effort. Note that you can amend your submission later during the Spring semester by re-submitting the project through the Project Management System.
I have also set up 75 new projects from the D. biarmipes Muller D element. These projects are available on the GEP UCSC Genome Browser and the GEP Data Repository. They will become available to claim once a larger fraction of the current set of annotation projects has been submitted.
During Fall 2013, we hosted two webinars that discussed the protocol for identifying and documenting potential consensus errors in the D. biarmipes projects. The slides and webinar recordings are posted on the September 2013 GEP Webinar Presentation page on the GEP private wiki. You can also access the webinar recording directly at https://wustl.adobeconnect.com/p20cvyun6ug/.
As Chris mentioned in a previous message, we are in the process of developing the sequence improvement curriculum for D. biarmipes. Chris will schedule a webinar later this month to discuss changes to the sequence improvement protocol and the new curriculum materials he has developed.
I have posted 46 new sequence improvement projects from the D. biarmipes Muller F and D elements. You can claim these projects through the Project Management System and you can also download the project packages through the GEP Data Repository.
Summary of key changes
- New annotation projects from the D. biarmipes Muller D element
- New protocol for identifying and documenting potential consensus errors in the D. biarmipes assembly
- New sequence improvement projects from the D. biarmipes Muller F and D elements
- Updated curriculum materials
- GEP Web Framework updates
1. New D. biarmipes Muller D element projects
As I mentioned in a previous message, the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) released a new D. biarmipes assembly in Spring 2013, which corrected many consensus errors (particularly within mononucleotide runs). Before creating the GEP annotation project packages, we applied an additional analysis pipeline to identify and fix further consensus errors (~300 genes in the entire assembly).
You can view the D. biarmipes Muller D element projects on the GEP UCSC Genome Browser by selecting "D. biarmipes" under the "genome" field and "Jan. 2014 (GEP/3L Control)" under the "assembly" field on the Genome Gateway page. Depending on the completion status of the current set of projects, this new set of Muller D element projects may become available to claim on the Project Management System later this semester.
2. New protocol for identifying and documenting potential consensus errors
As a reminder, because of the high rate of consensus errors in the D. biarmipes projects, we have developed a new protocol to identify and document potential errors in the consensus sequence. Below is a brief summary of the annotation protocol for gene models with consensus errors. Please refer to the webinars described above for additional information.
Changes to the consensus sequence will be documented using the Variant Call Format (VCF), which was originally developed by the 1000 Genomes Project. We have developed a new tool called the Sequence Updater (available under "Projects" -> "Annotation Resources") to help students create these VCF files. We have also updated the Gene Model Checker, Annotation Files Merger, and the Project Management System to support the VCF file generated by the Sequence Updater. For example, you can provide the Gene Model Checker with a VCF file to validate a gene model with consensus errors.
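To make the idea of a VCF record concrete, here is a minimal Python sketch of how a single-base insertion in a mononucleotide run could be written as a VCF line and applied to a sequence. This is only an illustration under stated assumptions: the contig name, the example sequence, and the function names are hypothetical, and the files produced by the Sequence Updater may contain additional header lines or fields.

```python
# Illustrative sketch only: "contig1", the sequence, and these helper
# functions are hypothetical and are not part of the GEP tools.

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def format_vcf(records):
    """Return minimal VCF text for (chrom, pos, ref, alt) tuples.

    POS is 1-based. Fields that are not used here are written as ".",
    which is the VCF convention for a missing value.
    """
    lines = ["##fileformat=VCFv4.1", "#" + "\t".join(VCF_COLUMNS)]
    for chrom, pos, ref, alt in records:
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


def apply_variants(sequence, records):
    """Apply the changes to a sequence and return the corrected sequence.

    Records are applied from right to left so that an insertion or
    deletion does not shift the coordinates of records not yet applied.
    """
    for _chrom, pos, ref, alt in sorted(records, key=lambda r: r[1], reverse=True):
        start = pos - 1  # convert the 1-based VCF position to a 0-based index
        if sequence[start:start + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match the sequence at position {pos}")
        sequence = sequence[:start] + alt + sequence[start + len(ref):]
    return sequence


# A run of four A's that is missing one base: REF "A" becomes ALT "AA".
consensus = "ATGAAAATCGTAA"
changes = [("contig1", 4, "A", "AA")]

print(format_vcf(changes), end="")
print(apply_variants(consensus, changes))
```

Note that an insertion is described by listing the base at the given position as REF and that base plus the inserted bases as ALT.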
In addition, the GEP Annotation Report form has been revised and now includes instructions on how to document regions with potential consensus errors. The Project Management System has been updated so that it accepts an optional VCF file as part of the annotation project submission.
Please refer to the Sequence Updater User Guide for more information (available under "Help" -> "Documentations" -> "Web Framework"). The user guide illustrates how you can use the GEP UCSC Genome Browser to identify regions with potential consensus sequence errors, document these errors using the Sequence Updater, and verify a gene model with consensus sequence errors using the Gene Model Checker. The documentation for the other GEP Web Framework tools has also been updated to describe how you can use the VCF file with each tool.
3. Synchronize annotation resources to FlyBase release 5.54
The BLASTX reports in the annotation packages, the databases for the Gene Record Finder and Gene Model Checker, the Annotation Files Merger, and the BLASTX protein alignment track on the Genome Browser have all been updated to FlyBase release 5.54.
Because some faculty members are using projects from the sandbox Project Management System for training, all the annotation packages and genome browsers (including the deprecated Aug. 2012 D. biarmipes assembly) have been updated to release 5.54 irrespective of whether they are available for claiming in the production Project Management System.
4. Updated curriculum materials
We have also updated most of the annotation curriculum materials to maintain compatibility with the most recent records in the various public databases (e.g., NCBI, FlyBase, and UniProt). Note that some of the FlyBase exon identifiers (e.g., those used by the Gene Record Finder) have changed between release 5.52 and release 5.54.
The following documents have been revised for Spring 2014:
- Annotation of Drosophila
- Annotation Strategy Guide
- Chimp BAC analysis
- Common Bioinformatics Programs
- Detecting and Interpreting Genetic Homology
- Introduction to ab initio and evidence-based gene finding
- Introduction to NCBI BLAST
- Searching for Transcription Start Sites in Drosophila
- Simple Annotation Problem
5. Project Management System update
During Spring 2014, GEP students will improve projects that are derived from the D. biarmipes Muller F and Muller D elements. Because no fosmid templates are available, students must order PCR reactions to close gaps and resolve low-quality regions. We have updated the Project Management System user interface so that students can design PCR primer pairs and review the status of their PCR reactions. Please refer to the Order Reactions user guide (available under "Help" -> "Documentations" -> "Web Framework") for details on how to use the new interface.