We have reset the GEP Web Framework for the Spring 2015 semester. The key changes are summarized below:
1. Annotation projects for Spring 2015
- Projects from the D. biarmipes Muller D element and the D. elegans Muller F element
- Optional annotation of transcription start sites (TSS)
- Webinar on TSS annotation
2. Sequence improvement projects for Spring 2015
- Projects from the D. biarmipes Muller F and D elements, and the D. elegans Muller F element
- No pipeline for processing reaction orders during the Spring semester
- Collect reaction orders using the Project Management System so that they can be processed in Summer 2015
- Webinar on improving hybrid assemblies
3. Additional evidence tracks on the D. melanogaster GEP UCSC Genome Browser
- Migrated "TSS (Celniker)" annotations from release 5 to release 6
- DNase I hypersensitive sites (DHS) tracks for multiple embryonic stages
- 16-state hiHMM models for late embryos and 3rd instar larvae
- Additional evidence tracks from modENCODE and FlyBase
4. New genome browsers for the whole genome assemblies of nine Drosophila species
- Eight Drosophila species sequenced by modENCODE (April 2013 assembly) and D. suzukii (September 2013 assembly)
- Evidence tracks with similarity to D. melanogaster proteins and RNA-Seq data for each species
5. Synchronize GEP annotation resources to FlyBase release 6.03
6. Updated curriculum materials
- Revisions based on FlyBase release 6.03 and NCBI BLAST+ 2.2.30
Below is a more detailed description of the changes we have made for Spring 2015:
1. Annotation projects for Spring 2015
For annotation, we will continue to work on projects from the D. biarmipes Muller D element and begin the annotation of the D. elegans Muller F element. Because all the D. biarmipes projects have been claimed at least 3 times, I have added a new set of 65 D. elegans Muller F element annotation projects to the Project Management System. However, the D. biarmipes projects (particularly those projects with zero submissions) have higher priority than the D. elegans projects.
As in Fall 2014, the transcription start site (TSS) annotation section of the GEP Annotation Report is optional, so you can submit a project without TSS annotations. I presented a webinar on the TSS annotation protocol in September 2014; the webinar recordings and the PowerPoint presentation are available on the GEP private wiki.
As a reminder, Dr. Elgin and Chris have developed a new protocol (Quick Check of Student Annotations) to help you verify the gene models submitted by your students. We ask that you check the student annotations using this protocol prior to submission. This protocol is available in the GEP Specific Issues section of the GEP web site.
2. Sequence improvement projects for Spring 2015
For sequence improvement, we will continue to work on the set of projects from the D. biarmipes Muller F and D elements from last year, as well as a new set of 26 projects from the D. elegans Muller F element. Chris has previously presented a webinar on how to identify problem regions and improve the hybrid assemblies. The webinar recording and the PowerPoint presentation are available on the GEP private wiki.
Please note that, unlike previous years, we are no longer running a central pipeline for generating additional sequencing data during the Spring semester. Hence, your students will need to run the additional PCR and sequencing reactions locally to resolve low-quality regions or gaps in the assembly during the semester. Alternatively, please add a "dataNeeded" tag to these regions and design the corresponding sets of PCR primers so that we can run the sequencing reactions during Summer 2015. Your students can use the Project Management System to extract the oligo information and order PCR reactions. Please see the "Order Reactions" document in the Documentations section of the GEP web site for details.
3. Additional evidence tracks on the D. melanogaster GEP UCSC Genome Browser
The TSS annotation protocol we used in Fall 2014 remains the same for Spring 2015. However, after consulting with Dr. Susan Celniker (principal investigator of the modENCODE transcriptome group), we have determined that the "TSS (embryonic)" evidence track on FlyBase is based on the results of an older and incomplete dataset. Consequently, we have lifted the most recent D. melanogaster TSS annotations from the release 5 assembly to the release 6 assembly. You can access these annotations through the "TSS (Celniker)" evidence track (under the "Expression and Regulation" section) on the GEP UCSC Genome Browser (D. melanogaster, July 2014 (BDGP R6) assembly).
In addition, the modENCODE transcriptome group is in the process of re-analyzing the transcriptome data relative to the D. melanogaster release 6 assembly. Once the group publishes the new TSS annotations for the release 6 assembly, we will evaluate them to determine whether we need to update the TSS annotation tracks on the GEP UCSC Genome Browser. If the TSS annotations change substantially in the new release, we may need to deploy a breaking change during the Spring semester.
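To illustrate what lifting an annotation from release 5 to release 6 involves, here is a minimal Python sketch of coordinate conversion through gap-free aligned blocks. It is a toy illustration only: this announcement does not state which tool was used for the lift, and the block coordinates, the AlignedBlock type, and the lift_position function are all hypothetical. In practice, tools such as UCSC liftOver perform this conversion using chain files that describe the alignment between the two assemblies.

```python
from typing import NamedTuple, Optional, Sequence


class AlignedBlock(NamedTuple):
    """One gap-free aligned block between an old and a new assembly.

    All coordinates are 0-based; the values used below are invented.
    """
    old_start: int  # first position of the block in the old assembly
    new_start: int  # first position of the block in the new assembly
    length: int     # number of aligned bases in the block


def lift_position(pos: int, blocks: Sequence[AlignedBlock]) -> Optional[int]:
    """Map a position in the old assembly to the new assembly.

    Returns None when the position falls outside every aligned block,
    which means the annotation cannot be lifted at that position.
    """
    for block in blocks:
        offset = pos - block.old_start
        if 0 <= offset < block.length:
            return block.new_start + offset
    return None


# Hypothetical alignment: the first 1000 bases are unchanged, and the next
# 500 bases are shifted by 250 bases in the new assembly.
example_blocks = [AlignedBlock(0, 0, 1000), AlignedBlock(1000, 1250, 500)]
```

A TSS annotated at position 1100 in the old coordinates would map to position 1350 in the new coordinates under this made-up alignment, while a position past the last block would not be lifted at all.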
We have also added more evidence tracks to the D. melanogaster GEP UCSC Genome Browser to assist in the TSS classification (i.e. peaked versus broad TSS). The use of these evidence tracks is optional and they are not part of the TSS annotation protocol.
Some of these evidence tracks are particularly useful when a gene is not expressed in the BG3, S2, or Kc167 cell lines. For example, you can evaluate additional DHS datasets for five different embryonic stages through the "Detected DHS Positions (Embryos)" and the "DHS Read Density (Embryos)" evidence tracks (under the "Expression and Regulation" section). Similarly, the modENCODE project has produced a 16-state chromatin model for late embryos and 3rd instar larvae ("hiHMM Models" under the "Chromatin Domains" section). As in the 9-state model, the warmer colors (e.g. red, orange) correspond to the promoter and enhancer regions. These tracks are useful if the gene of interest is not expressed in BG3 and S2 cells.
We have also incorporated additional modENCODE ChIP-Seq data sets for RNA-Pol II, CREB-binding protein, and histone modifications for multiple developmental stages. Note that the "modENCODE ChIP-Seq" track (under the "Histone Modifications" section) is a composite track. You can click on the "modENCODE ChIP-Seq" link to select a subset of the histone modifications or developmental stages.
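As a rough illustration of the peaked versus broad distinction mentioned above, the following Python sketch labels a TSS region by how concentrated its supporting reads are around the most frequently used position. This is not the GEP TSS annotation protocol: the function name, the 2 bp window, and the 50% cutoff are arbitrary assumptions chosen only for the example, and the read counts are invented.

```python
from typing import Dict


def classify_tss_shape(read_counts: Dict[int, int],
                       window: int = 2,
                       min_fraction: float = 0.5) -> str:
    """Label a TSS region as 'peaked' or 'broad' (illustrative only).

    read_counts maps a genomic position to the number of reads that
    start there. The region is called 'peaked' when at least
    min_fraction of all reads start within `window` bases of the most
    frequently used position; otherwise it is called 'broad'.
    The window and cutoff are assumptions, not published thresholds.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads in region")
    # Position with the highest read count (the dominant start site).
    mode_pos = max(read_counts, key=read_counts.get)
    near_mode = sum(count for pos, count in read_counts.items()
                    if abs(pos - mode_pos) <= window)
    return "peaked" if near_mode / total >= min_fraction else "broad"


# Invented read counts for two hypothetical promoters.
peaked_example = {100: 50, 101: 5, 130: 3}
broad_example = {100: 5, 110: 5, 120: 5, 130: 5, 140: 6}
```

In the first invented example most reads start at or next to a single position, so the region is labeled peaked; in the second the reads are spread over about 40 bases, so it is labeled broad.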
4. New genome browsers for the whole genome assemblies of nine Drosophila species
While version 2 of the whole genome assemblies for the eight Drosophila genomes recently sequenced by the modENCODE project is available through GenBank, FlyBase has not yet constructed the genome browsers for these species. Consequently, to facilitate comparative analysis (e.g. using phylogenetic footprinting to identify the initial transcribed exon), we have constructed basic genome browsers for these eight species as well as D. suzukii (which is closely related to D. biarmipes).
The genome browsers for the eight species sequenced by the modENCODE project are available under the "April 2013 (BCM-HGSC)" assembly. The D. suzukii genome browser is available under the "September 2013 (BGI)" assembly. The name of each scaffold corresponds to its GenBank accession number, in concordance with the new naming convention used by the GenBank FTP site.
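Because the scaffolds are named by GenBank accession number, a script that processes browser coordinates may want to check that a scaffold name has the expected shape. The Python sketch below tests a name against the general GenBank nucleotide accession formats (one letter plus five digits, two letters plus six digits, or two letters plus eight digits), with an optional version suffix. Whether the browser scaffold names include the version suffix is not stated here, so the pattern accepts both forms; the function name and the example names are made up.

```python
import re

# General GenBank nucleotide accession formats, with an optional
# ".version" suffix. Accepting both versioned and unversioned names
# is an assumption made for this sketch.
ACCESSION_PATTERN = re.compile(
    r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?"
)


def looks_like_genbank_accession(name: str) -> bool:
    """Return True if the whole name matches a GenBank-style accession."""
    return ACCESSION_PATTERN.fullmatch(name) is not None


# Made-up names: the first two have accession-like shapes, the last two
# follow other common scaffold naming styles.
example_names = ["AB123456.1", "AB123456", "scaffold_12", "chr3L"]
example_results = [looks_like_genbank_accession(n) for n in example_names]
```

A check like this only confirms the shape of the name; it does not confirm that the accession exists in GenBank.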
These genome browsers include evidence tracks that show sequence similarity to D. melanogaster proteins, RNA-Seq data from adult females, adult males, and mixed embryos (i.e. RNA-Seq read coverage, TopHat splice junctions, Cufflinks transcripts, TransDecoder gene models), and RepeatMasker results using species-specific transposon libraries.
5. Synchronize GEP annotation resources to FlyBase release 6.03
The databases for the Gene Record Finder, the Gene Model Checker, the Annotation Files Merger, the BLASTX protein alignment track on the Genome Browser, and the BLASTX report in the annotation packages have all been updated to FlyBase release 6.03.
The annotations for D. melanogaster on the GEP UCSC Genome Browser have been updated to release 6.03. The "D. mel Proteins" tracks for the whole genome assemblies of the other Drosophila species have also been updated to release 6.03.
6. Updated curriculum materials
We have updated many of the annotation curriculum materials to maintain compatibility with the most recent database records at FlyBase, NCBI, and UniProt. Note that some of the (minor) changes are caused by the new version of NCBI BLAST+ (version 2.2.30).
- An Introduction to NCBI BLAST
- Annotation Instruction Sheet
- Annotation of a Drosophila Gene
- Annotation of Conserved Motifs in Drosophila
- Annotation of Drosophila (GEP workshop presentation)
- Annotation of Transcription Start Sites in Drosophila
- Annotation Strategy Guide
- Basics of BLAST
- Chimp BAC Analysis: Genes and Pseudogene
- Detecting and Interpreting Genetic Homology
- Introduction to ab initio and Evidence-based Gene Finding
- List of Common Bioinformatics Programs
- Motif Discovery in Drosophila
- Searching for Transcription Start Sites in Drosophila
- Simple Annotation Problem
- Using mRNA and EST Evidence in Annotation
- Sequence Improvement Protocol for GEP Hybrid Assembly Projects
- Strategies for Finishing Hybrid Assemblies