We have reset the GEP Web Framework for the Fall 2015 semester. The key changes are summarized below:
1. Annotation projects for Fall 2015
- 60 new annotation projects from the D. elegans Muller D element
- 11 annotation projects remaining from Spring 2015 (see below)
2. TSS projects for Fall 2015
- 70 new TSS projects from the D. biarmipes Muller F element
- Revised TSS annotation protocol
- New "Reconciled Gene Models", "Consensus Errors", and "RNA PolII" evidence tracks on the GEP UCSC Genome Browser
3. Doodle poll for scheduling a GEP webinar on TSS annotation
- This webinar will be redundant to the material presented at the Alumni Workshops
- Link to Doodle Poll (The times listed are in CDT)
- Please sign up by 7pm CDT next Monday (August 31, 2015)
4. Sequence improvement projects for Fall 2015
- 19 new sequence improvement projects from the D. elegans Muller D element
- 41 sequence improvement projects remaining from Spring 2015 (see below)
5. GEP web framework updates
- New Project Management System interface for claiming and submitting TSS projects
- New Gene Model Checker features: infer stop codon coordinates; manipulate dot plot viewer
6. Synchronize GEP annotation resources to FlyBase release 6.06
- Updated GEP web framework tools (e.g., Gene Model Checker, Gene Record Finder)
- Updated genome browsers for D. melanogaster and 9 other Drosophila species
7. New and updated curriculum materials
- New curriculum on dynamic programming and motif finding
- Revised TSS annotation protocol
- Revisions based on FlyBase release 6.06 and NCBI BLAST+ 2.2.32
Below is a more detailed description of the changes we have made for Fall 2015:
1. Annotation projects for Fall 2015
We have created a new set of 60 annotation projects from the D. elegans Muller D element (Aug. 2015 (GEP/3L Control) assembly). There are also 11 annotation projects (9 from the D. biarmipes D element and 2 from the D. elegans F element) from Spring 2015 that only have a single submission. These latter projects have higher priority than the new set of D. elegans D element projects.
As in Spring 2015, the TSS section of the GEP Annotation Report is optional, so you can submit a project without TSS annotations. However, if time permits, we would like to encourage your students to annotate the TSS after they have completed the annotation of the coding regions.
2. TSS projects for Fall 2015
The coding regions of all the D. biarmipes F element projects (Aug. 2013 (GEP/Dot) assembly) have been annotated by at least two groups of students independently and these submissions were reconciled last year. However, most of the genes in this region do not have any TSS annotations. Consequently, we have created a set of 70 projects from the D. biarmipes Muller F element that require TSS annotations. These will provide a good challenge for those students who want to do more – and more student help on this aspect of the project is needed!
You can access the reconciled gene models for the D. biarmipes F element (Aug. 2013 (GEP/Dot) assembly) through the "Reconciled Gene Models" track (under "Genes and Gene Prediction Tracks"). Consensus errors that interfered with the coding region annotations are shown in the "Consensus Errors" track (under "Mapping and Sequencing Tracks"). We have also incorporated the RNA Polymerase II (RNA PolII) ChIP-Seq data into the genome browsers for the D. biarmipes F and D element projects. Regions that show significant RNA PolII enrichment are shown in the "RNA PolII Peaks" track while the relative RNA PolII enrichment over input DNA is shown in the "RNA PolII Enrichment" track. (Both RNA PolII tracks are available under the "Expression and Regulation" section.)
Based on the feedback from the GEP Alumni Workshop, we have also made substantial revisions to the "Annotation of Transcription Start Sites in Drosophila" walkthrough and the TSS workflow. In particular, we have tried to clarify the rationale and the protocol for defining both the TSS position and the TSS search region. (See pages 12-13 of the TSS walkthrough and the second page of the TSS workflow for details.)
We have also revised the TSS report form based on feedback from the Alumni Workshops. In particular, the revised TSS report provides additional guidance on how to classify the type of D. melanogaster core promoter (i.e. peaked, intermediate, broad, insufficient evidence). In some cases, the reconciled gene models might need to be revised either because of misannotations or updates to the D. melanogaster reference gene models by FlyBase. The last part of the TSS report form (beginning on page 5) describes the procedure for submitting revised gene models and for documenting additional consensus errors.
We have also added a new interface to the Project Management System for claiming and submitting TSS projects. Please see the "Claiming Projects" and "Submit TSS Projects" documents (available under "Help" -> "Documentations" -> "Web Framework") for additional information.
3. Doodle poll for scheduling a GEP webinar on TSS annotation
We have discussed many of the revisions to the TSS annotation protocol described above during the Alumni Workshops this summer. (The Alumni Workshop presentations are available on the GEP Private Wiki.)
If you were unable to attend the Alumni Workshops this summer but are interested in learning more about the TSS annotation protocol, we plan to host a webinar on TSS annotation in early September. The webinar will cover essentially the same material as the Alumni Workshop presentations on the TSS annotation workflow with a few minor updates on the new TSS projects.
If you are interested in attending the webinar, I have set up a Doodle Poll to determine the best times for the webinar. Please sign up for the times (in CDT) that would work best for you by 7pm CDT next Monday (08/31/2015).
4. Sequence improvement projects for Fall 2015
We have created a new set of 19 D. elegans D element projects in order to assess the quality of the D element scaffolds prior to creating the annotation projects. However, there are also 41 sequence improvement projects available from the D. biarmipes F and D elements and from the D. elegans F element that were created in 2013 and 2014. Most of these projects only have a single submission. These projects have higher priority than the new D. elegans D element projects.
As a reminder, the GEP no longer runs a central pipeline for processing sequencing reaction orders during the semester. Hence, your students will need to run additional PCR and sequencing reactions locally in order to resolve low quality regions and gaps in the assembly during the semester. Alternatively, please add a "dataNeeded" tag to the regions that require additional sequencing data and design the corresponding sets of PCR primers prior to project submission.
5. GEP web framework updates
We have added two new features to the Gene Model Checker based on feature requests from GEP faculty members. The Gene Model Checker will now infer the stop codon coordinates based on the coding exon coordinates and the orientation of the gene if the "Stop Codon Coordinates" field is empty when you select the field. For cases where the stop codon is not located immediately adjacent to the last amino acid, you can simply replace the coordinates in the "Stop Codon Coordinates" text box with the desired coordinates.
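The inference described above can be illustrated with a minimal Python sketch. This is not the Gene Model Checker's actual code: the function name `infer_stop_codon`, the tuple-based exon format, and the `(low, high)` return convention are all hypothetical. It assumes 1-based inclusive coordinates and that the stop codon is the three bases immediately after the last coding exon in the direction of transcription.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive coordinates (assumed convention)


def infer_stop_codon(cds_exons: List[Exon], strand: str) -> Tuple[int, int]:
    """Return (low, high) coordinates of the inferred stop codon.

    Hypothetical helper: assumes the stop codon directly follows the
    last coding exon and is not split by an intron.
    """
    if not cds_exons:
        raise ValueError("at least one coding exon is required")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    # Normalize each exon so that low <= high, regardless of entry order.
    normalized = [(min(a, b), max(a, b)) for a, b in cds_exons]

    if strand == "+":
        # Plus strand: the stop codon follows the highest exon coordinate.
        last_end = max(high for _, high in normalized)
        return (last_end + 1, last_end + 3)

    # Minus strand: transcription runs toward lower coordinates, so the
    # stop codon sits just below the lowest exon coordinate.
    last_start = min(low for low, _ in normalized)
    if last_start - 3 < 1:
        raise ValueError("stop codon would fall off the start of the sequence")
    return (last_start - 3, last_start - 1)
```

For example, under these assumptions `infer_stop_codon([(100, 200), (300, 450)], "+")` gives `(451, 453)`. A stop codon that is not adjacent to the last coding exon would still need to be entered by hand, as described above.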
We have also added a new "View dot plot in the Dot Plot Viewer" link to the "Dot Plot" section of the Gene Model Checker output. You can use the Dot Plot Viewer to manipulate the parameters (i.e. word size, neighborhood threshold, scoring matrix) used to create the dot plot in order to adjust the sensitivity and specificity of the dot plot.
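To give a rough sense of how the word size and threshold parameters trade sensitivity against specificity, here is a minimal Python sketch of a word-based dot plot. It is not the Dot Plot Viewer's implementation: the function names are hypothetical, and the identity scoring function (1 for a match, 0 for a mismatch) is only a stand-in for a real substitution matrix.

```python
from typing import Callable, List, Tuple


def identity_score(a: str, b: str) -> int:
    """Toy scoring function standing in for a substitution matrix."""
    return 1 if a == b else 0


def dot_plot(
    seq_x: str,
    seq_y: str,
    word_size: int = 3,
    threshold: int = 3,
    score: Callable[[str, str], int] = identity_score,
) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) positions where the words starting at
    seq_x[i] and seq_y[j] score at least `threshold`."""
    if word_size < 1:
        raise ValueError("word_size must be at least 1")

    dots = []
    for i in range(len(seq_x) - word_size + 1):
        for j in range(len(seq_y) - word_size + 1):
            # Sum the per-residue scores across the aligned words.
            total = sum(
                score(seq_x[i + k], seq_y[j + k]) for k in range(word_size)
            )
            if total >= threshold:
                dots.append((i, j))
    return dots


# Exact two-residue matches only: a single dot.
strict = dot_plot("MKVL", "MRVL", word_size=2, threshold=2)
# A lower threshold tolerates one mismatch per word: more dots.
relaxed = dot_plot("MKVL", "MRVL", word_size=2, threshold=1)
```

In this toy model, lowering the threshold or the word size adds dots (more sensitive, less specific), while raising either removes them.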
6. Synchronize GEP annotation resources to FlyBase release 6.06
The databases for the Gene Record Finder, the Gene Model Checker, the Annotation Files Merger, the BLASTX protein alignment tracks and the SPALN transcript alignment tracks on the GEP UCSC Genome Browser, and the BLASTX reports in the annotation packages have all been updated to FlyBase release 6.06.
The D. melanogaster annotations on the GEP UCSC Genome Browser have been updated to release 6.06. We have also added the "FlyBase Exons" and "FlyBase CDS" tracks to the Genome Browser that show the placement of each transcribed exon and coding exon in the D. melanogaster assembly, respectively.
The "D. mel Proteins" tracks for the whole genome (BCM-HGSC) assemblies of 9 other Drosophila species have also been updated to release 6.06. We have also added a new "CDS Mapping" track, which shows the tblastn alignments of the D. melanogaster coding exons against each genome assembly.
7. New and updated curriculum materials
== New curriculum ==
- Annotation of Drosophila Primer
- Introduction to Dynamic Programming
- From Smith-Waterman to BLAST
- Behavior and Limitations of Motif Finding
The following curriculum materials have undergone major revisions because of changes in the annotation protocol, web databases, or web tools.
- Searching for Transcription Start Sites in Drosophila (presentation)
- Annotation of Transcription Start Sites in Drosophila
- TSS Workflow
- Motif discovery in Drosophila
- GEP Annotation Report
- GEP TSS Report
- Gene Model Checker User Guide
The following curriculum materials have been updated to maintain compatibility with the most recent database records at FlyBase, NCBI, and UniProt. Most of the changes can be attributed to changes in the FlyBase gene names and exon identifiers. Some of the changes are caused by the new version of NCBI BLAST+ (version 2.2.32).
- An Introduction to NCBI BLAST
- Annotation of a Drosophila Gene
- Annotation of Conserved Motifs in Drosophila
- Annotation of D. virilis
- Annotation of Drosophila (workshop presentation)
- Annotation Strategy Guide
- Chimp BAC Analysis
- Detecting and Interpreting Genetic Homology
- Introduction to ab initio and Evidence-based Gene Finding
- Introduction to web databases
- List of Common Bioinformatics Programs
- Simple Annotation Problem
- Using mRNA and EST Evidence in Annotation