Changes to the GEP Web Framework - Fall 2016

Change log for the Genomics Education Partnership Web Framework


Post by wleung » Wed Aug 24, 2016 10:56 pm

Hello Everyone,

The GEP Web Framework has been reset for the 2016 Fall semester.

During the Alumni Workshops this summer, we discussed the revisions to the TSS annotation protocol and the updates to the GEP Web Framework. (The presentations and curriculum materials from the Alumni Workshops are available on the GEP Private Wiki.)

If you could not attend the Alumni Workshops this summer and would like to learn more about either the revised TSS annotation protocol or the updates to the GEP Web Framework, we plan to host two webinars in early September to cover these topics. Please indicate the times that would work best for you on the following Doodle polls:
We will decide the topics and the schedule for the webinar sessions based on the number of faculty members who have signed up by next Monday evening (8/29/2016, 7pm CDT).


The key changes to the GEP Web Framework are summarized below:

1. Annotation projects for Fall 2016
  • 13 D. elegans (D element) and 5 D. ficusphila (F element) projects remaining from Spring 2016 - highest priority!
  • 65 new annotation projects from the D. ficusphila D element
2. Transcription Start Sites (TSS) projects for Fall 2016
  • 69 D. biarmipes F element projects from Fall 2015
3. Sequence improvement projects for Fall 2016
  • 55 D. biarmipes, D. elegans, and D. ficusphila projects remaining from Spring 2016
  • Gaps within all of the D. biarmipes and D. elegans projects, and within the D. ficusphila F element projects, were examined during Summer 2016 and resolved where possible
  • 19 new sequence improvement projects from the D. ficusphila D element
4. Updates to the TSS annotation protocol
  • Revised TSS classification scheme for D. melanogaster core promoters
  • Revised definition of TSS search regions (narrow and wide TSS search regions)
  • New RAMPAGE and CAGEr evidence tracks for D. melanogaster
  • New strand-specific RNA-Seq evidence tracks for D. melanogaster
5. Synchronize GEP annotation resources to FlyBase release 6.12
  • Updated GEP web framework tools (e.g., Gene Model Checker, Gene Record Finder)
  • Updated FlyBase genes, exons, and CDS tracks for the D. melanogaster genome browser
  • Updated protein and transcript alignment tracks for GEP projects and for the Drosophila species sequenced by modENCODE (BCM-HGSC assemblies)
6. Updated curriculum materials
  • One-page summary of the GEP annotation workflow
  • Revised curriculum based on the new NCBI and UCSC Genome Browser user interfaces
  • Revised curriculum based on FlyBase release 6.12 and NCBI BLAST+ 2.5.0+

Below is a more detailed description of the changes that we have made for Fall 2016:

1. Annotation projects for Fall 2016
There are 13 D. elegans Muller D element projects and 5 D. ficusphila Muller F element projects remaining from Spring 2016. These projects have the highest priority in Fall 2016 (particularly contig2 from the D. elegans Muller D element, which has no submissions). If your students completed annotation projects during Fall 2015 or Spring 2016 that have not yet been submitted, please submit them at your earliest convenience. Projects with at least two submissions were removed from the project claim list when we reset the Project Management System. If you would like to submit work on any of these projects, please send me an email with the list of missing projects and I will add them to your Project Management System account.

We have also created a new set of 65 annotation projects from the D. ficusphila Muller D element [Aug. 2016 (GEP/3L Control) assembly]. These projects were derived from three scaffolds near the base of the D element in version 2.0 of the D. ficusphila genome assembly.

Because these annotation projects are based on draft assemblies that have not yet been manually improved, the project sequences might contain consensus errors that interfere with the annotation of the coding regions (e.g., by causing frame shifts or incompatible splice sites). Instructions on how to identify and document consensus errors are available in the "Sequence Updater User Guide" (under "Help" -> "Documentations" -> "Web Framework"). Please include the evidence that supports each putative consensus error in the "Consensus sequence errors report form" section of the GEP Annotation Report.
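
To illustrate why a single consensus error can disrupt a gene model, the following minimal Python sketch compares the codons of a hypothetical reference sequence with those of a draft sequence that carries one extra A in a mononucleotide run. The sequences and the function names (codons, first_changed_codon) are illustrative assumptions, not part of any GEP tool.

```python
from __future__ import annotations


def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, reading from the first base."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_changed_codon(reference: str, draft: str) -> int | None:
    """Return the index of the first codon that differs, or None if none do."""
    for i, (ref_codon, draft_codon) in enumerate(zip(codons(reference), codons(draft))):
        if ref_codon != draft_codon:
            return i
    return None


# Hypothetical toy sequences (not taken from any GEP project).
reference = "ATGAAAGGGTTTCCCTAA"   # ATG AAA GGG TTT CCC TAA
draft = "ATGAAAAGGGTTTCCCTAA"      # one extra A in the AAA run

print(codons(reference))
print(codons(draft))                          # ATG AAA AGG GTT TCC CTA
print(first_changed_codon(reference, draft))  # -> 2
```

Every codon from the error onward changes, and the in-frame TAA stop codon is lost. This is the kind of frame shift that a consensus error can introduce into a coding region.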

The TSS section of the GEP Annotation Report is optional, so you can submit a project without TSS annotations. However, if time permits, we would like to encourage your students to annotate the TSS after they have completed the annotation of the coding regions.


2. TSS projects for Fall 2016
TSS annotations are an important part of the current GEP research project. Of the 70 TSS projects from the D. biarmipes F element [Aug. 2013 (GEP/Dot) assembly], 69 projects are still available to be claimed. We have revised the TSS annotation protocol based on feedback from the June and July Alumni Workshops in order to resolve ambiguities in the classification of core promoters and in the definition of the TSS search regions (see item #4 below for details).

We would like to encourage you and your students to contribute to the TSS annotation projects. These TSS annotations are essential to the phylogenetic footprinting analyses (using the program Magma) that we have planned in order to identify conserved motifs surrounding the core promoters of F element genes. We plan to begin this aspect of the analysis in Spring 2017 so that we can determine whether we need to annotate the F and D elements of an additional Drosophila species to increase the statistical power of the phylogenetic footprinting analysis.


3. Sequence improvement projects for Fall 2016
During this past summer, three Washington University undergraduate students worked on sequence improvement of the D. biarmipes and D. elegans F and D elements and the D. ficusphila F element. Their sequence improvement efforts were primarily focused on resolving gaps via genomic PCR and sequencing.

There are currently 10 D. biarmipes projects, 29 D. elegans projects, and 16 D. ficusphila projects remaining from Spring 2016. As of August 2016, every gap in the initial assemblies of the D. biarmipes and D. elegans F and D element projects, and of the D. ficusphila F element projects, has either been resolved or been tagged as "doNotFinish". Gaps are tagged as "doNotFinish" when multiple primer sets and repeated attempts at PCR and sequencing have failed. Consequently, the primary focus for all of the D. biarmipes and D. elegans sequence improvement projects, and for the D. ficusphila F element projects, is to resolve errors within mononucleotide runs.
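
Because these remaining projects focus on errors within mononucleotide runs, it can help to locate such runs first. The sketch below is a hypothetical Python helper (not a GEP Web Framework tool) that reports runs of a single base. The minimum run length of 6 is an arbitrary illustrative cutoff, not a GEP criterion.

```python
import re


def mononucleotide_runs(seq, min_length=6):
    """Yield (start, end, base) for runs of one base at least min_length long.

    Coordinates are 0-based and half-open. min_length must be at least 2.
    """
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    # A base followed by (min_length - 1) or more copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    for match in pattern.finditer(seq.upper()):
        yield match.start(), match.end(), match.group(1)


# Hypothetical sequence: a run of 7 A's and a run of 6 T's.
example = "GC" + "A" * 7 + "TCG" + "T" * 6 + "GC"
for start, end, base in mononucleotide_runs(example):
    print(start, end, base)
```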

We have also created a new set of 19 D. ficusphila sequence improvement projects in order to assess the quality of the D element scaffolds prior to creating the annotation projects. These D. ficusphila D element projects were derived from three genomic scaffolds in version 2.0 of the D. ficusphila genome assembly, and their project names begin with the prefixes "DFIC7313", "DFIC7408", and "DFIC7314". Gaps within these projects still need to be resolved (e.g., by force join or by additional PCR and sequencing). These projects have lower priority than the D. biarmipes and D. elegans projects. However, if you would like to include a wet-lab component in your course, please work on these projects.


4. Updates to the TSS annotation protocol
The TSS annotation protocol has been revised based on feedback from the June and July GEP Alumni Workshops. The revised protocol provides a more precise definition for classifying the type of core promoter (i.e., peaked, intermediate, or broad) based on the number of TSS (Celniker) sites and the number of DHS sites within a 300 bp window. The revised protocol also introduces the concept of "narrow" and "wide" TSS search regions so that the labels better reflect the strength of the evidence that supports each TSS search region.
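
The counting step behind the revised classification can be sketched as follows. This minimal Python sketch assumes a half-open window [start, start + 300) and sorted site coordinates. The actual cutoffs that separate peaked, intermediate, and broad core promoters, and the exact placement of the window, are defined in the GEP TSS curriculum and are not reproduced here. The same functions apply to TSS (Celniker) sites and to DHS sites.

```python
from bisect import bisect_left


def sites_in_window(positions, window_start, window_size=300):
    """Count sites with window_start <= position < window_start + window_size.

    `positions` must be sorted in ascending order.
    """
    lo = bisect_left(positions, window_start)
    hi = bisect_left(positions, window_start + window_size)
    return hi - lo


def max_sites_in_any_window(positions, window_size=300):
    """Largest number of sites that fit in any window of window_size bp."""
    best = 0
    for p in positions:  # it suffices to start a window at each site
        best = max(best, sites_in_window(positions, p, window_size))
    return best


# Hypothetical TSS coordinates (not real D. melanogaster data).
tss_sites = [100, 150, 390, 420, 900]
print(sites_in_window(tss_sites, 100))
print(max_sites_in_any_window(tss_sites))
```

Starting a window at each site is sufficient, because any window can be shifted to the right until it begins at its first site without losing any of the sites it contains.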

We have also incorporated RAMPAGE and CAGEr analysis results into the GEP UCSC Genome Browser for the D. melanogaster dm6 assembly. The RAMPAGE and CAGE experimental techniques provide single-base resolution of TSS locations, and they have previously been used by the ENCODE and modENCODE projects. The values in the read density tracks correlate with the strength of the TSS.

The RAMPAGE results from 36 developmental stages were lifted from the D. melanogaster release 5 assembly to the release 6 assembly. The modENCODE CAGE datasets from 37 samples were mapped against the release 6 assembly and then analyzed with CAGEr. The RAMPAGE and CAGEr evidence tracks are available under the "Expression and Regulation" section of the D. melanogaster genome browser. To facilitate the interpretation of these large datasets, we have also constructed evidence tracks that combine the results from all samples (see the "Combined modENCODE CAGE TSS" and "Combined RAMPAGE TSS (R5)" tracks).
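
Lifting coordinates from one assembly to another is commonly done with tools such as the UCSC liftOver utility, which relies on chain files made of ungapped aligned blocks. The post does not say which tool was used here. The following Python sketch illustrates only the basic idea under simplified assumptions: a small list of hypothetical blocks, no strand changes, and positions outside every block reported as None (unlifted).

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """An ungapped aligned block between two assemblies (0-based starts)."""
    old_start: int
    new_start: int
    size: int


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map pos from the old assembly to the new one, or None if unaligned.

    `blocks` must be sorted by old_start and must not overlap.
    """
    starts = [b.old_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos < block.old_start + block.size:
        return block.new_start + (pos - block.old_start)
    return None  # pos falls in a gap between aligned blocks


# Hypothetical blocks (not real release 5 -> release 6 coordinates).
blocks = [Block(0, 1000, 500), Block(600, 1500, 400)]
print(lift_position(10, blocks))   # inside the first block
print(lift_position(550, blocks))  # in the unaligned gap
print(lift_position(700, blocks))  # inside the second block
```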

We have incorporated the revised TSS annotation protocol and the new evidence tracks into the GEP TSS curriculum. For example, the "Annotation of Transcription Start Sites in Drosophila" walkthrough uses the TSS annotation of onecut to illustrate how the RAMPAGE datasets can be used to classify the core promoter. This walkthrough also uses the core promoter of Eph to illustrate the evidence that can be used to define the narrow and wide TSS search regions.

The modENCODE project has recently produced strand-specific RNA-Seq data for 42 D. melanogaster samples that cover 34 different developmental stages. We have lifted these results from the D. melanogaster release 5 assembly to the release 6 assembly and incorporated the results into the D. melanogaster genome browser. These datasets are available under the "RNA Seq Tracks" section. The "Combined modENCODE RNA-Seq (Development) (R5)" track shows the total read density from all samples while the "modENCODE RNA-Seq (Development) (R5)" track shows the read density for each sample. These evidence tracks are particularly useful in cases where the untranslated regions of adjacent genes overlap with each other but the genes are in opposite orientations.
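
The combined RNA-Seq track is described as showing the total read density from all samples. The following minimal Python sketch illustrates that idea under one assumption: simple per-position summation with no normalization, because the post does not describe how the track was actually built. The sketch keeps the plus and minus strands separate, so reads from adjacent genes in opposite orientations are never added together. The data values and function names are hypothetical.

```python
from collections import defaultdict


def combine_samples(samples):
    """Sum read density per (strand, position) across all samples.

    `samples` is an iterable of dicts mapping (strand, position) -> density.
    """
    total = defaultdict(float)
    for sample in samples:
        for key, density in sample.items():
            total[key] += density
    return dict(total)


def strand_density(combined, strand, position):
    """Combined density on one strand at one position (0.0 if no reads)."""
    return combined.get((strand, position), 0.0)


# Hypothetical densities for two samples at overlapping positions.
sample_a = {("+", 100): 3.0, ("-", 100): 1.0}
sample_b = {("+", 100): 2.0, ("+", 101): 4.0}
combined = combine_samples([sample_a, sample_b])
print(strand_density(combined, "+", 100))
print(strand_density(combined, "-", 100))
```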


5. Synchronize GEP annotation resources to FlyBase release 6.12
The databases for the Gene Record Finder, the Gene Model Checker, and the Annotation Files Merger, as well as the blastx reports in the annotation packages, have been updated to FlyBase release 6.12. We have also updated the protein alignment, transcript alignment, and gene prediction tracks (i.e., blastx protein alignments, SPALN transcript alignments, and genBlastG gene predictions) on the GEP UCSC Genome Browser for the D. biarmipes, D. elegans, and D. ficusphila projects.

Similarly, the "D. mel Proteins" and "CDS Mapping" tracks for the whole genome (BCM-HGSC) assemblies of nine Drosophila species have been updated to release 6.12. We have also added a new "D. mel Transcripts" evidence track that shows the BLAT alignments of D. melanogaster transcripts against each of these assemblies. The thicker boxes within this track correspond to the alignments to the coding regions of the transcript while the thinner boxes correspond to the alignments to the untranslated regions.
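
In the UCSC BED format, the thickStart and thickEnd fields mark the portion of an item that is drawn with thick boxes (usually the coding region). The remaining parts of each block are drawn thin. The post does not specify how the "D. mel Transcripts" track encodes its coding regions, so the following Python sketch simply assumes this BED convention. It splits a list of hypothetical aligned blocks into thick (coding) and thin (untranslated) segments.

```python
def split_thick_thin(blocks, thick_start, thick_end):
    """Split aligned blocks into thick (coding) and thin (UTR) segments.

    blocks: sorted list of (start, end) half-open genomic intervals.
    thick_start, thick_end: half-open coding span, as in UCSC BED fields.
    Returns (thick, thin), two lists of (start, end) intervals.
    """
    thick, thin = [], []
    for start, end in blocks:
        # Portion of the block upstream of the coding span (drawn thin).
        if start < min(end, thick_start):
            thin.append((start, min(end, thick_start)))
        # Portion of the block inside the coding span (drawn thick).
        lo, hi = max(start, thick_start), min(end, thick_end)
        if lo < hi:
            thick.append((lo, hi))
        # Portion of the block downstream of the coding span (drawn thin).
        if max(start, thick_end) < end:
            thin.append((max(start, thick_end), end))
    return thick, thin


# Hypothetical three-exon alignment with the coding span 150-550.
exons = [(100, 200), (300, 400), (500, 600)]
thick, thin = split_thick_thin(exons, 150, 550)
print(thick)
print(thin)
```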


6. Updated curriculum materials
Both NCBI and the official UCSC Genome Browser made substantial changes to their web interfaces this summer. We have updated the GEP curriculum materials to account for these interface changes and for changes to the underlying database records. The new and revised curriculum materials are listed below:

== New curriculum ==
  • Annotation Workflows
    • One-page summaries of the overall GEP annotation protocol, the decision tree for identifying D. melanogaster orthologs, and the steps for identifying splice sites
  • Multiple sequence alignments with Clustal Omega
    • Developed by Yu He (TA for Bio 4342), this presentation uses Clustal Omega to illustrate the key concepts behind multiple sequence alignments
== Curriculum materials with major revisions ==

The following curriculum materials (found under the "Beyond Annotation" and the "Specific Issues in GEP Annotation Projects" sections) have undergone major revisions because of changes to the GEP TSS annotation protocol:
  • Searching for Transcription Start Sites in Drosophila
  • Annotation of Transcription Start Sites in Drosophila
  • TSS Workflow
  • GEP TSS Report
  • GEP Annotation Report
== Curriculum materials with minor revisions ==

These curriculum materials have undergone minor revisions in order to maintain compatibility with the most recent version of the GEP UCSC Genome Browser, Gene Record Finder, and the database records at FlyBase, NCBI, and UniProt. Most of the changes can be attributed to changes in the FlyBase gene names and exon identifiers. Some of the revisions are caused by changes to the user interfaces of NCBI BLAST and the UCSC Genome Browser.
  • Understanding Eukaryotic Genes (Modules 1-6)
  • An Introduction to NCBI BLAST
  • Annotation Strategy Guide
  • Annotation of Conserved Motifs in Drosophila
  • Annotation of D. virilis
  • Annotation of Drosophila (workshop presentation)
  • Annotation of Drosophila Primer (workshop presentation)
  • Annotation of a Drosophila Gene
  • Basics of BLAST
  • Behavior and Limitations of Motif Finding
  • Chimp BAC Analysis
  • Detecting and Interpreting Genetic Homology
  • Introduction to ab initio and Evidence-based Gene Finding
  • Introduction to Web Databases
  • List of Common Bioinformatics Programs
  • Motif discovery in Drosophila
  • Primer on Reading Frame and Phase
  • Quick Check of Student Annotations
  • RNA-Seq Primer
  • Simple Annotation Problem
  • Using mRNA and EST Evidence in Annotation
