Yuying Gosser City College, CUNY
- 1 Genomics at City College, CUNY
- 1.1 Course Overview
- 1.2 Implementation
- 1.3 Lessons Learned and Future Plans
- 1.4 Syllabus for Sci 280
- 1.4.1 I. Cell structure and Central Dogma (Reading: the introduction to elements of Biology at http://www.ebi.ac.uk/microarray/biology_intro.html)
- 1.4.2 II. Protein 3D structure prediction -- the basis of homology modeling
- 1.4.3 III. Protein Structure determination methods, and structure –function analysis
- 1.4.4 IV. Genome sequencing
- 1.4.5 V. Protein sequencing and Proteomics (Ref: http://www.expasy.ch/tools/) (Guest lecture by Dr. Clement)
- 1.4.6 VI. Introduction to probability and algorithms of Bioinformatics (Guest lecture by Dr. Brass)
- 1.4.7 VII. Overview of gene cloning and expression technology (ref. MIT open course ware)(Guest lecture by Dr. Sihong Wang)
- 1.4.8 VIII. Overview Microarray technology and genome-wide gene expression profiling
- 1.4.9 IX. Introduction to RNA structure, modeling, microRNA, siRNA and gene silencing, microRNA databases
- 1.4.10 X. Comparative genomics and Gene annotation
- 1.4.11 XI. Poster presentation of projects
Genomics at City College, CUNY
Course Overview
The Genomics education program at CCNY is implemented through the existing course Sci 280, “Bioinformatics and Biomolecular Systems”, which was developed collaboratively by faculty from Chemistry, Biology and Computer Science with support from the HHMI science education grant to CCNY. It is an interdisciplinary, computer-laboratory-based course that introduces the basics of genome science, the genetic materials (DNA, RNA and protein), and comparative genomics.
Students learn to use bioinformatics databases, computational tools, and literature databases to complete projects in gene annotation, siRNA design for silencing disease genes, and structure analysis and modeling of proteins important in enzymatic degradation of polymer waste, tumor suppression and sensory reception. The course provides basic knowledge and skills in bioinformatics and genome science to prepare students for early participation in research in bioengineering, molecular genetics, systems biology, and molecular structural biology.
Implementation
From summer 2008 to summer 2009, the course was offered to college students in two terms (class size 7-9), first as a "special interest group" in Genomics & Bioinformatics (Fall 2008) and then as a computer-based lab course (Spring 2009), and to advanced high school students in two summer workshops (6 weeks each summer, size 11-14). The course starts with an introduction to the central dogma, followed by protein structure visualization, homology modeling, and function analysis. Students are required to investigate a disease-related gene (from sequence to structure to function) and give a PowerPoint presentation. After that, they start the gene annotation project.
Since Fall 2009, the course has been offered to more than 100 students, about 30% of them from Biomedical and Chemical Engineering. More than 100 annotation projects have been completed, with a passing rate of about 80% (by the end of 2012, 85 annotation projects had been submitted to the GEP database at WUSTL).
Lessons Learned and Future Plans
1. Since 2003, we have offered 12 bioinformatics workshops (3 days in winter; 3, 4 or 6 weeks in summer). Approximately 45% to 55% of the attendees were engineering students, which indicates that engineering students are interested in research topics in genome science. However, they have no time to complete the prerequisites for the research-oriented genomics and bioinformatics course. We conclude that expanding genomics education to engineering and other non-biology majors meets a demand, and poses a challenge, in current science education.
2. Because we offer the genomics course to engineering students as well as freshmen and sophomores, we found it necessary to start from the central dogma and then introduce protein structure visualization and homology modeling. By finishing the first project on protein structure modeling, the students became familiar with much of the vocabulary of biology, with the tools BLAST and CLUSTALW, and with reading the literature. This prepared them for the next project on gene annotation.
3. The gene annotation project provided a focus for our bioinformatics course. Students, from high school students to college freshmen and sophomores, are exposed to a research field early in their education. The course prepared students for further engagement in research in many respects: from literature reading to informatics technology, from rigorous checking of coordinates to broad similarity searching, and from summarizing results to writing a complete research report.
4. The web-based teaching materials can be adapted to provide instruction to college students at different levels, from freshman to senior. The level of difficulty and the amount of course material can be adjusted accordingly. Therefore, the class can be given to a wide range of students from many different disciplines.
5. Students can be the best teachers for one another, sharing troubleshooting tips and advice on efficient procedures. The TAs who were trained in the GEP workshop played a key role in helping the students complete an annotation project successfully.
6. Although group work may be helpful for larger projects, individual work on one full contig or fosmid gives each student a more cohesive understanding of the whole project, from beginning to end.
7. In explaining the annotation procedure, such as identifying splice-site signatures using the Genome Browser, it is important not to skip steps the first time and to make sure that everyone is following by getting frequent feedback at key junctures. A picture is worth a thousand words: visuals emphasize and explain things much more quickly than long-winded sentences.
8. The paper experiment on DNA assembly is particularly helpful. It helped students easily understand sequence assembly in the shotgun sequencing method. The questions for discussion illustrated probability and nX coverage. (However, those probability questions should be stated more rigorously: the provided answers are correct only if the four bases G, C, A and T are equally distributed in the given sequences.)
Finally, the DNA sequence was translated to the protein sequence of beta-globin, a subunit of the well-studied hemoglobin; there are many structures under the "hemoglobin" keyword in the PDB. Students can go on to visualize the protein structure and read the primary reference paper to understand the structure-function relationship. We also use this example to illustrate the concept of the six translation frames.
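The six translation frames can be made concrete with a short script. The following is a minimal sketch in Python, not part of the course materials: the function names are hypothetical, the standard genetic code is assumed, and the example sequence is only a short illustrative fragment rather than the sequence used in the paper experiment.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in T, C, A, G order for the 1st, 2nd and 3rd codon base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' is stop, 'X' is unknown."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Illustrative fragment (an assumption, not the class sequence).
    for name, protein in six_frames("ATGGTGCATCTGACTCCTGAGGAG").items():
        print(name, protein)
```

Only one of the six frames normally gives a long stretch free of stop codons (`*`), which is how students can recognize the reading frame that encodes the protein.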
Syllabus for Sci 280
I. Cell structure and Central Dogma (Reading: the introduction to elements of Biology at http://www.ebi.ac.uk/microarray/biology_intro.html)
1. Introduction to GenBank (www.ncbi.nih.gov), protein and DNA sequences; BLAST: sequence similarity searching; CLUSTALW: identifying conserved regions
2. Introduction to the Protein Data Bank (www.pdb.org); structure visualization using PyMOL
3. Analysis of structural characteristics and function of the target proteins; read the primary citation paper
II. Protein 3D structure prediction -- the basis of homology modeling
1) BLAST search for sequence similarity against the NCBI nr database and the PDB database
2) Use the SWISS-MODEL server to run homology modeling
3) Download the PDB file generated by SWISS-MODEL; visualization and analysis using PyMOL
III. Protein Structure determination methods, and structure –function analysis
1) NMR study of protein structure and protein-ligand interaction
2) X-ray crystallography
3) Guest lecture by Dr. Ruth Stark: macromolecular structure study using NMR
Project 1a: Structure visualization, analysis and modeling of important proteins (including enzymes in biocatalysis, tumor suppressors, sensory receptors, etc.). Based on the primary reference papers and the information in NCBI and PDB, prepare a PowerPoint presentation.
IV. Genome sequencing
1. DNA sequence assembly paper experiment and discussion of questions on overlap probability, six translation frames, etc.
2. The shotgun sequencing strategy and sequence reconstruction algorithms
3. The Sanger method
4. The new sequencing methods, such as 454 sequencing (optional)
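The discussion questions on overlap probability and nX coverage can be checked numerically. The sketch below is an illustration under stated assumptions, not the worksheet from the course: reads are assumed to be placed uniformly at random (the Poisson approximation used in the Lander-Waterman model), bases are assumed independent, and all function names are hypothetical.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Average depth of coverage: the 'n' in nX coverage."""
    return num_reads * read_length / genome_length


def fraction_uncovered(cov):
    """Expected fraction of bases hit by no read, assuming uniformly random reads."""
    return math.exp(-cov)


def chance_overlap_probability(k, base_freqs=(0.25, 0.25, 0.25, 0.25)):
    """Probability that two independent random k-base stretches are identical.

    With equal base frequencies this is (1/4)**k. With unequal frequencies the
    per-position match probability is the sum of squared frequencies, which is
    larger than 1/4, so spurious overlaps become more likely.
    """
    p_match = sum(f * f for f in base_freqs)
    return p_match ** k


if __name__ == "__main__":
    c = coverage(num_reads=1000, read_length=500, genome_length=100_000)
    print("coverage:", c)
    print("expected uncovered fraction:", fraction_uncovered(c))
    print("chance 5-base overlap, equal bases:", chance_overlap_probability(5))
    print("chance 5-base overlap, AT-rich:",
          chance_overlap_probability(5, (0.35, 0.15, 0.15, 0.35)))
```

The last two lines show why the provided answers hold only for equally distributed bases: any skew in base composition raises the chance of a random overlap.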
V. Protein sequencing and Proteomics (Ref: http://www.expasy.ch/tools/) (Guest lecture by Dr. Clement)
a) Mass spectrometry based protein sequence identification
b) Demonstration of using MASCOT software for identification of a peptide sequence
c) In-class exercise
VI. Introduction to probability and algorithms of Bioinformatics (Guest lecture by Dr. Brass)
a) The shotgun sequencing strategy and sequence reconstruction algorithms
b) The algorithms for local and global sequence alignment and multiple sequence alignment (BLAST and ClustalW)
c) Hidden Markov Models as generative models for aligned families of proteins; profile HMMs
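The difference between global and local alignment is easiest to see in the textbook dynamic-programming recurrences. The following minimal sketch computes alignment scores only, using the Needleman-Wunsch (global) and Smith-Waterman (local) recurrences with a linear gap penalty. It is not how BLAST or ClustalW are implemented (both rely on heuristics), and the scoring values are arbitrary assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-scoring pair of substrings; never below zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTTACA"))
    print(local_alignment_score("TTTACGTTT", "GGACGGG"))
```

The only changes from the global to the local version are the zero floor and taking the best cell anywhere in the table, which lets a short conserved region score well even when the flanking sequence does not match.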
VII. Overview of gene cloning and expression technology (ref. MIT open course ware)(Guest lecture by Dr. Sihong Wang)
a) Vectors and restriction enzymes
b) Plasmid construction
c) PCR and site-directed mutagenesis protocol (optional)
VIII. Overview Microarray technology and genome-wide gene expression profiling
IX. Introduction to RNA structure, modeling, microRNA, siRNA and gene silencing, microRNA databases
Project 1b: Design an siRNA sequence to silence a target gene: based on the mRNA sequence of the target gene, model the mRNA structure, then design the siRNA sequence to complement a single-stranded region of the mRNA.
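The final step of Project 1b, deriving a strand complementary to a chosen region of the mRNA, can be sketched as follows. This is an illustration under assumptions, not the procedure used in class: the start position of the single-stranded region is taken as given (it would come from the mRNA structure model), the default length of 21 nucleotides is an assumed typical value, and the function names and example sequence are hypothetical.

```python
def rna_reverse_complement(seq):
    """Reverse complement in the RNA alphabet; DNA input (T) is converted to U."""
    rna = seq.upper().replace("T", "U")
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def design_antisense(mrna, start, length=21):
    """Return (target_site, antisense_strand) for a 0-based start on the mRNA.

    The antisense strand is written 5'->3' and base-pairs with the target site.
    """
    target = mrna[start:start + length].upper().replace("T", "U")
    if len(target) < length:
        raise ValueError("target site runs past the end of the mRNA")
    return target, rna_reverse_complement(target)


def gc_fraction(seq):
    """Fraction of G and C bases, a simple property often inspected for candidates."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    mrna = "AUGGUGCAUCUGACUCCUGAGGAGAAG"  # illustrative fragment only
    site, antisense = design_antisense(mrna, start=3)
    print("target site:", site)
    print("antisense  :", antisense)
    print("GC fraction:", round(gc_fraction(site), 2))
```

Choosing which region of the mRNA is accessible remains the part that depends on the structure model; the script only automates the complement step.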
X. Comparative genomics and Gene annotation
1) Introduction to evidence-based gene annotation (www.gep.wustl.edu)
2) Demonstration: Annotate a Drosophila virilis gene based on its ortholog in Drosophila melanogaster using various online software and databases (GENSCAN, Flybase.org, ENSEMBL, NCBI GenBank, BLAST and Map Viewer, UCSC Genome Browser).
3) Stepwise tutorial of gene annotation. Key steps:
i) Identify the ortholog using FlyBase;
ii) Identify the coordinates of each exon using the Gene Record Finder and NCBI BLASTx against the original unmasked DNA sequence;
iii) Check the start and stop codons and the splice-site features using the UCSC Genome Browser mirror at WUSTL (http://gander.wustl.edu/);
iv) Construct the gene structure model, pass the Gene Model Checker and get the cDNA sequence (mRNA sequence);
v) Go to http://www.expasy.ch/tools/dna.html to translate the DNA sequence to a protein sequence; run BLAST2 against the ortholog (D. melanogaster) protein sequence to see the similarity; run BLASTp against the nr or SWISS-Prot database, and then run CLUSTALW to identify the conserved regions;
vi) Infer the structure and function of the gene under annotation based on the sequence similarity;
vii) Check synteny.
4) Project 2: Annotate a fragment of genome sequence, i.e., a fosmid or a contig, which contains more than one gene. (All isoforms of each gene need to be checked and annotated.)
Students will generate the gene models and prepare their gene annotation reports to be submitted for authentication at the GEP headquarters at Washington University in St. Louis.
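The checks in steps iii) and iv) of the tutorial (start codon, stop codon, splice-site features, intact reading frame) can be expressed as a small script. The sketch below is a hypothetical helper written for illustration; it is not the GEP Gene Model Checker and does not reproduce its rules. It assumes a gene on the plus strand, 1-based inclusive exon coordinates for the coding region, and the canonical GT...AG intron boundaries.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_gene_model(genome, exons):
    """Return a list of problems found in a plus-strand coding gene model.

    genome: DNA string; exons: list of (start, end), 1-based inclusive, in order.
    """
    genome = genome.upper()
    cds = "".join(genome[start - 1:end] for start, end in exons)
    problems = []

    if cds[:3] != "ATG":
        problems.append("coding sequence does not begin with ATG")
    if cds[-3:] not in STOP_CODONS:
        problems.append("coding sequence does not end with a stop codon")
    if len(cds) % 3 != 0:
        problems.append("coding length is not a multiple of 3")

    # Each intron should start with GT (donor) and end with AG (acceptor).
    for (_, end), (next_start, _) in zip(exons, exons[1:]):
        donor = genome[end:end + 2]
        acceptor = genome[next_start - 3:next_start - 1]
        if donor != "GT":
            problems.append(f"intron after position {end} starts with {donor}, not GT")
        if acceptor != "AG":
            problems.append(f"intron before position {next_start} ends with {acceptor}, not AG")

    # No stop codon should appear before the final codon.
    internal = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal):
        problems.append("in-frame stop codon inside the coding sequence")
    return problems


if __name__ == "__main__":
    # Toy sequence: exon 1 (1-6), intron (7-13), exon 2 (14-19).
    toy = "ATGGCC" + "GTAACAG" + "GGATAA"
    print(check_gene_model(toy, [(1, 6), (14, 19)]))
    print(check_gene_model(toy, [(1, 6), (13, 19)]))
```

Shifting an exon boundary by a single base, as in the second call, breaks both the splice-site check and the reading frame, which is why rigorous checking of coordinates matters in the annotation project.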
XI. Poster presentation of projects
- GEP in CCNY (Microsoft PowerPoint Document)
- Presentation by Mariam Meghdari (Microsoft PowerPoint Document) at CCNY that describes the annotation of contig9 from the D. erecta dot chromosome
- Presentation by Sara Providence (PDF) at Brooklyn Technical High School (summer academy 2009 at CCNY)
- Presentation by Jack Li (PDF) at Stuyvesant High School (summer academy 2009 at CCNY)