Anya Goodman and Ed Himelblau, California Polytechnic State University

Anya Goodman - California Polytechnic State University


At California Polytechnic State University, I teach a “Bioinformatics Applications” course during a 10-week quarter. It is a 4-unit senior-level course with 3 hours of lecture per week and one three-hour lab. Originally I had 24-30 students in lecture and two lab sections (12-16 students each). In the most recent implementations, all 20-24 students were in one lecture/lab section. The lecture was a broad survey of different topics in genomics; the lab focused on genome annotation projects. We spent 3.5 lab periods on practice exercises and 4.5 lab periods on individual projects, and held student presentations during the last lab meeting.

CGAT database: In 2012-2013, I worked with a Computer Science student to build a database that could facilitate student training (immediate feedback; choice of annotation assignments guided by student experience and success on previous assignments; students could earn "badges" or gain experience points) and test the feasibility of community-based annotation (can students generate accurate annotations based on community consensus?). The description of the Community Genome Annotation and Training (CGAT) database is posted here. The prototype is currently housed on the iPlant server; I will post the updated location shortly.

Lessons learned

  1. The annotation projects were great for the student population with diverse backgrounds (mixture of biology/biochemistry majors with computer/statistics/engineering majors). Each student could work at his/her own pace and take the project as far as they were able.
  2. Working on annotation in small groups worked well: each team of 3-4 students completed annotation of 2 contigs in a 10-week quarter. To ensure individual accountability, each student completed an individual project (a personal wiki page presenting in-depth research on a gene they annotated) and each group member evaluated the efforts and contribution of all group members. We used a course wiki to facilitate communication within each group (file exchange, students checking each other's models and helping troubleshoot).
  3. Students working individually (one student, one contig) rather than in small groups had the advantage of individual accountability and project ownership, but this approach also carried several problems: (a) Unequal workload: some contigs are easier than others, and many students did not complete their projects. (b) The relatively large class size (24 students) made it difficult to supervise a large number of projects. (c) It was difficult to promote group cohesion, cooperation, and learning from peers.

Using course wiki to manage group projects.

I used a public wiki to facilitate communication within each group and between me and each group.

E-mail me if you want to view our pages, and I will send you a guest login and password (the site is not open, to protect student confidentiality).

Future Plans

In the future, I hope to address the difficulties of a larger class and diverse student backgrounds by working with interdisciplinary teams of students and assigning team projects.

CHEM441 - BIOINFORMATICS APPLICATIONS - collaboration with CSC448 Bioinformatics Algorithms

Lecture: Tue, Thur 12:10-1:30 14-301 (exceptions 3/29 in 53-201)
Laboratory: Tue, Thur 1:40-3:00 14-301
Instructor: Anya Goodman, Dept. of Chemistry and Biochemistry
Office: 25-222 Tel: 756-1666
Office Hours: T 3-4, W 11-1, F 11-1.


From catalog:

Introduction to new problems in molecular biology and current computer applications for genetic database analyses. Use of software for: nucleic acid, genome and protein sequence analysis; genetic databases, database tools; industrial applications in bioinformatics; ethical and societal concerns. 3 lectures, 1 laboratory. Prerequisite: One course in college biology (BIO 111 or BIO 161 recommended). Recommended: BIO 303, BIO 351 or CHEM 373.

What is this course really about?

Life sciences are undergoing a “genomic revolution,” driven by the union of molecular biology, robotics and computer sciences. Our ability to sequence genomes gave rise to genomics, a new field of study that requires intensive computational tools (bioinformatics). In this course I hope to

  • introduce you to thinking on a genomic scale,
  • demystify methods and concepts of genomics,
  • help you become proficient with computational tools for accessing genomic resources,
  • offer you an opportunity to work on interdisciplinary teams with computer scientists to conduct original research in genomics.


By the end of the quarter, students should be able to

  1. Define, explain, and appropriately use terminology and concepts related to genomics, the central dogma of molecular biology, and molecular evolution.
  2. Describe novel technologies used for acquisition of genomic data and their applications.
  3. Choose appropriate web-based tools to find information about specific genes, proteins and genomes.
  4. Compare two or more nucleotide or protein sequences using BLAST and make inferences regarding molecular evolution and function.
  5. Analyze and synthesize information from various databases and BLAST experiments to predict novel gene structure and function.
  6. Describe the steps involved in software development process and the role of a client/researcher.
  7. Communicate clearly results of investigation in oral and written form.
  8. Effectively cooperate and communicate with colleagues in life sciences and computer sciences to accomplish a research task.
  9. Describe nature of science, scientific method and its application to genomic research.


Readings, homework handouts, and web site links will be posted on PolyLearn. I will indicate which files you will need to print and bring to class. You will still save money compared to buying a $100 textbook that becomes outdated before it is printed.


Schedule. Each entry lists, in order: lecture topics | laboratory (L) activities | HW/DUE/assessment items.

Wk 1, Mar 27: Course overview, CS introduction | L0. CS unplugged activity | L0 output/error log
Wk 1, Mar 29: Data and databases | L1. Application: genetic testing, genotype-phenotype connection | L1 answer sheet
Wk 2, Apr 3: HW1 Seq manipulation | Research Project: GEP | L2 discussion: %GC and codon bias | L2-1. %GC Program requirements (w/ CS) | HW: GEP assessment | L2-1 Program requirements
Wk 2, Apr 5: L2-2 discussion | Genome Annotation – Tutorial 1 | L2-1 analysis of %GC | L2-2 Program requirements | L2-2 Program requirements
Wk 3, Apr 10: Genome annotation tutorial 1 and 2 | L3 discussion: repeat finding | L2-2 Codon bias: debugging and testing | L2-1 analysis | Annotation Tutorial 1
Wk 3, Apr 12: Genome annotation tutorial 2 | L2-2 Codon bias analysis | L3 Program requirements: repeats and palindromes | L3 Program requirements
Wk 4, Apr 17: Genome annotation – practice contig | L3 testing, debugging | L2-2 analysis | Annotation Tutorial 2
Wk 4, Apr 19: Genome annotation – project assignment | L3 testing, debugging | Practice Contig
Wk 5, Apr 24: MIDTERM | L3 analysis
Wk 5, Apr 26: Sequence comparison | L4 discussion: global align | L3 analysis
Wk 6, May 1: Sequence comparison | L4 Program requirements: global alignment | L4 Program requirements
Wk 6, May 3: HW2 Seq. comparison | Protein resources | Protein resources (in class)
Wk 7, May 8: Genome sequencing: HGP | L4 and Annotation | L3 analysis
Wk 7, May 10: HW3 Genome sequencing | L5 discussion: gene predictor | L4 evaluation | Annotation, Science activity | L4 evaluation
Wk 8, May 15: Multiple seq alignments; 12 fly genomes paper | L5 Program requirements: gene predictor | L5 Program requirements
Wk 8, May 17: Phylogenetic trees | Annotation | Comparative genomics paper
Wk 9, May 22: HW4 Genome evolution | Annotation | HW4
Wk 9, May 24: Proteomics, metagenomics, other –omics survey | L5 Gene predictor testing | L6 Clustering (?) | -omics assignment
Wk 10, May 29: Student presentations | Quality control of annotation reports | L5 evaluation
Wk 10, May 31: Student presentations | Revisions | Annotation reports
June 5: Final exam 1:10-4 | FINAL EXAM

Color key: major assessment events, program requirements, lab analysis, HW/tutorials.


Your grade in this course will be determined based on the following criteria:

25% CS joint projects: program requirements, genome analysis, gene predictor, paper
25% Genome annotation: tutorials, progress reports, final annotation report
15% Final exam
10% Midterm
10% Homework, reflection and other assignments
5% Final presentation
5% Team work: peer evaluation of team members; instructor’s observations
5% Attendance, professional conduct

Letter grades will be assigned following roughly 90%, 80%, 70%, etc. cutoffs for A, B, C, etc., respectively.
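The weighting above amounts to a simple weighted average. The following is a minimal Python sketch of that arithmetic, not the instructor's actual grading procedure. The category names, the assumption that each category is scored from 0 to 100, and the D/F cutoffs (the syllabus only says "etc.") are illustrative assumptions.

```python
# Category weights in percent, taken from the grading list above.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "cs_joint_projects": 25,
    "genome_annotation": 25,
    "final_exam": 15,
    "midterm": 10,
    "homework": 10,
    "final_presentation": 5,
    "team_work": 5,
    "attendance": 5,
}


def course_percent(scores):
    """Return the weighted course percentage.

    `scores` maps each category name to a score between 0 and 100.
    """
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percent):
    """Map a percentage to a letter using approximate 90/80/70 cutoffs.

    The 60% cutoff for D and the F fallback are assumptions; the
    syllabus states the cutoffs only roughly.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return letter
    return "F"


example_scores = {
    "cs_joint_projects": 90,
    "genome_annotation": 80,
    "final_exam": 70,
    "midterm": 60,
    "homework": 100,
    "final_presentation": 100,
    "team_work": 100,
    "attendance": 100,
}
print(course_percent(example_scores), letter_grade(course_percent(example_scores)))
```

Because the cutoffs are described as rough, the letter returned here should be read as an estimate only.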


This is a “learn-by-doing” course. Our emphasis is on doing research using bioinformatics tools and developing some of these tools. We will have a few formal lectures; they mainly serve to support the laboratory by preparing you to perform specific tasks and by discussing what we learn in lab. We will use lecture time for lectures, practice, discussions of lab assignments (program requirements), presentation of lab results (analysis), discussion of our research questions, discussion of homework assignments, quizzes and the midterm. The lab time will be used for interactions with CS students (to discuss the goals, divide up the tasks, obtain input data, test programs, obtain real data, analyze real data), gene annotation, and presentation/discussion of lab results.


We will work on a research project in collaboration with the Genomics Education Partnership (GEP), studying a recently sequenced genome. Our research is focused on comparative genomics and genome annotation.

Our research goals are:

  1. To compare genome structure of the new genome to other closely related Drosophila species and/or compare large regions of the same genome to each other (chr. 4 vs. chr.3 comparison).
  2. To find repeats, protein coding genes and non-coding functional elements in a “finished” genome sequence (annotation).

The lab portion of the course can be divided into two parts, but we will work on the two parts concurrently:

I. Lab assignments carried out jointly with CSC448 students: Students will write program requirements, test the programs written by CS students and run the analysis to answer a biologically relevant question. Your program requirements will be evaluated by your CS colleagues. Analysis from labs 1-3 will be written up as a comparative genomics paper (due week 8); gene predictor (Labs 4 and 5) analysis will be presented by one of the team members for final presentation on May 31.

List of BIO-CS labs:

0. CS-unplugged: Marching orders

1. Applications of bioinformatics to human health

2. DNA Analysis: 2-1 %GC, 2-2 gene density and codon bias

3. Repeat Analysis: 3-1 simple repeat finding; 3-2 palindrome finding

4. Global protein alignment

5. Gene predictor
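To give a sense of what the programs in Labs 2-1 and 3-2 might compute, here is a minimal Python sketch of a %GC calculation and a reverse-complement palindrome search. It is an illustration only, not the code written by the CS students. The function names, the fixed window length, and the choice to ignore non-ACGT characters when computing %GC are assumptions.

```python
# Translation table for complementing DNA bases (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq):
    """Return the percentage of G and C among the A/C/G/T bases in seq.

    Ambiguous characters such as N are ignored (an assumption).
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_palindromes(seq, length):
    """Return (start, window) pairs for each window of the given length
    that equals its own reverse complement.

    Start positions are 0-based; the input is assumed to contain only
    A, C, G and T.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if window == reverse_complement(window):
            hits.append((start, window))
    return hits


print(gc_percent("ATGC"))
print(find_palindromes("AAGAATTCAA", 6))
```

In the second call the only hit is GAATTC, which reads the same on both strands.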

II. Genome annotation:

A. Practice annotation (use previously completed annotation projects, known answers)

B. Teams work on new annotation projects. Progress reports and short presentations will be done during lab/lecture.

C. Quality control: each team will check and submit another team’s projects.

Final product will be a set of 4 files for each contig:

  1. Annotation report file (Word document with screenshots)
  2. contigX.fasta (multiple sequence fasta file with predicted CDS of all isoforms for all genes)
  3. contigX.pep (plain text, multiple sequence fasta text file with predicted aa seq of all isoforms for all genes)
  4. contigX.gff (gff formatted description of all isoforms for all genes).
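The relationship between the contigX.fasta and contigX.pep files can be illustrated with a short Python sketch: each predicted CDS is read from multi-sequence FASTA text and translated with the standard genetic code. This is a sketch under stated assumptions, not the course's required workflow. The record name used in the example is hypothetical, and the translation simply stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def read_fasta(text):
    """Parse multi-sequence FASTA text into a list of (header, sequence)."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def translate(cds):
    """Translate a CDS codon by codon, stopping at the first stop codon.

    Codons containing characters other than T, C, A, G become 'X'.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical record name, used only for illustration.
fasta_text = ">gene1-RA\nATGGCC\nTAA\n"
for name, cds in read_fasta(fasta_text):
    print(">" + name)
    print(translate(cds))
```

Translating each CDS this way is also a quick consistency check: a peptide that ends early points to an in-frame stop codon in the predicted CDS.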


Many of the assignments in this class will involve team work. Ability to work on a team is an important professional and life skill. In addition, research shows that it can greatly enhance your learning experience. Please, talk to me if you have any concerns or encounter any problems, so we can figure out how to optimize your team’s performance and make this a good experience for you.


The midterm and the final exam allow for individual assessment of students’ mastery of key terms, concepts and procedures. The midterm will assess mastery of HW1 and the annotation tutorials. The final exam will focus on HW2-4, in-class protein practice, writing program requirements, comparative genomics and annotation (our research project). The questions will be short-answer, draw-a-diagram, and fill-in-the-blank.


The HW in this course consists of individual assignments that will be introduced in lecture. You will complete these outside of class and bring to class a hard copy of your answer sheet with your name on it. At the beginning of class on the due date, you will discuss your answers with your teammates and choose the best answer for each question on the answer sheets (circle it on the hard copy). The HW will then be submitted as one stapled stack per team and discussed in class. There will be some in-class assignments that will use a similar discussion format.


To help improve this course, I will ask you several times during the quarter to reflect on your experiences, assignments, and group dynamics. In addition, you will be asked to participate in several educational research assessments (pre- and post-course surveys and quizzes) to help improve instructional design and activities. Participation in these is voluntary.


For this presentation, demonstrate your skills with bioinformatics tools by

  • choosing a question (related to your annotation project and/or one of the lab projects),
  • investigating the question using the tools discussed in lecture,
  • and clearly presenting your conclusion with supporting evidence.


  1. description of evolutionary history of a gene or protein that you annotated,
  2. proposal of a novel function for a gene/protein on your contig,
  3. evaluation of “gene predictor” developed together with CS students.