Anne Rosenwald - Georgetown University

From GEP Wiki

New Project: Genome Solver

Update 2012: As a result of efforts to expand the reach of GEP, I have been collaborating with several colleagues to develop tools to make use of the rich data now available from the Human Microbiome Project. My collaborators include

  • Gaurav Arora - a post-doctoral fellow in the Department of Biology at Georgetown
  • Ramana Madupu - J. Craig Venter Institute
  • Jennifer Roecklein-Canfield - Department of Chemistry, Simmons College (also a member of GEP)
  • Janet Russell - Center for New Design in Learning and Scholarship, Georgetown

With funding from the National Science Foundation, we have created a website for experts, faculty, and students to develop projects - please take a look at
In addition, we will be conducting small workshops for faculty interested in learning about microbial genomics and bioinformatics in a framework of sound pedagogy beginning in summer 2012. If you would like more information about the Genome Solver Workshops, please contact me at

Incorporating GEP at Georgetown University

At Georgetown, Dr. Anne Rosenwald (Biology Department) incorporated a D. erecta or D. mojavensis annotation project into a pre-existing Cell Biology course for juniors and seniors (20-25 students). Students worked in groups of 2 (occasionally 3) to annotate a fosmid. Each group wrote a paper and prepared a 10-minute PowerPoint presentation for the rest of the class. We spent 5 weeks of the 13-week semester on this project, which counted for 10% of their final grade.

Dr. Rosenwald also used some materials from WashU's Biol 3055 in Biochemistry: a 3-hour lab was designed using the Cox2 information shown on the GEP web site. This class serves 100 students, Biology sophomores (50%) and other premeds - juniors, seniors, and post-baccalaureate students (50%). This introduction then served as the basis for a project the students did (in pairs) on an enzyme of their choice. The project counted for 15% of their final grade.

Cell Biology and GEP (2007-2010)

Time Devoted: We met for one 3-hour block per week for lab (in addition to three 1-hour lectures). The first week was spent on the exercises provided through GEP to gain familiarity with the process; the subsequent weeks were spent on the D. erecta or D. mojavensis fosmids. Below is the complete syllabus for 2007.

Outcomes: The students initially were uncomfortable with the notion that this exercise was open-ended and that there were not necessarily right or wrong answers, but answers with more or less justification. However, by the end of the block, they enjoyed the process and found it valuable because it gave them new insight into 'real research'. They also felt that they had gained some valuable tools. I found it valuable because it revealed some misconceptions about genes and proteins, which were cleared up as a result of this exercise.

Future: The students felt this was a valuable exercise and should be kept as part of the curriculum. I will do a better job of evening out the work load (split the red fosmids into two or three for example) so all groups have about the same number of genes to annotate. Most students taking this course next Fall will have had some exposure to some of these ideas already as a result of a lab and short project they did in Biochemistry, a prerequisite for Cell.

Biochemistry and GEP

Time Devoted: Students spent one 3-hour lab period becoming familiar with BLAST, ClustalW, Expasy, PDB, etc. while they answered questions about Cox2 (see the lab exercise below). This exercise introduced the students to the various web sites, which they then explored in the context of writing a report about an enzyme of interest. Below is the portion of the 2008 syllabus for this part of the course.

Comments: The students enjoyed both the laboratory and the project. They were amazed that so much biochemical information is available on the internet. They felt that they had learned some valuable tools for the future.

Future: This will become a standard part of the Biochemistry lab curriculum.

Biology 363: Cell Biology Fall 2007 Syllabus

Welcome to Cell Biology! There are two major goals for the class. First, you’ll learn how your other biology classes fit into the study of Cell Biology and second, you’ll learn some important critical thinking/problem solving skills.

This course follows Genetics and Biochemistry and therefore, I will expect you to remember a great deal of this material. I’ll provide some review material for you, but if you’re having difficulties, please come see me.

Acknowledgement: As most of you know, Dr. Henderson used to teach this course. She generously shared all her notes with me, so some of what you’ll see during the semester was from her previous courses.

Class Meetings: Reiss 284 MWF 12:15 – 1:05

Laboratory: Reiss 401 Th 2:15 – 5:05

Class web site:

Instructor: Dr. Anne Rosenwald

Office: 402 Reiss

Phone: 7-5997

E-mail:

Office Hours: By appointment

Note: Like Biochemistry, Cell can be overwhelming if you focus too much on the myriad details without understanding how all the details fit together. If you are having problems with the material, please come see me to discuss how you can get more from the class.

Required Texts:

Alberts et al., Molecular Biology of the Cell, 4th ed. (Garland)

Wilson and Hunt, The Cell, A Problems Approach, 4th ed. (Garland)

Note: make sure you get the 4th editions of these, not earlier versions.

Another note: the textbook (but not the problems book) is available free online at this web site: (check out the other books available there, too – a very useful site).


Take-home exams:

Exam 1: Available 10/5 – due 10/12 15%

Exam 2: Available 11/2 – due 11/9 15%

Final Exam: Available 12/7 – due 12/14 or ASCB Paper# 15%

Problem Sets + Cell Surface Lab 15%

Class assignments including class participation and in-class work 10%

On-line Quizzes and Discussions 10%

Annotation Presentation and Paper 10%

Microarray Presentation and Paper 10%

Grading Scale:

92 – 100: A

89 – 92: A-

86 – 89: B+

83 – 86: B

80 – 83: B-

77 – 80: C+

74 – 77: C

70 – 74: C-

65 – 70: D

< 65: F

Attendance Policy: I will expect everyone to be in class each day unless you have an emergency. Note that participation in class discussions will help determine your grade for the semester. If you’re having issues, please come discuss with me.

Tutoring/Review Sessions: Tutoring/review sessions will be scheduled if there is sufficient interest from the members of the class. There will be special times set aside during exam weeks for people to come ask questions about the take-home exams as well.

Weather Emergencies: If the University is closed for the day, we won’t have class (call 7-SNOW), but occasionally I may have to cancel class even if the University is open. In the event that happens, you will receive an email by 10 a.m. at the latest. Information will also be posted on Blackboard.

Honor System Policy: Georgetown's Honor System outlines the Standards of Conduct you are expected to uphold as a member of the Georgetown Community (see the Undergraduate Bulletin for details). For this course, in addition to those standards listed in the Bulletin, the following will also apply:

1. Although you will be working in groups to gather data in the laboratory, all written work that is to be turned in for a grade must be your own. Names of your partners must be included in your write-up. Problem sets may be worked together, but the names of all contributors must be included. If it looks like you and your lab partner(s) have collaborated extensively on the writing (meaning you both turn in identical words in answer to the same questions), this will be considered plagiarism and will be turned in to the Honor Council.

2. If you use information from the scientific literature or web sites in the preparation of any material to be turned in, even if in rough draft form, the information must be appropriately cited. Note, too, that use of citations without quotes means that you are citing someone else’s ideas, not their exact words. Anything else is plagiarism and will be turned in to the Honor Council.

3. Obviously, anyone caught cheating on exams or other class work will be turned over to the Honor Council, too.

As signatories to the Georgetown University Honor Pledge, and indeed simply as good scholars and citizens, you are required to uphold academic honesty in all aspects of this course. You are expected to be familiar with the letter and spirit of the Standards of Conduct outlined in the Georgetown Honor System and on the Honor Council website. As faculty, I too am obligated to uphold the Honor System, and will report all suspected cases of academic dishonesty.

Lecture Schedule

Lecture Date Topic

1 W 8/29 Introduction

2 F 8/31 Review: skim Chapters 1-8 to remind yourself about this material; PowerPoints available on Blackboard with hints about what to review

-- (M 9/3 – No Class – Labor Day)

3 W 9/5 GUEST SPEAKER: Dr. George Chapman (GU – Biology)

4 F 9/7 Review

5 M 9/10 Membrane Structure (Chapter 10)

6 W 9/12 Lecture/Problems

7 F 9/14 Paper Discussion: Singer-Nicolson Model

8 M 9/17 Membrane Transport of Small Molecules (Chapter 11)

9 W 9/19 Introduction to Annotation Labs

10 F 9/21 Paper Discussion: Water Channels

11 M 9/24 Intracellular Compartments/Protein Sorting (Chapter 12)

12 W 9/26 Lecture/Problems

13 F 9/28 Paper Discussion: Mitochondrial Import

14 M 10/1 Intracellular Vesicular Traffic (Chapter 13)

15 W 10/3 Lecture/Problems

16 F 10/5 Paper Discussion: Golgi Maturation v. Stable Compartments

-- (M 10/8 – No Class – Mid-semester Holiday)

17 W 10/10 Cell Communication (Chapter 15)

18 F 10/12 Lecture/Problems

19 M 10/15 Cytoskeleton (Chapter 16)

20 W 10/17 Lecture/Problems

21 F 10/19 Paper Discussion: WASP

22 M 10/22 Cell Cycle/Programmed Cell Death (Chapter 17)

23 W 10/24 Lecture/Problems

24 F 10/26 GUEST SPEAKER: Dr. Kathryn Wilson (JHU Med School)

25 M 10/29 Mechanics of Cell Division (Chapter 18)

26 W 10/31 *Microarray Work in Lab*

27 F 11/2 *Microarray Work in Lab*

28 M 11/5 Cell Junctions, Cell Adhesion, ECM (Chapter 19)

29 W 11/7 Lecture/Problems

30 F 11/9 Paper Discussion: Apoptosis Control by Interaction with ECM

31 M 11/12 Germ Cells and Fertilization (Chapter 20)

32 W 11/14 Lecture/Problems

33 F 11/16 Development of Multicellular Organisms (Chapter 21)

34 M 11/19 Paper Discussion: Fragile X Protein and mRNA

(W 11/21 and F 11/23 – No Class – Happy Thanksgiving!)

35 M 11/26 Histology (Chapter 22)

36 W 11/28 Lecture/Problems

37 F 11/30 Cancer (Chapter 23)

38 M 12/3 Lecture/Problems

39 W 12/5 GUEST SPEAKER: Dr. Malcolm Campbell (Davidson College)

40 F 12/7 Paper Discussion: Beta-catenin and Wilms’ tumor suppressor

NOTE: The American Society for Cell Biology Meetings are being held this year in Washington at the new Convention Center downtown December 1-5. I’m presenting a poster and my presentation time as of the start of classes was not yet finalized. We may have some rescheduling here. In addition, Dr. Campbell is in town because of the ASCB meetings, so his guest speaker slot on the 5th may also be rearranged.

Opportunity for you: You may replace the final exam with a paper about a talk you hear at the ASCB meetings. This paper will be based on the lecture(s) you hear, but you’ll need to do some extra research (i.e. read at least 5 journal articles and cite them in your paper) to write a thorough review of the topic.

Two possibilities:

1. Undergraduate Student Program: a talk by Dr. Jennifer Lippincott-Schwartz (NIH), “Seeing In The Dark: How Fluorescent Proteins Are Shaping Biology,” Saturday, December 1, 2007, 3:30-5:30 pm, Washington Convention Center, Room 151 A/B

You can attend this talk and the Keynote Symposium later that evening (6 p.m.) for free.

Keynote Symposium: New Biologists for the New Biology

  • Dr. William Bialek, Princeton University
  • Dr. Shirley Ann Jackson, Rensselaer Polytechnic Institute

If you want to sign up for this, go to this web site: and scroll down to the bottom of the form. If a group wants to do this, then I’ll sign up for the group directly. You’ll have to pick up a special badge as you enter the Convention Center.

2. Undergraduate Registration for the Meeting: This allows you into any portion of the entire meeting. Registration until Oct. 1 is $20; after that it is $50. Look at the schedule of events and see if there is something in particular you’d like to learn about. If the fee is an issue, see me, and I’ll try to work something out, but try to do so before Oct. 1 (before the prices go up)!

Laboratory Schedule

Date Topic

8/30 No lab

9/6 Microscopy: A Series of Field Trips

9/13 Investigating Cell Surfaces

9/20 Gene Annotation – Practice Problems/Your Assignment

9/27 Gene Annotation

10/4 Gene Annotation

10/11 Gene Annotation

10/18 Gene Annotation Presentations

10/25 MAs: Make cDNA – Introduction to MAGIC Tool

11/1 MAs: Incubate cDNA with chips

11/8 MAs: Analysis of your microarray with MAGIC Tool

11/15 MAs: More analysis

11/22 No lab – Happy Thanksgiving!

11/29 MAs: More analysis

12/6 Microarray Presentations

The microarray lab will require some extra lab time – we’ll discuss as a group, but the class periods before and after this lab day (W 10/31 and F 11/2) will be devoted to working in lab. In addition, it will be necessary to scan the chips on Friday afternoon. We’ll take a trip over to the Med Center to use their scanner.

Biology 151: Biochemistry Spring 2008 Syllabus

Note: rather than including the entire syllabus, I'm only including the information related to the Bioinformatics Lab and Projects the students performed.

The Lab was based on the Cox2 information from WashU's Biol 3055.

Bioinformatics Laboratory

Adapted from Biology 3055 Laboratory Manual from Washington University, St. Louis written by April Bednarski, and available from This exercise is part of the materials available through the Genomic Education Partnership (GEP).

Introduction

Today’s lab will introduce you to some of the bioinformatics tools that are widely used in biomedical research today. Over the past few years, the number of freely available software programs and web-based research tools has increased dramatically. Knowing how to use these tools is very important to research and health professionals in order to access and interpret the increasing amounts of genomic and proteomic information. These bioinformatics tools as well as genomic information are freely available on the web to anyone who knows how to access them.

This laboratory exercise will serve as a tutorial for the tools you’ll need to examine the enzyme you choose for your project (due at the end of the semester). More details about the project are found at the end of the lecture section of the Biochemistry Notebook. In today’s lab, as we explore some of these tools, your job will be to answer the questions posed by looking at information found at various web sites. These answers will be turned in next week for your lab report.

More about COX-2 (PTGS2)

The enzyme we will focus on today has two names. It is called prostaglandin H2 synthase-2 (PTGS2) and cyclooxygenase-2 (COX-2). COX-2 has been thoroughly studied because of its role in prostaglandin synthesis. Prostaglandins have a number of different functions including promoting digestion and propagating pain and inflammation. Aspirin is a general inhibitor of prostaglandin synthesis and therefore helps reduce pain. However, aspirin also inhibits the synthesis of prostaglandins that aid in digestion, making it a poor choice for pain management in those with ulcers or other digestive problems. Recent advances in targeting specific prostaglandin-synthesizing enzymes have led to the development of Celebrex and Vioxx, which were marketed as arthritis therapies. However, some of these drugs had unforeseen side effects (heart problems) and so have subsequently been removed from the market.

Celebrex is a potent and specific inhibitor of COX-2; it doesn’t inhibit the related enzyme, COX-1, which is involved in synthesizing prostaglandins that aid in digestion.

Understanding the structures of COX-1 and COX-2 proteins helped researchers develop drugs that would only bind and inhibit COX-2. Many of the types of information and tools used by researchers for these types of studies are freely available on the web. In this tutorial, you will be introduced to the databases and freely available software programs that are commonly used by professionals in research and medicine to study genes, proteins, protein structure and function, and genetic disease.

A Few Notes and Introduction to Several of the Web Sites

Gene is a database of genes in which each entry contains a brief summary, the common gene symbol, information about the gene function, and links to websites, articles, and sequence information for that gene. GenBank is a historical database of gene sequences, which means it contains every sequence that was published, even if the same sequence was published more than once. Therefore, GenBank is considered a redundant database. RefSeq is a database of sequences that is edited by NCBI and is NON-redundant, meaning that it contains what NCBI determines is the strongest sequence data for each gene. PubMed is a searchable database of citations and abstracts from the peer-reviewed biomedical literature.

We will also learn how to use ClustalW, which is a multiple sequence alignment program. It allows you to enter a series of gene or protein sequences that may be similar and evolutionarily related. These sequences are usually obtained by performing a BLAST search, which we will also investigate today. ClustalW then aligns the sequences, so that the lowest number of gaps is introduced and the highest numbers of similar residues are aligned with each other. ClustalW uses a scoring matrix based on the scores or penalties given to a substitution of one amino acid for another or for introducing a gap.
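As a rough illustration of the scoring idea described above, the sketch below scores two already-aligned sequences by adding a value for each aligned pair of residues and a penalty for each gap. The function name and the match, mismatch, and gap values are assumptions chosen for illustration only; ClustalW itself uses full amino acid substitution matrices and separate gap-opening and gap-extension penalties.

```python
def alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-3):
    """Score two aligned sequences of equal length; '-' marks a gap.

    The numeric values are illustrative assumptions, not ClustalW defaults.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap        # penalty for introducing a gap
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # substitution of one residue for another
    return score


# Two alternative alignments of the same made-up peptides
print(alignment_score("MKT-LV", "MKTALV"))  # gap placed opposite the extra residue
print(alignment_score("MKTLV-", "MKTALV"))  # gap pushed to the end
```

An alignment program searches for the arrangement with the best total score, which is why the first arrangement (one gap, five identical pairs) would be preferred over the second here.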

The items below are a few pointers that will be helpful as you work through this tutorial and for your research project.

1. Many of the programs require entry of the sequence of interest in FASTA format.

The FASTA format has a title line for each sequence that begins with a “>” (a greater-than sign) followed by any needed text to name the sequence. The end of the title line is signified by a paragraph mark (hit the return key). Bioinformatics programs will know that the title line isn’t part of the sequence if you have it formatted correctly. The sequence itself does NOT have any returns, spaces, or formatting of any kind. The sequence is given in one-letter code. An example of a protein in correct FASTA format is shown below:


2. Also note that it’s often convenient to show sequence information in Courier font because all the letters are the same width. This is especially important for alignment views.

3. GenBank Entries look like this (picture deleted). Take a few minutes to examine the features of this example so you’ll be able to find the information you need.

If you go to and type in “L34209”, you should see the previous page. Note that if you scroll down, you’ll get the nucleotide sequence for the gene. However, note that this is not in FASTA format – see the numbers that start each row?

Nevertheless, it is possible to obtain the sequence in FASTA format from this page. Take a look at the next few pages for screenshots demonstrating this. Pull down the “Display” menu and one of the choices is FASTA.

Once you click here, the next page displayed should look like this: (picture deleted)

4. How do you get the predicted protein sequence for this gene? It turns out that this particular GenBank entry doesn’t have the information we want (it is describing the promoter sequences for this gene and doesn’t have the predicted amino acid sequence). Also, what else can be learned about the protein, including possible modifications, places for disulfide bonds, etc.?

Another place for protein information is SwissProt ( or UniProt ( These two sites are connected with each other and share information, so anything you find on one should also be found on the other. The next few figures show information you can gather from SwissProt. (pictures deleted)

5. How do you find homologs of your protein in other organisms? One convenient tool is BLAST, which stands for Basic Local Alignment Search Tool. This program looks for small stretches of sequence that are similar to each other. Higher numbers of matches and longer sequences result in higher scores, and lower E-values. E-values (expect values) estimate how many matches with a score at least as good as yours would be expected by chance alone in a database of that size, so the smaller the E-value, the less likely it is that the match is a coincidence. An identical match to a full-length protein typically results in an E-value of 0.0, so good matches are really small numbers here (e-100 for example). For BLAST, look at There are a number of ways to run this, which you can explore by clicking around on this page. For this exercise, you will mostly be interested in Blastp (p is for protein), which compares your protein sequence to other protein sequences in the databases like GenBank, although you may want to also use the feature that allows you to examine specific genomes for matches. This may take a few minutes, depending on how busy the server is.

Looking at related sequences can tell you something about the evolutionary history of your protein, but also can perhaps provide information about which amino acid residues might be important for function – you would expect such residues to be conserved.

6. To perform multiple sequence alignments of entire protein sequences, you would gather the protein sequences of interest from Blastp as above, save them in a Word file (important: in FASTA format – no spaces!), then load them into this submission form (all the sequences at once), then click “Run”. In a few minutes, you should have an output that gives you the alignment amino acid by amino acid, as well as a table showing you how alike the various sequences are to one another.
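The FASTA pointers above can also be sketched in code. The example below is a minimal illustration, not part of the original lab: it strips the row numbers, spaces, and line breaks from a GenBank-style sequence display and writes a title line plus a single sequence line, then reads such records back. The function names and the short sequence are hypothetical.

```python
def to_fasta(title, raw_sequence):
    """Return a FASTA record: a '>' title line, then the sequence on one line."""
    # Keep only letters; this drops row numbers, spaces, and line breaks
    cleaned = "".join(ch for ch in raw_sequence if ch.isalpha())
    return ">" + title.strip() + "\n" + cleaned.upper() + "\n"


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each title to its sequence."""
    records = {}
    title = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            title = line[1:].strip()   # a new record starts at each '>' line
            records[title] = ""
        elif title is not None:
            records[title] += line     # sequence lines are joined together
    return records


# A made-up fragment laid out like a GenBank display (numbers start each row)
raw = """        1 atggctagct agctaggcta
       21 ttgacc"""
record = to_fasta("example_sequence (hypothetical)", raw)
print(record)
print(parse_fasta(record))
```

Several records produced this way can be concatenated into one text block and pasted into a submission form that expects multiple FASTA sequences.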

Tutorial: Comparison of PTGS1 (Cox-1) and PTGS2 (Cox-2)

Start a Word file so that you can copy and paste the information asked for in the questions below. You will turn this in next week.

Part 1

Follow these directions to access the entries for PTGS1 and PTGS2 in the “Gene” database at the NCBI Website:

A. First, go to the NCBI homepage using the link on the lab webpage, or by going to:

B. Select “Gene” from the database pulldown menu. Type “PTGS” in the search box, then click “Go.”

C. Scan the results for the “Homo sapiens” entries. There should be one called “PTGS1” and one called “PTGS2.”

D. Select each entry by clicking on its name, then read the paragraph under the “Summary” section for each entry.

Read the “Summary” section for both genes, then answer the questions below.

1. PTGS1 and PTGS2 are isozymes. Isozymes catalyze the same reaction, but are encoded by separate genes. What types of reactions do PTGS enzymes catalyze? Also, what pathway are these enzymes a part of?

2. How is the expression of PTGS1 and PTGS2 different?

3. Which isozyme would you want to inhibit to stop inflammation?

The next two questions are not discussed in the summaries, so do your best to answer them. Feel free to discuss with your classmates. (Hint: look at the introductory material for this lab for some ideas).

4. The drug Celebrex selectively inhibits PTGS2 while aspirin and other NSAIDs (non-steroidal anti-inflammatory drugs) inhibit both PTGS1 and PTGS2 in the same way. Why do you think researchers wanted to discover a selective inhibitor of PTGS2?

5. Describe how studying 3-D structures of PTGS1 and PTGS2 could help researchers design a drug that binds to PTGS2, but not to PTGS1.

Part 2

Getting sequence information and viewing database entries

NCBI – Gene

1. Go back to the “Gene” entry for Homo sapiens PTGS2 and find the gene name.

2. What is the GeneID number?

3. Where in the human genome is this gene located?

4. What is the RefSeq accession number for the mRNA sequence of Homo sapiens prostaglandin-endoperoxide synthase 2? Open the entry, then choose “FASTA” from the pull-down menu. Copy the sequence (including the title line designated by the “>” symbol) and paste it into a Word document. Select the “Replace” tool under the EDIT menu. In the “find” box, type “^p” to find all paragraph marks. Don’t type anything into the “replace” box. Then click “Replace All.” This will eliminate all the paragraph marks in the document. If you still see white spaces in the sequence, use the same procedure, but type “^w” in the “find” box to represent white spaces. Now add back a paragraph mark after the title line (that starts with “>”) and before the sequence starts. Save the file as PTGS2rna.doc on your desktop.

5. What is the RefSeq accession number for the Homo sapiens PTGS2 protein sequence? Open the entry. Follow the steps given above to save the sequence in FASTA format as a Word document called PTGS2prot.doc on your desktop.

At this point, you should have two files: PTGS2rna.doc and PTGS2prot.doc

Swiss-Prot Entry

6. Go to the Expasy website ( and search for the Swiss-Prot entry for PTGS2. (Hint: use the gene name to search and be sure to select the HUMAN protein from the search results). Write at least three alternate names for this protein.

7. Where in the cell is this protein located?

8. What types of drugs target this protein?

9. What amino acid is acetylated by aspirin (amino acid type and number)?

10. What His residue is in the active site?

Sequence Manipulation

11. Go to the Sequence Manipulation Suite ( Click on “Translate” under “DNA Analysis” heading from the menu at left. Clear the data entry box by hitting “Clear”. Copy the mRNA sequence from your Word file and paste it into the data entry box on the Sequence Manipulation website. Select “Reading Frame 3” and “direct” from the pull-down menus, then click “Submit”. When the Output window opens with your results, copy and paste the sequence into a Word document and save it as “translate.doc” on your desktop. For fun, see what happens if you choose a different reading frame. Do you get the same protein?

12. Compare the sequence in the “translate.doc” file with the sequence in “PTGS2prot.doc”. What are the first residues that are the same in the sequences? Do the sequences look like they are the same? (Hint: protein sequences should start with a methionine. Another hint: think about what the *’s mean).
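To see why the choice of reading frame matters, here is a minimal sketch of translation in the three forward reading frames using the standard genetic code. It is an illustration only, not the code behind the Sequence Manipulation Suite; the function name and the short test sequence are made up, and stop codons are written as * as in the exercise above.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, frame=1):
    """Translate a DNA or mRNA sequence in forward reading frame 1, 2, or 3."""
    seq = sequence.upper().replace("U", "T")
    start = frame - 1                      # frame 1 starts at the first base
    protein = []
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        protein.append(CODON_TABLE.get(codon, "X"))  # 'X' for unrecognized codons
    return "".join(protein)


toy = "CCATGGCTTAAGG"  # made-up sequence, not PTGS2
for frame in (1, 2, 3):
    print(frame, translate(toy, frame))
```

Each frame reads the same bases in different groups of three, so the three outputs are unrelated peptides; in this toy sequence only frame 3 begins with a methionine (M) and ends at a stop codon.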

Part 3

Finding homologs of Cox2 with BLAST

1. Open the BLAST web site at NCBI –, then click on the blastp option. Enter your protein sequence, PTGS2prot.doc in FASTA format in the box, then click BLAST (scroll down then click button in bottom left corner). Wait for feedback. You’ll see a diagram of the protein with brightly colored ovals, signifying conserved domains in Cox2. Click on each oval and describe what that domain’s function is.

2. Eventually the program will return a page with a bunch of colored lines – these are the top matches to Cox2. Scroll down beneath the figure with the colored lines – what is the RefSeq number of the top hit? What species is this?

3. Scroll down to the first one that says “predicted”. What is this one’s RefSeq number? What species is this?

4. For this “predicted” one, click on the “score” link which will take you to a pair-wise alignment between your query sequence (the human gene) and the subject. How many amino acid differences do you see between the two (Hint: look at the line of amino acids between subject and query)? Note that if you were going to use this sequence for a ClustalW alignment, you would have to save this sequence to a file!
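Counting differences in a pair-wise alignment by eye is error-prone for long proteins. The sketch below shows one way to tally identical, different, and gapped positions once the query and subject lines have been copied out of the alignment. The function name and the two short sequences are hypothetical and are not real BLAST output.

```python
def compare_aligned(query, subject):
    """Count identical, different, and gapped positions in a pairwise alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identical = different = gaps = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1          # one sequence has no residue at this position
        elif q == s:
            identical += 1
        else:
            different += 1
    return {
        "identical": identical,
        "different": different,
        "gaps": gaps,
        "percent_identity": 100.0 * identical / len(query),
    }


# Made-up aligned fragments (not taken from a real BLAST result)
print(compare_aligned("MLARALLLCA", "MLFRAVLLCA"))
```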

Part 4

Multiple Sequence Alignment with ClustalW

1. On the course website under “COX2 Tutorial”, there will be a file called “ClustalWseq”. Click on this link to download the file to your desktop. Open the file, which contains six FASTA formatted sequences of PTGS2 from different organisms. The top sequence is the human PTGS2 protein sequence you have been working with. Go to the ClustalW website ( and enter (by using “copy” and “paste”) all of the FASTA formatted sequences into the data entry box. Copy the alignment and paste it into the Word document you’re using to answer the questions in this lab. You may need to make the font size smaller to get everything in register. Also copy the table that gives you the relative score between any two sequences.

2. Review the alignment. What symbols are used for positions in the alignment that contain identical, highly homologous, homologous, and non-homologous residues? Are the residues mentioned in Part 2, questions 9 and 10, conserved? Would you expect them to be conserved? Why or why not?

3. Is the N-terminus or the C-terminus of the set of proteins more highly conserved? How do you know?

4. On the alignment table, which two sequences are most closely related?
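The idea of a conserved position can be made concrete with a short sketch: given several aligned sequences of equal length, mark every column in which all sequences carry the same residue. The function name, the “^” marker, and the three short sequences are assumptions for illustration; this is not ClustalW's own annotation scheme, which also distinguishes degrees of similarity.

```python
def conserved_columns(aligned):
    """Return a marker line with '^' under columns where all residues are identical."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):           # walk the alignment column by column
        residues = set(column)
        fully_conserved = len(residues) == 1 and "-" not in residues
        marks.append("^" if fully_conserved else " ")
    return "".join(marks)


# Made-up aligned fragments; '-' marks a gap
alignment = ["MKTAYV", "MKSAYV", "MRTA-V"]
for seq in alignment:
    print(seq)
print(conserved_columns(alignment))
```

Viewing the output in a fixed-width font such as Courier keeps the marker line in register with the sequences above it.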

Part 5

Searching for Journal Articles on PubMed

1. Go back to the NCBI main page ( and click on “PubMed” from the top left. In the search box on the next page, type in “PTGS2 AND Homo sapiens”, then hit “Go”. On the top hit, click on the authors’ names. This will take you to the abstract of the paper. (Note: You can often get the full text of the article here as well, assuming you are using a computer with a Georgetown IP address – see if there is a box on the right with the journal name). Copy the authors’ names, the article title, the journal name, volume number and page number into your Word document.

2. How many articles did your search return?

Bioinformatics Project

Since the draft sequence of the human genome was published in 2001, the information has been mined to understand what makes a human unique compared to other organisms. For this project, which counts for 15% of your final grade, you and a partner will investigate a human gene and the protein it codes for using some of the tools you learned in lab earlier in the semester.

First, choose an enzyme you would like to investigate in more detail. Some possibilities: an enzyme we’ve discussed in class; one that, when mutated, causes an interesting disease; or one related to something you’re working on in your research project. You may not choose COX-1 or COX-2 from the Bioinformatics laboratory, however. The name of the enzyme you and your partner choose will be due on March 14, 2008. If two or more groups choose the same enzyme, then we’ll have a discussion to see if we can find something related.

General instructions:

All information gathered must be appropriately cited, either with the web address or with a standard reference to a journal article. You need to show me your data – this can best be accomplished by taking screen shots of your information to include in your report. For PCs there is a button titled “print screen”. This copies an image of the current screen, which you can then treat like any other image – i.e. paste into Word or PowerPoint then format how you wish. Your paper must be typed.

Items that must be included in your report at a minimum:

1. The name of the enzyme and the Enzyme Commission (EC) number. This number can be obtained by browsing the International Union of Biochemistry and Molecular Biology (IUBMB) Enzyme Nomenclature web site - or the BRENDA - site

2. The reaction your enzyme catalyzes. Places to look include BRENDA - and KEGG -, especially the Pathway link. I will be looking for structures here, not just names of substrate and product.

3. The protein sequence of your enzyme – if your enzyme has more than one subunit, choose the catalytic subunit to investigate further. The sequence can be obtained from a number of different sources, but perhaps the easiest will be to go to the National Center for Biotechnology Information (NCBI) – Remember you want the sequence for the human protein. Choose the “Protein” database to search, but be sure to put in the proper limits (i.e. “Homo sapiens” AND your enzyme name). Also remember that NCBI databases use Boolean search terms: “and” “or” and “not”.

4. You then need to BLAST your protein sequence against several other organism databases. Does your sequence have a homolog in yeast (Saccharomyces cerevisiae), wild mustard (Arabidopsis thaliana), nematodes (Caenorhabditis elegans), fruit flies (Drosophila melanogaster), mouse (Mus musculus), or chimpanzees (Pan troglodytes)? Report the E-value of the best hit in each organism, then retrieve and save each complete sequence (in FASTA format, no spaces) so you can then do a ClustalW alignment (step 5). You don’t have to confine yourself to these 6 organisms – feel free to explore further. To BLAST your sequence, go to, then choose each organism in turn. Remember you want to choose BLASTp here. Save each sequence and remember to use the Courier New font.

5. The next step will be to align the sequences you gathered in step 4 using ClustalW using this web site: Enter your sequences in the box again in FASTA format (a “>” sign with information to help you figure out which sequence is which, followed by an “enter” followed by the protein sequence; do this for each of the 7+ sequences you gathered). Show both the alignment and the scores for each of the pairwise comparisons. Can you learn anything about the evolutionary history of your protein from this analysis?

6. Search the Protein Data Bank - and determine whether a solved protein structure exists for your enzyme (be sure to identify which organism this structure comes from; it may be that the human protein structure has not yet been solved). If so, capture an image and include it in your report. If possible, identify the residues that are highly conserved in the structure based on your ClustalW alignment. Be sure to provide the 4-character alphanumeric code for any structures you show.

7. Examine the Human Genome Browser at Click on “Genome Browser” top left of main page, then set the browser for the Human Genome, March 2006 assembly. From here, determine on which chromosome the gene for your protein resides, the nucleotides on that chromosome which encompass your gene, and the number of exons in your gene. This is not a website we examined in the Bioinformatics Lab, but it is fairly intuitive. See me if you have problems and I’ll help you.

8. Are there any diseases associated with mutations in the gene for your protein? Check on Note that OMIM works in the same way that PubMed does and there are links to PubMed from these pages.

9. What other features of your protein are important? Examine the peer-reviewed literature for your protein at Some other possibilities to extend your findings:

A. You may be able to find information about the modification of your protein (i.e. phosphorylation, glycosylation, disulfide bonds, etc.) from Prosite (

B. There are also websites that can predict aspects of secondary structure. One of these is PredictProtein (

C. Is your protein a membrane protein? There are websites that can help you determine where your protein crosses the membrane. Here’s an example:

D. There is a lot of information about genetic variation at the UCSC page. You might want to explore there further.

E. Finally, here’s a suite of additional tools that may be of interest:

F. This is only a smattering of the websites available for examining protein sequence and structure. Check for others on

Grading: Your efforts will be graded on the thoroughness of your explorations and the thoroughness with which you document them. In addition, you will be graded on the spelling and grammar of your report. Finally, all information must be appropriately cited (both web sites and scientific journal articles). If you have questions about proper citations, please ask and we’ll review.