CAP 5937 - Bioinformatics

Fall 2004
MW 4:30pm to 5:45pm, CSB 221, Main Campus (Orlando)
Key Code (Class Number): 88974

Prof. Amar Mukherjee
Office 208 CSB
Office and/or email hours: MW 2-3:00pm, Tu 3-5:00pm
Email Address:
Telephone: 407-823-2763
Teaching Associate: Mr. Ron Zhang CSB113 (Tel. 407-823-3934)
Office hours: MW 3-4:00pm
Research Associate: Mr. Ravi Vijaya Satya. Hours: by appointment
Tel: 407-823-3934 Email:

Academic Calendar

Methods of Evaluation (Subject to Change)
Homework assignments: 25%; Midterm(s): 20%; Term project: 20%; Final: 35%
Grading: above 95%: A; 90-95%: A-; above 85% (below 90%): B+; 82-85%: B; 80-81%: B-.
C and D grades are broken down the same way as B grades, within the 70s and 60s respectively. Any score below 60% is F.

Final Exam Schedule: 12/6, 4:00pm to 6:50pm


In 1947 the transistor was invented, and in 1953 the double helix structure of DNA was discovered. These two events revolutionized computer science and biology, respectively. Biology is now on the verge of another revolution in which computation plays a central role. Bioinformatics and computational molecular biology use computing and the mathematical sciences as tools to advance traditional laboratory-based biology. With the advent of new automated sequencing and assembly tools, the capability to acquire genomic data has become very powerful, giving rise to exponential growth in genomic information. The need to process these data for further scientific advances, and to understand their role in heredity, chemical processes within the cell, drug discovery, evolutionary studies, etc., has created new problems of an interdisciplinary nature. This course provides an introductory but exciting tour of the basic ideas of molecular biology for computer scientists and an in-depth look at some of the core algorithms used in molecular biology. Students will also learn about computational biology tools available over the Internet.

Prerequisite: The course is designed for graduate students who may be interested in pursuing a research or industry career in bioinformatics. It is also open to senior-level undergraduates in good academic standing. There is no formal prerequisite for the course, but a background in the design and analysis of algorithms is desirable, and a background in biology is a plus. The course is also open to biology students interested in learning about algorithms, and to graduate students in physics, chemistry, mathematics and statistics who are interested in interdisciplinary research.

Course Outline

Genetics. Molecular biology. Cell. Chromosomes. The structure of DNA. The central dogma of molecular biology. Gene expression. Transcription. Types of RNA: mRNA, tRNA and rRNA. Translation. Genetic code. Regulatory genes. Mutations. Genomes of prokaryotes, eukaryotes and humans.
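As a small illustration of transcription and translation, the Python sketch below (an illustrative example, not course material) copies a DNA coding strand into mRNA by replacing T with U, then reads codons against the standard genetic code until it reaches a stop codon. It assumes the reading frame starts at the first base and that input is uppercase DNA/RNA letters; the names `transcribe`, `translate` and `CODON_TABLE` are hypothetical.

```python
# Standard genetic code, written compactly: the codon whose bases sit at indices
# (i, j, k) in "TCAG" maps to AMINO[16*i + 4*j + k]; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with uracil (U) in place of thymine (T)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codons from the first base (assumed reading frame) up to the first stop."""
    dna = mrna.upper().replace("U", "T")  # the table is keyed by DNA letters
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGTTTGGCTAA")))  # MFG
```

Real genes add complications this sketch ignores, such as locating the start codon, introns in eukaryotic genes, and alternative genetic codes.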

Introduction to string pattern matching algorithms. Suffix Trees. Matching against a DNA or Protein library of known patterns: STS, EST and CpG islands. Sequence Analysis. Measures of sequence similarity. Dynamic programming. Pairwise alignment. Longest Common Subsequence problem. Local Alignment. Alignment with Gaps. Multiple Sequence Alignment.
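The dynamic programming idea behind the longest common subsequence problem and pairwise alignment can be sketched as follows. This is a minimal Python illustration, assuming a linear gap penalty and arbitrary example scores (match +1, mismatch -1, gap -2); the function names are hypothetical. Setting `local=True` applies the Smith-Waterman rule of clamping cell scores at zero, while the default computes a Needleman-Wunsch style global score. Suffix trees and multiple alignment are not covered by this sketch.

```python
def lcs_length(s: str, t: str) -> int:
    """Length of a longest common subsequence, keeping one DP row at a time."""
    prev = [0] * (len(t) + 1)
    for a in s:
        cur = [0]
        for j, b in enumerate(t, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip a character of s or t
        prev = cur
    return prev[-1]


def alignment_score(s, t, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score with linear gaps."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, n + 1):
            D[i][0] = i * gap
        for j in range(1, m + 1):
            D[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = D[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score = max(diag, D[i - 1][j] + gap, D[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            D[i][j] = score
    return best if local else D[n][m]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
print(alignment_score("ACGT", "AGT"))   # 1
```

Only scores are returned; recovering the alignment itself requires a traceback through the same table.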

Sequence Database Search. GenBank. Protein Sequence Databases. PubMed. BLAST, PSI-BLAST, FASTA. PAM and BLOSUM matrices.
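Tools such as BLAST and FASTA first look for short word matches between the query and each database sequence before extending them into alignments. The Python sketch below shows only that seeding idea, assuming exact k-mer matches on DNA strings; real BLAST also uses neighborhood words scored with matrices such as PAM or BLOSUM, extension steps and E-value statistics, none of which is modeled here. All function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(db: str, k: int) -> dict:
    """Map every length-k word in the database sequence to its start positions."""
    index = defaultdict(list)
    for pos in range(len(db) - k + 1):
        index[db[pos:pos + k]].append(pos)
    return index


def seed_hits(query: str, db: str, k: int = 3) -> list:
    """Return (query_pos, db_pos) pairs where a length-k word matches exactly."""
    index = build_kmer_index(db, k)
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


def diagonal_counts(hits: list) -> dict:
    """Count hits per diagonal (db_pos - query_pos)."""
    counts = defaultdict(int)
    for qpos, dpos in hits:
        counts[dpos - qpos] += 1
    return dict(counts)


hits = seed_hits("ACGT", "TTACGTT", k=3)
print(hits)                   # [(0, 2), (1, 3)]
print(diagonal_counts(hits))  # {2: 2}
```

Hits that share a diagonal suggest a candidate ungapped alignment, which is roughly where FASTA-style methods look before extending.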

Mapping and Sequencing. Basic Molecular techniques. Cutting and Pasting of DNA Sequences. Restriction Enzymes. DNA Cloning. Polymerase Chain Reaction. Hybridization. Microarrays. Gel Electrophoresis. Mapping. Physical versus Genetic Mapping. STS Mapping. Fingerprinting. Sequence assembly. Shotgun Assembly. Shortest superstring problem. Sequencing by hybridization.
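The shortest superstring problem is NP-hard, so it is often approached with a greedy heuristic: repeatedly merge the two fragments with the largest suffix-prefix overlap. The Python sketch below assumes error-free reads taken from one strand; it always returns a superstring of all reads, but that superstring is not guaranteed to be the shortest. Function names are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_superstring(reads: list) -> str:
    """Greedy merging heuristic for the shortest common superstring."""
    strings = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads contained in other reads add nothing to the superstring.
    strings = [s for s in strings if not any(s != t and s in t for t in strings)]
    while len(strings) > 1:
        best = (-1, 0, 1)  # (overlap length, left index, right index)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    ov = overlap(a, b)
                    if ov > best[0]:
                        best = (ov, i, j)
        ov, i, j = best
        merged = strings[i] + strings[j][ov:]  # b becomes a suffix of merged
        strings = [s for idx, s in enumerate(strings) if idx not in (i, j)] + [merged]
    return strings[0] if strings else ""


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_superstring(reads))  # ATTAGACCTGCCGGAATAC
```

Real shotgun assembly must also handle sequencing errors, repeats and reverse complements, which this sketch ignores.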

Strings and Evolutionary Trees: Parsimony-based tree building algorithms. Perfect phylogeny problem and algorithms. Distance based methods. Ultrametric trees. Additive distance trees. Connections between multiple alignment and tree construction.
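For parsimony, Fitch's algorithm computes the minimum number of substitutions that a fixed rooted binary tree needs to explain aligned sequences at its leaves (the small parsimony problem); tree-building methods then search over topologies for the lowest such score. The Python sketch below assumes trees encoded as nested 2-tuples with equal-length aligned sequence strings at the leaves; this encoding and the function names are illustrative assumptions.

```python
def fitch_site(tree, site: int):
    """Return (candidate state set, substitution count) for one alignment column."""
    if isinstance(tree, str):
        return {tree[site]}, 0  # a leaf has exactly one observed state
    left, right = tree
    lset, lcost = fitch_site(left, site)
    rset, rcost = fitch_site(right, site)
    common = lset & rset
    if common:
        return common, lcost + rcost  # children agree: no extra substitution
    return lset | rset, lcost + rcost + 1  # disagreement costs one substitution


def first_leaf(tree) -> str:
    """Follow left children down to a leaf sequence."""
    while not isinstance(tree, str):
        tree = tree[0]
    return tree


def parsimony_score(tree) -> int:
    """Sum of Fitch scores over all alignment columns."""
    length = len(first_leaf(tree))
    return sum(fitch_site(tree, i)[1] for i in range(length))


tree = (("ACGT", "ACGA"), ("TCGA", "TCGA"))
print(parsimony_score(tree))  # 2
```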

Additional Suggested Topics via students' term projects: Probabilistic and statistical models for sequence alignment and phylogenetic trees. Construction of Haplotypes from Genotype data. Genome Rearrangements. Molecular Structure Prediction: RNA secondary structure prediction and the protein folding problem.
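Among the suggested project topics, RNA secondary structure prediction has a classic dynamic programming formulation. The Python sketch below is a Nussinov-style base-pair maximization, assuming uppercase RNA input, Watson-Crick plus G-U wobble pairs, and a minimum hairpin loop of three unpaired bases; it ignores stacking energies, which realistic predictors model. Names are hypothetical.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of non-crossing base pairs, with at least min_loop unpaired bases per hairpin."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]  # N[i][j]: best pair count for seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):
                best = max(best, N[i][k] + N[k + 1][j])  # split into two substructures
            N[i][j] = best
    return N[0][n - 1] if n else 0


print(nussinov("GGGAAAUCC"))  # 3
```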

Recommended Reference Text:

1. D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press (1997)

2. J. Setubal and J. Meidanis, Introduction to Computational Molecular Biology, PWS Publishing Company (1997)

3. T. A. Brown, Genetics: A Molecular Approach

4. B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts and P. Walter, Molecular Biology of the Cell, Garland Science, 4th Edition.

5. P. A. Pevzner, Computational Molecular Biology: An Algorithmic Approach, The MIT Press, 2000.

Assignments: Homework assignments, reading assignments, a term project, midterm and final examinations.

Webpage: The webpage for this course will be used to post most of the course material, including lectures, homework assignments and term projects. Clicking 'Lectures' below gives access to slides and lecture notes. These materials are not intended to be a complete set of notes and are currently being modified and revised. The material in these slides and lectures is based on material contained in the references. The notes and slides should be used as supplementary material alongside the live lecture presentations, and only as a guide to studying the material in the recommended references. Copying for distribution and/or sale is strictly prohibited by copyright law. The webpage for the M5 Research Group is:
M5 Research Group

  • Lectures
  • Temporary Files
  • Homework
  • Reading
  • Programming
  • Term Project
  • Review
  • Exams

  • Bioinformatics Research Publications: We also link here to recent research publications on bioinformatics produced by our research laboratory at the School of Computer Science at UCF.

  • Research Papers

  • Pointers to Other Interesting Websites

  • Last modified Summer 2004.