## Fall 2012, CAP 6515: Algorithms in Computational Biology

Instructor: Shaojie Zhang

Lectures: T/Th 4:30-5:45 pm ENGR 286

Office hours: T/Th 3:00 - 4:00 pm, HEC 311

Syllabus: The course will concentrate on computer science aspects of computational molecular biology and will emphasize the recent advances and open problems in the area.

The computational techniques will include algorithms, graph theory, combinatorics, machine learning, etc. Many important bioinformatics topics will be used as examples to illustrate how the formulation and solution of a computer science problem can help to answer a biological question.

This course is designed for computer science graduate students. No biology knowledge is required.

Graduate students with either biology or physical/computer science backgrounds who have taken a fundamental bioinformatics course are also welcome to take this course.

Preliminary topics to be covered:
1. Introduction to algorithms
2. Exact string matching, Data structure: Suffix tree
3. Suffix tree: applications
4. Mismatch tree and the motif finding problem
5. Breaking problems down: dynamic programming
6. Combinatorial search: intractable problems
7. Integer programming; Reductions
8. Graph algorithms: trees
9. Haplotyping via perfect phylogeny
10. Comparing trees
11. De Bruijn graph; Eulerian graphs
12. Minimum Spanning trees; Shortest paths
13. Matching problems; Network flow
14. Breakpoint graph
15. Set cover and set packing
16. Graph-based clustering
17. Heuristic algorithms

Reference Textbooks:

• P.A. Pevzner, Computational Molecular Biology: An Algorithmic Approach. The MIT Press, 2000 (CMB)
• D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press. 1997 (Dan)
• Assignments: We will have 4 take-home assignments.

Grading: Assignments (40%), Midterm exam (25%), Final exam (25%), Attendance (10%).

Topics and Tentative Agenda:

Week Date Topic Slides Notes
1 08/21 Introduction to algorithm PDF
08/23 Exact string matching PDF Dan 2.3, Dan 1.4 KMP
2 08/28, 08/30 Z-value PDF
Assignment #1 (Due 09/11)
Dan 1.4
3 09/04,09/06 Suffix trees PDF Dan 6
Suffix
4 09/11 Dynamic Programming PDF CMB CMB 6.2 and 6.3
Myers-Miller algorithm
09/13 DP Applications: RNA folding, complicated parameters PDF mfold
Zuker-Sankoff(1984)
5 09/18 DP Applications: RNA alignments with guiding tree PDF
Assignment 2 (Due 10/04)
Paper_1
Paper_2
09/20 DP Applications: Pseudoknotted RNA alignments PDF pseudoknot
6 09/25 DP Applications: RNA folding, a different formulation PDF RNAscf
09/27 DP Applications: Spliced sequence alignment PDF CMB 9.4
7 10/02 NP-hard problem and multiple sequence alignment PDF
10/04
No class

Research:
We are always looking for motivated students. If you are looking for research projects, please get in touch.