Algorithms on Strings and Sequences

COT6417 Algorithms on Strings and Sequences

Spring 2002
Key Code for Registration: 6912
M W 16:00 - 17:15pm (BA 126)
Prof. Amar Mukherjee

Posible Class Schedule Change:

If you are interested to take this course and you have a conflict with the time or day of the week,you should go ahead and register for the course with a override from Dr. Ron Dutton. At the first class meeting, we will re-schedule the course to a time slot acceptable to all students. Note, without the override, the system will not allow you to register for the course if you have a conflict. If you have any questions, please contact me at amar@cs.ucf.edu

Background:

This course is offered approximately once every two years and is specially designed to give graduate students an opportunity to learn about a specific reserach topic of current interest. For historical reasons , the course has a title of "VLSI Algorithms and Architecture" because in the past offerings the course taught the design of high-level application-specific architectures that are suitable for VLSI implementation. For the Spring 2002 offering, the major emphasis will be on algorithms on strings and sequences with applications to compressed domain pattern matching problems for text and images, and bioinformatics. A few systolic hardware algorithms will also be presented which will need practically no background on VLSI design.

Pre-requisite:

Graduate students with background on design and analysis of algorithms. Open to all graduate students.

Course Outline

String matching algorithms: Karp-Rabin, Knuth-Morris-Pratt and Boyer-Moore algorithms, Aho-Corasick algorithm. Pattern matching with 'don't care' characters and with regular expression. Systolic algorithms for pattern matching.

Two-Dimensional pattern matching: Zhu-Takaoka algorithm, Bird/Baker algorithm.

Longest Common Subsequence problem: dynamic programming formulation. Edit distance between two srings. Hirschberg's algorithm. Systolic hardware algorithm.

Suffix trees: Ukkonen's algorithm. McCright algorithm. Application of suffix trees to pattern matching, k-mismatch problem, common substring problem and bioinformatics problems. Systolic algorithm for suffix tree construction. Implementation on FPGA.

Approximate String matching problem: Shift-Or algorithm, String matching with k-mismatches, string matching with k-differences.

Compressed domain pattern matching: a brief overview of compression algorithms. relationship between pattern matching and compression. Sorted context compression algorithms. BWT transform. compressed domain pattern matching algorithms for text and images. Hardware algorithms for compressed domain pattern matching. Open research problems.

Sequence alignments and dynamic programming: string similarity, local alignments, multiple string alignment problems. Maps and sequencing. Strings and evolutionary trees.

Recommended Reference Text:

There is no text book for this course. The following book is a recommended reference text: Dan Gusfield, "Algorithms on Strings, Trees, and Sequences", Cambridge University Press, 1999. Several websites with downloadable tutorial material will also be used as references. Papers from journals and conferences will be assigned to the class for study and discussions on a regular basis.

Assignments: The students will be assigned papers and parts of books/ websites for study, presentation and discussion in the class on a regular basis. Each student will have to write a paper on a selected reserach topic with inplementation and results. The students will be graded on the basis of their performance in class participation, presentations and discussions, and on the basis of the original contributions made in the term paper. There may be occasional homework assignments. A final examination will take place.

Webpage: The webpage for this course is http://www.cs.ucf.edu/courses/cda621/.This webpage will be used to post most of the course material.

Office Hours for Spring 2002: TBA


  • Lectures
  • Reference Links
  • Assigned Papers etc.
  • Term Paper


  • Pointers to Related WWW Pages
  • Last modified December 16, 2001.