Homework 4

Due: Wednesday, 11/03/04

 

Read the following paper on enhanced suffix arrays:

 

http://www.zbh.uni-hamburg.de/staff/kurtz/papers/AboKurOhl2002.pdf

 

 

(1) Build the enhanced suffix array for the following string:

 

S = agttacgtacgatga$

 

The enhanced suffix array must include at least the following fields: index, suftab, lcptab, and the suffix itself. Assume the lexicographical order a < c < g < t < $ (note that the sentinel $ is the largest character).
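To illustrate the expected table layout, here is a minimal sketch that builds the four fields naively for a different, shorter string (acaaacatat$, chosen only for illustration). The function name enhanced_suffix_array and its alphabet parameter are hypothetical, and sorting whole suffixes is quadratic; it is not the construction described in the paper.

```python
def enhanced_suffix_array(s, alphabet="acgt$"):
    """Naive construction of suftab and lcptab for a string ending in '$'.

    The order of characters in `alphabet` defines the lexicographic order,
    so '$' is treated as the largest character.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}

    # suftab[k] is the start position of the k-th smallest suffix.
    suftab = sorted(range(len(s)), key=lambda i: [rank[ch] for ch in s[i:]])

    # lcptab[k] is the length of the longest common prefix of the suffixes
    # at suftab[k-1] and suftab[k]; lcptab[0] is defined as 0.
    lcptab = [0] * len(s)
    for k in range(1, len(s)):
        a, b = s[suftab[k - 1]:], s[suftab[k]:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcptab[k] = n
    return suftab, lcptab


if __name__ == "__main__":
    text = "acaaacatat$"  # illustration only, not the assignment string
    suftab, lcptab = enhanced_suffix_array(text)
    print("index  suftab  lcptab  suffix")
    for k, (pos, lcp) in enumerate(zip(suftab, lcptab)):
        print(f"{k:5d}  {pos:6d}  {lcp:6d}  {text[pos:]}")
```

Each printed row corresponds to one row of the table asked for above: the rank of the suffix (index), its start position in the string (suftab), the length of the prefix it shares with the previous row (lcptab), and the suffix text.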

 

(2) Give two reasons why suffix arrays are superior to suffix trees for whole genome analysis.

 

(3) What is a tandem repeat? What is a branching tandem repeat?

 

(4) Give an intuitive explanation (not necessarily the optimal approach) of how you would check for a branching tandem repeat of length at least 2l using this enhanced suffix array. Is there a branching tandem repeat of length 2l with l ≥ 4 in S? If yes, what is it?