CAP5510: Introduction to Bioinformatics

 

Fall Semester 2007

Tuesday and Thursday 6:00 pm - 7:15 pm

HEC 0117

 Instructor: Shaojie Zhang

Office Hours: T/Th 11:00 am - 12:00 pm and Th 5:00 pm - 6:00 pm

Description: This course surveys algorithms in bioinformatics, covering the following topics:

1) Molecular and genome biology for computer scientists.

2) Comparing biological sequences – sequence alignment: pairwise and multiple alignment, global and local alignment, whole-genome alignment.

3) Finding genes in genomes – gene prediction.

4) Finding regulatory motifs – finding signals in genomes.

5) Analyzing mobile genetic elements and repetitive DNA in genomes – repeat analysis.

6) Non-coding RNA in the genome – RNA folding and finding new ncRNAs.

7) Genome rearrangements and genome evolution – How do genomes evolve? Why can genome rearrangements cause cancer? Whole-genome duplication theory.

8) Molecular evolution.

9) Sequencing DNA – whole-genome assembly and shotgun sequencing.

10) Identifying proteins – computational proteomics.

This course is designed for advanced bioinformatics or EECS graduate students. Graduate students with an entry-level background in bioinformatics research are welcome to take this course. Students with a biology background who are interested in comparative genomics are also welcome.

Textbooks (I will distribute supplementary lecture notes and papers on these topics throughout the course):

1) N. Jones and P. Pevzner, “An Introduction to Bioinformatics Algorithms”. (Main textbook.) Most of the lecture notes (slides) are from the companion website for this book: Link.

2)   P. Pevzner, “Computational Molecular Biology: An Algorithmic Approach”. Some of the topics are from this book.

3) R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, “Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids”.

Assignments: In addition to the mid-term and final exams, there are TWO take-home assignments and ONE class research project.

Grading: Assignments (30%), Mid-term (15%), Final (15%), Final project (40%)

Lecture Notes and Handouts:

  • Lecture 0 (08/21/2007): Course Introduction and Lecture by Russell Doolittle (UCSD) - Life before Bioinformatics
  • Lecture 1 (08/23/2007): Molecular Biology Primer
  • Perl Reference Link
  • Perl Reference Link2 (pickingupperl)
  • Lecture 2 (08/28/2007): Comparing Biological Sequences (I) (Edit Distance)
  • Reading 1: Myers-Miller algorithm
  • Reading 2: BLAST
  • Homework 1.1 (Updated!)
  • Homework 1.2: Implement Gotoh's algorithm (Fig. 1(a)), or the linear-space version (Fig. 3).
  • Homework 1.3: Problem 6.51 (see textbook). New!
  • Lecture 3 (08/30/2007): Comparing Biological Sequences (II) (Global sequence alignment, local sequence alignment, affine gap alignment, scoring matrix, banded alignment, and BLAST)
  • Homework 1.1 and 1.2 are due on Sep 25th, 2007.
  • Lecture 4 (09/04/2007): Comparing Biological Sequences (III) (BLAST, FASTA, multiple alignment)
  • Escherichia coli K12 protein sequences: NC_000913.faa
  • Reading 3: DIALIGN
  • Lecture 5 (09/06/2007): Finding Genes from Genomes (I)
  • Reading 4: Phylo-HMM (Journal Version and Book Chapter Version)
  • Lecture 6 (09/11/2007): Finding Genes from Genomes (II)
  • Lecture 7 (09/13/2007): Motif Finding Problem
  • Test sequences for HW 1.2: dog_ifn.fasta and pig_ifn.fasta
  • Reading 5: MEME
  • Lecture 8 (09/14/2007, rescheduled lecture): Genome Sequencing and Assembly
  • No class on 09/18 and 09/20.
  • Lecture 9 (09/25/2007): Non-coding RNA genes
  • Lecture 10 (09/27/2007): RNA folding problem
  • Mid-term exam: Thursday 10/04/2007, 6:00 pm - 7:00 pm
  • Lecture 11 (10/02/2007): RNA alignment problem
  • Reading 6: FastR
  • 10/04/2007 Mid-term
  • Lecture 12 (10/09/2007): RNA consensus folding, RNAz, and RNAscf.
  • Reading 7: RNAscf
  • Lecture 13 (10/11/2007): Pseudo-knotted RNA alignments
  • Lecture 14 (10/16/2007): MicroRNA Targets Prediction and Analysis
  • Lecture 15 (10/18/2007): Genome Rearrangements
  • Lecture 16 (10/23/2007): Rearrangements in Tumor Genomes
  • Projects Presentation (10/25/2007): Plans and Goals (5-7 mins for each group)
  • Lecture 17 (10/30/2007): Protein Sequencing and Identification by Mass Spectrometry (I)
  • Homework 2 (Due 11/27/2007 6:00 PM Eastern Time)
  • Lecture 18 (11/01/2007): Mass Spec (II)
  • Lecture 19 (11/06/2007): Mass Spec (III)
  • Lecture 20 (11/08/2007): Molecular Evolution (I)
  • Lecture 21 (11/13/2007): Molecular Evolution (II)
  • Final Exam: 12/04/2007 4:00 pm - 6:30 pm (HCEC 117)
  • Extra Office Hours: 11/29/2007 11:00 am - 12:00 pm, 4:00 pm - 6:00 pm, 7:15 pm - 8:30 pm.
  • Extra Office Hours: 12/04/2007 10:00 am - 12:00 pm, 1:00 pm - 2:00 pm.
  • Project reports are due 11/29/2007 6:00 pm.
  • Project Presentation Schedule: 11/29/2007 6:30 pm - 7:30 pm and 12/06/2007 6:00 pm - 9:00 pm.
  • Lecture 22 (11/15/2007): Microarray Clustering
  • Extra Lecture: 11/16/2007 6:00 pm - 7:15 pm (Friday).
  • Lecture 23 (11/16/2007): Population Genetics (I)
  • No class on 11/20/2007.
  • Lecture 24 (11/27/2007): Population Genetics (II)
  • Class Projects Presentations (12/06/2007) Location: HEC 356, 6:00 pm - 9:00 pm

    Class Projects:

  • Phylo-HMM
  • RNA consensus folding
  • RNA folding with pseudoknots
  • RNA multiple alignment
  • Human population phylogeny
  • Coding regions in mammalian genome alignments
  • Motif finding
  • MicroRNA genes and their targets