CAP 6938: Data Mining in Bioinformatics

 

 

Lecture: TTh 4:30 PM - 5:45 PM

Location: HEC 0111

Instructor: Dr. Haiyan Nancy Hu

Email: haihu@cs.ucf.edu

Office: HEC 412

Phone Number: 407-882-0134

Office Hours: TBD

 

 

Description:

This course surveys computational techniques that bridge two fields, data mining and bioinformatics, for the effective mining of biological data. Recent progress in biology, medical science, bioinformatics, and biotechnology has led to the accumulation of tremendous amounts of biological data that demand in-depth analysis. At the same time, recent progress in data mining research has produced numerous efficient and scalable methods for mining interesting patterns in large databases. In this class, we will provide an overview of the data mining methods that support biological data analysis. We will also outline research problems that may motivate the further development of data mining tools for analyzing various kinds of biological data.

Bioinformatics is an active, interdisciplinary research area. This course is open to all students with backgrounds in computer science, biology, mathematics, or statistics who are interested in bioinformatics research.

 

 

Prerequisite:

There is no formal prerequisite; the course is open to all graduate students.

 

 

Book References:

"Data Mining: Concepts and Techniques" by Jiawei Han and Micheline Kamber, Elsevier, 2000. ISBN 1558604898. 550 pages.

"Molecular Biology of the Cell" by Bruce Alberts et al., 4th edition, 2002.

 

Grading:

Assignments (30%). Assignments include reading papers and writing paper summaries and critiques.

Paper presentation (15%). Each student will give a presentation on a course-related paper. Students must consult the instructor to decide on the topic they would like to present.

Final project (50%). Students may come from very diverse backgrounds, such as biology and computer science. Final problem-solving projects can be either biology-oriented or programming-oriented, depending on the student's background. Students are required to meet with the instructor during the early weeks of the class to design their final projects. Students are encouraged to talk with the instructor about collaborating on the final project with another student from a different background.

Class participation (5%).

 

 

Preliminary Schedule:

Date Topic Notes and Readings*
W1: 08/25 Introduction/Administrivia Notes, Readings
08/27 Part I: Microarray gene expression data Notes
W2: 09/01 Identification of differentially expressed genes Notes
09/03 Clustering of gene expression data Notes, Readings; Readings
W3: 09/08 Gene ontology and function annotation Notes
09/10 Gene expression pattern using PCA Notes, Readings
W4: 09/15 Biclustering of gene expression data Notes, Readings
09/17 Cancer classification using gene expression data Notes, Readings; Readings
W5: 09/22 Part II: Biological pathway Notes
09/24 Protein and protein interaction network Notes, Readings (project proposal due)
W6: 09/29 Pathway databases Notes
10/01 Reconstruct biological pathway Notes, Readings; Readings
W7: 10/06 Infer disease relevant pathways Notes, Readings; Readings
10/08 Pathway simulation Notes, Readings;
W8: 10/13 Mine disease-causal genes using the Semantic Web Notes, Readings (short presentation by Sandy)
10/15 Biological literature mining I Notes, Readings (short presentation by Yuan)
W9: 10/20 Biological literature mining II Notes, Readings (short presentation by Peter)
10/22 Part III: SNP Notes; Readings; (short presentation by Cuncong)
W10: 10/27 SNP databases Notes; Readings; (short presentation by Hector)
10/29 Linkage analysis Notes; (short presentation by Matt)
W11: 11/03 Mapping disease loci by linkage analysis Notes; (short presentation by Eric)
11/05 Mapping disease gene by whole genome association study Notes; Readings (short presentation by Chuan)
W12: 11/10 Mapping determinants of human gene expression (eQTL) Notes, Readings; Readings; Readings; Readings (short presentation by Victor)
11/12 eQTL II Notes (two paper critiques due)
W13: 11/17 Frequent Pattern Mining Notes;
11/19 Integrative Genome-Phenome mapping Notes; Readings
W14: 11/24 Final project presentation (Hector, Yuan, Chuan)
11/26 Thanksgiving holiday, no class
W15: 12/01 Final project presentation (Cuncong, Pete, Sandy)
12/03 Final project presentation & final project due (Matt, Victor, Eric)

* Lecture notes will be put on the website before each class. Reading material is subject to change.