
Identifying and discovering genes in new genomic sequences is one of the most timely computational problems in bioinformatics. Research on this problem has produced several gene-finding systems that have been used successfully in many whole-genome analysis projects. As the number of such systems grows, a rigorous way to combine their predictions becomes increasingly important. In this talk I will describe a Bayesian network framework for annotating genomic sequences that combines gene predictions from multiple systems. The framework treats the problem as one of combining the advice of multiple experts. Previous work in the area used relatively simple ideas such as majority voting. I introduce, for the first time, the use of Hidden Input/Output Markov models for combining gene predictions. The framework is applied to the Adh region of Drosophila, which has been carefully studied in the context of gene finding and served as the basis for the GASP competition. The main challenge in combining gene prediction programs is that the systems rely on similar features, such as codon usage, and as a result their predictions are often correlated. I show that this approach promises to improve prediction accuracy and provides a systematic and flexible framework for incorporating multiple sources of evidence, such as stochastic sequence models and sequence homologies, into a gene prediction system.
Vladimir Pavlovic is a research professor in the Bioinformatics Program at Boston University. Before joining Boston University, Vladimir was a member of the research staff at Compaq's Cambridge Research Laboratory (CRL) in Cambridge, MA, where he led the effort in applied machine learning. His interests include applications of machine learning and probabilistic inference to problems in bioinformatics, computer vision, human-computer interaction (HCI), and the modeling of complex systems. Vladimir received a Ph.D. in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 1999.