String data is everywhere—from web pages to DNA sequences containing our genetic make-up, to the text you’re reading right now.

Being able to store, search, and manipulate massive data sets quickly and efficiently is a continuing challenge that becomes increasingly difficult as people generate and collect more and more data. Many data sets are strings or sequences of characters. In recent months, Dr. Sharma Thankachan, an Assistant Professor in the Department of Computer Science at the University of Central Florida (UCF), has been awarded over $1 million in research funding by the National Science Foundation (NSF) to investigate efficient algorithms for the management of string data.

In February 2022, Dr. Thankachan received an NSF CAREER Award, the most prestigious award NSF gives to early-career faculty. The nearly $600K grant will enable him to investigate how best to model, index, and query pan-genomic data, which consists of comprehensive collections of DNA sequences. Such genome collections are now ubiquitous and keep growing in size, thanks to advances in low-cost sequencing technologies. The project focuses on advanced techniques for indexing graph-based representations of genomic data and is expected to have a lasting impact on data-driven bioinformatics.

In October 2021, Dr. Thankachan was also awarded approximately $450K by NSF to study the theoretical aspects of repetition-aware string compression and indexing. Repetitiveness is prevalent in modern data sets, and it makes them highly compressible. The new challenge is to design indexing techniques whose space efficiency matches that of repetition-aware compression techniques. To that end, this project will launch foundational research on compressed data structures and contribute broadly to theoretical computer science.

Dr. Thankachan received his undergraduate degree from the National Institute of Technology in Calicut, India, in 2006 and his doctorate in Computer Science from Louisiana State University in 2014. After two years working as a Research Scientist and Post-Doctoral Fellow in the School of Computational Science and Engineering at the Georgia Institute of Technology, he joined UCF’s Department of Computer Science in 2017.
