Amar Mukherjee (PI), Robert Franceschini(Co-PI)
School of Computer Electrical Engineering and Computer Science
University of Central Florida
Orlando, Fl..32816
Contact Information
Amar Mukherjee
School of Computer Electrical Engineering and Computer Science
University of Central Florida
Orlando, Fl..32816
Phone: 407-823-2763
Fax : 407-823-5419
Email: amar@cs.ucf.edu
Personal Home
Page
WWW PAGE
List of Supported Students and Staff
Raja Iqbal, Graduate Research Assistant
Nan Zhang, Graduate Research assistant
Project Award Information
data compression, lossless text compression, bandwidth limited network, archival storage
Project Summary
The goal of this research project is to develop new lossless text compression
algorithms and software tools to incorporate compression in MIME/HTML
standards. The approach consists of encoding the text to exploit the
natural redundancy of a language via the use of a dictionary and then compressing
it
using a pre-existing compression algorithm. The encoding scheme depends
on the specific characteristics of the compression algorithm and the Bzip2
algorithm based on Burrows-Wheeler transform is used as the main backend
compression algorithm. A basic understanding of the interaction of the
encoding schemes and the compression algorithms is being developed.
The performance of the algorithms is being measured taking into account
both
compression and communication metrics. Infrastructure tools are being
developed using dynamic caching of dictionaries to embed compression into
MIME/HTML standards. The impact of the research on the future of information
technology is to develop data delivery systems where communication
bandwidth is at a premium and archival storage is an exponentially
costly endeavor. It is expected that the new lossless text compression
algorithms will
have 5 to 10% improved compression ratio over the best known pre-existing
compression algorithms which might translate into a reduction of more than
50% of the text traffic on the Internet. The experimental research
is linked to educational goals via rapid dissemination of results via reports,
conference and
journal papers, doctoral dissertation and masters thesis, and transferring
the research knowledge into the graduate curriculum. Software tools developed
under this grant will be shared via a web site.
Goals, Objectives, and Targeted Activities
First Year
Second Year
We have nearly accomplished the first goal of the first year. The results are posted in this web page (Project Results Page). We have discovered two new modeling schemes for pre-processing the input text and obtained uniformly better results in all the text corpus that we tried. The improvement beats the PPM* and the BWT algorithms in their native forms and show 3% to 30% better compression ratios over all of the compression techniques (viz., Huffman, arithmetic, LZW, DMC, PPM, PPM*, BWT).
We have started work on the second objective which is to understand
mathematically and theoretically why the modeling schema developed under
our research project is performing better than all other algorithms.
Project Impact
We expect that our research will impact the future status of information
technology by developing data delivery systems where communication bandwidths
is at a premium and archival storage is an exponentially costly endeavor.
We expect to develop new lossless text compression algorithms that will
have 5
to 10% improved compression ratio over the best known pre-existing
compression algorithms which might translate into a reduction of more than
50% text traffic on the Internet. We expect to develop software tools
to include compression in MIME/HTML standards, to automate dynamic
caching, and to
interact with web browsers, e-mail programs and authoring tools.
We have started (in the Spring 2000 semester) a new course entitled "Multimedia Compression on the Internet". The course web page is as follows: Course Web Page. This is a graduate level course with 14 students enrolled in the Spring 2000 semester. This particular topic has grown directly out of the research that we have been conducting for the last couple of years on data compression. Lecture topics have included both text and image compression, including topics from the research on the current NSF grant.
Project References
This work was started several years ago; two early papers that established this work are the following:
Area Background
In the last decade of the 20th century we have experienced an unprecedented
increase in the amount of digital data transmitted across public networks,
in particular across the Internet. Recent standards such as the World
Wide Web, HTTP, HTML and image standards have provided free access to a
vast
amount of information available from public servers.
With the continuously increasing use of the Internet the efficient use
of available resources, in particular hardware resources, has become a
primary
concern. One way of improving performance is by compressing and caching
data, i.e. by ensuring that data is not sent more often than necessary,
that as little data is sent in response to each request as possible, and
that all data transmitted across networks is compressed during transmission,
to
increase the efficiency by which the existing infrastructure is utilized.
This proposal is concerned with integrating new data compression
and caching techniques for improving the efficiency of bandwidth-limited
Internet.
We propose new algorithms to preprocess text in order to improve the
compression ratio of textual documents, in particular online documents
such as web pages on the World Wide Web. We also develop similar
techniques for images that do not change with time. We describe how our
algorithm can be efficiently used to aid in the compression of documents
consisting of large texts and static images, for transmission across public
data networks such as the Internet, using existing standards such as HTTP
and HTML. The compression algorithms also can be used to utilize archival
storage effectively.
Our approach integrates the two basic ideas of compression and caching
into a single mechanism, i.e. data is automatically cached whenever possible,
avoiding retransmission altogether, and if data has to be transmitted,
it is compressed in a very efficient way, which surpasses the performance
of
compression algorithms which are widely used at this time.
Area References
Potential Related Projects
Another area of active interest within our research group at UCF is image and video compression using wavelets.