Warning: this is a LONG set of notes. It may take us more than one lecture to get through them. Get as far as you can, work hard and don't panic. Remember, all my tests are open book and notes.... Nobody should have to remember this stuff! But you must understand it. Today we're covering pages 113 to 142 of the SN text.
We begin. We will study several basic techniques, including run-length encoding, Huffman coding, the Walsh and Discrete Cosine transforms, and differential coding. Then we'll put them together into JPEG.
You need to know how to compute the amount of storage required for raw, uncompressed data, following the examples on page 115.
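The arithmetic is just width times height times bits per pixel, divided by 8 to get bytes. A minimal sketch in Python follows; the function name `raw_storage_bytes` and the 640 x 480, 24-bit example are made up for illustration and are not taken from the text.

```python
def raw_storage_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Storage for an uncompressed image, in bytes (8 bits per byte)."""
    total_bits = width * height * bits_per_pixel
    # Round up in case the bit count is not a multiple of 8.
    return (total_bits + 7) // 8

# Hypothetical example: a 640 x 480 image with 24 bits per pixel.
size = raw_storage_bytes(640, 480, 24)
print(size, "bytes =", size / 1024, "KB")  # 921600 bytes = 900.0 KB
```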
Query 5.1: Compute the storage required for a 540 x 480 image with 8 bits per pixel.
5.2: Name two forms of entropy encoding. Name two forms of source encoding.
5.3: What kind of data compresses most efficiently for run length encoding? Least efficiently?
5.4: The use of the ! character on pages 121 and 122 is called an escape sequence. Using ! as the escape character, one-byte character counts and a minimum run-length of 4, run-length encode the following data. The blanks in the data are NOT to be encoded; they are simply inserted to make it easier for you to read and count characters.
0000 0001 1122 2222 2200 5555 5555 000
Why is the minimum run length chosen as 4?
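If you want to check your hand-encoding, here is a rough sketch of an escape-based run-length encoder. It rests on assumptions that you should compare against pages 121 and 122: each long run is written as the character, then the escape character, then the count, and the count is printed as decimal text so you can read it (a real encoder would store it in one byte). The name `rle_encode` and the sample string are made up, and the sketch does not deal with an escape character that shows up in the data itself.

```python
def rle_encode(data: str, escape: str = "!", min_run: int = 4) -> str:
    """Replace runs of at least min_run equal characters by char+escape+count."""
    out = []
    i = 0
    while i < len(data):
        # Find the end of the run that starts at position i.
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(f"{data[i]}{escape}{run}")  # assumed layout: char, escape, count
        else:
            out.append(data[i] * run)              # short runs are copied unchanged
        i = j
    return "".join(out)

# Made-up sample data (not the data from Query 5.4).
print(rle_encode("ABCCCCCCCCDEFGGG"))  # ABC!8DEFGGG
```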
Huffman coding is very important. It requires that the probability of occurrence of each symbol in an alphabet be known. For practice, we have the next query.
5.5 Develop a Huffman code for an alphabet with four letters with the following probabilities:
A: 0.5
B: 0.25
C: 0.125
D: 0.125
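Once you have done it by hand, you can compare against a program. The sketch below builds a Huffman code by repeatedly merging the two least probable entries, using Python's standard `heapq` module. The function name `huffman_code` and the four-letter example alphabet (W, X, Y, Z) are made up for illustration. Which branch gets "0" and which gets "1" is arbitrary, so your bits may differ while the code lengths agree.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Map each symbol to a bit string, given a dict of symbol probabilities."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two least probable groups and merge them.
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# Made-up alphabet (not the one from the query above).
codes = huffman_code({"W": 0.4, "X": 0.3, "Y": 0.2, "Z": 0.1})
for sym in sorted(codes):
    print(sym, codes[sym])
```

In this example the most probable letter gets a 1-bit code and the two least probable letters get 3-bit codes.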
Before I can teach you about the DCT, I have to teach you (or remind you) about vectors, and matrix multiplication. Vectors are useful for many things. To make this lesson interesting, we'll consider vectors as used in computer graphics for geometric representation. We will represent points in two dimensions with a vector like (x, y, 1). The purpose of the 1 will perhaps be clear later.
Matrix Multiplication. To multiply the matrix M by the vector V, where V = (x y 1) and
        a b c
    M = d e f
        g h i
we like to arrange V as a column on the right side of M. Then we slide a left-hand finger across the top row of the matrix, while we slide a right-hand finger down the column of V. We multiply each pair of symbols and add the results. Thus
              a b c   x     ax + by + 1*c   <-- the first component of W
    W = M*V = d e f   y  =
              g h i   1
Then we do the same thing to compute the second and third components of W:
              a b c   x     ax + by + 1*c
    W = M*V = d e f   y  =  dx + ey + 1*f   <-- the second component of W
              g h i   1     gx + hy + 1*i   <-- the third component of W
So we wind up with a new vector, the result of multiplying M by V. We can think of the matrix M as a machine which takes in a vector and spits out another vector.
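The two-finger procedure translates almost word for word into code. Here is a minimal sketch; the name `mat_vec` and the sample numbers are made up for illustration.

```python
def mat_vec(M, V):
    """Multiply matrix M (a list of rows) by the column vector V (a list)."""
    W = []
    for row in M:
        # Slide across the row and down the column, multiply pairs, add up.
        W.append(sum(m * v for m, v in zip(row, V)))
    return W

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
V = [1, 0, 1]
print(mat_vec(M, V))  # [4, 10, 16]
```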
Now, for practice, please multiply V = (2,3,1) by the matrix S below.
        2 0 0
    S = 0 2 0
        0 0 1
Query 5.5: What happened to the point (2,3) in XY space, represented by the vector (2,3,1)? S is called a SCALING Transformation.
Now try this one:
        1 0 5
    T = 0 1 0
        0 0 1
T is called a TRANSLATION. What did it do to the point (2,3)?
         cos A   sin A   0
    R = -sin A   cos A   0
           0       0     1
Query 5.6. cos 90 degrees = 0; sin 90 degrees = 1. Try this matrix on the point (2,3), with A=90 degrees. Where does your point go?
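If you'd like to experiment with other points and other parameters, the three kinds of matrix can be built by small helper functions. This is only a sketch: the names `scaling`, `translation` and `rotation`, and the test point (1, 0), are made up, and the rotation uses the same sign layout as the R matrix in these notes.

```python
import math

def mat_vec(M, V):
    """Multiply matrix M (a list of rows) by the column vector V."""
    return [sum(m * v for m, v in zip(row, V)) for row in M]

def scaling(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def rotation(degrees):
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    # Same sign layout as the R matrix in these notes.
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

P = [1, 0, 1]  # the point (1, 0), a made-up example
print(mat_vec(scaling(3, 3), P))      # [3, 0, 1]
print(mat_vec(translation(4, 7), P))  # [5, 7, 1]
# Round away the tiny floating-point error in cos(90 degrees).
print([round(w, 6) for w in mat_vec(rotation(90), P)])
```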
Now that you can multiply matrices, we will forget about (x,y) and geometry, and go onward.
DCT (Discrete Cosine Transform) is very important. So I'm going to teach you about an easier-to-understand "cousin" of DCT, called the Walsh transform. Then we'll talk about DCT. We begin with the one dimensional Walsh. Consider a stream of data (real numbers); perhaps it's a series of samples of an audio signal.
You're familiar with the idea of building up binary numbers by adding together base components; or with making change by adding together pennies, nickels, dimes and quarters. (Why do coins keep coming up in this course?) For instance, to spell out the value "9" in binary, we need an eight (1 0 0 0) and a one (0 0 0 1). Adding, we get 1 0 0 1.
The Walsh transform represents a series of numbers by using a base set
like this one (for series of 8 numbers)
B0 = 1 1 1 1 1 1 1 1
B1 = 1 1 1 1 1 1 1 0
B2 = 1 1 1 1 1 1 0 0
B3 = 1 1 1 1 1 0 0 0
B4 = 1 1 1 1 0 0 0 0
B5 = 1 1 1 0 0 0 1 1
B6 = 1 1 0 0 1 1 0 0
B7 = 1 0 1 0 1 0 1 0
We refer to these bases as B0, B1 ... B7. You can see that all of these are "really" repeating series, it's just that B0 through B4 haven't got room to be seen actually repeating. (B0 either repeats with every character, or never repeats, depending on your point of view. Its frequency is zero, anyhow, so its wavelength must be infinite.)
Now, what if we wanted to represent a series like S1 = 3 3 3 1 1 1 3 3? A little experimentation would reveal that S1 = B0 + 2*B5. In fact, you can represent ANY series of 8 numbers, by some linear combination of these bases. The coefficients might be negative. For instance, if I wanted to produce the series -1 1 -1 1 -1 1 -1 1, how would I do it? Answer is revealed below, but try it first!
B0 is called the "DC Component". DC means "direct current", which is what you get out of a battery. It provides a constant voltage (until the battery runs down.) The rest are AC (alternating current) components, with various frequencies. The frequency of a signal is the reciprocal of its wavelength (how long it takes to repeat.) What's the frequency of B7? B6? B5? See a pattern here?
Definitions: A signal S is a series of numbers S0, S1, S2... Sn. A transform T is a series of coefficients T0, T1..Tn. T is the Walsh Transform of S, if S = T0*B0 + T1*B1 + T2*B2 ... + Tn*Bn.
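The definition can be checked mechanically. The sketch below rebuilds a signal from a list of coefficients, using the eight bases listed earlier in these notes, and confirms the claim that S1 = B0 + 2*B5. The names `BASES` and `synthesize` are made up for illustration.

```python
# The base set B0 .. B7 from these notes, one list per base.
BASES = [
    [1, 1, 1, 1, 1, 1, 1, 1],  # B0
    [1, 1, 1, 1, 1, 1, 1, 0],  # B1
    [1, 1, 1, 1, 1, 1, 0, 0],  # B2
    [1, 1, 1, 1, 1, 0, 0, 0],  # B3
    [1, 1, 1, 1, 0, 0, 0, 0],  # B4
    [1, 1, 1, 0, 0, 0, 1, 1],  # B5
    [1, 1, 0, 0, 1, 1, 0, 0],  # B6
    [1, 0, 1, 0, 1, 0, 1, 0],  # B7
]

def synthesize(T):
    """Return the signal S = T0*B0 + T1*B1 + ... + T7*B7."""
    n = len(BASES[0])
    return [sum(t * base[k] for t, base in zip(T, BASES)) for k in range(n)]

T = [1, 0, 0, 0, 0, 2, 0, 0]  # one unit of B0 plus two units of B5
print(synthesize(T))  # [3, 3, 3, 1, 1, 1, 3, 3]
```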
It would be extremely tedious to have to hunt around for the Walsh transform. Turns out, though, that there's an elegant way to do it. MATRICES and VECTORS!
To cut down on the typing, we'll use signals with four samples instead
of 8. Here's the basis set, as a matrix called M4. Note that the bases
B0, B1 etc. form columns in M4.
         1 1 1 1
    M4 = 1 1 1 0
         1 1 0 1
         1 0 0 0
Now, if I multiply a Walsh transform vector like T = (1 0 0 -2) by placing its column form on the right of M4, I should get the corresponding signal S = (-1 1 -1 1). I hope you got that result, too.
This trick works if you know T and want S. But what if you have S and
want to find T? Well, if you remember your linear algebra (which very few
computer scientists seem to remember, if they ever had it back in high
school), you know that if S = M*T, then T = R*S where R is the "inverse"
of M. Not all matrices have inverses, but fortunately the Walsh matrices
do. Here's R4:
          0  0  0  1
    R4 = -1  1  1 -1
          1  0 -1  0
          1 -1  0  0
Query 5.7: Compute R4*S, where S = (-1 1 -1 1), and see if you get our alleged transform, which was (1 0 0 -2).
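You can let the machine confirm that R4 really undoes M4. The sketch below uses the two matrices exactly as given in these notes; the helper names `mat_vec` and `mat_mul` and the signal (4 3 2 1) are made up for illustration.

```python
M4 = [[1, 1, 1, 1],
      [1, 1, 1, 0],
      [1, 1, 0, 1],
      [1, 0, 0, 0]]
R4 = [[0, 0, 0, 1],
      [-1, 1, 1, -1],
      [1, 0, -1, 0],
      [1, -1, 0, 0]]

def mat_vec(M, V):
    """Multiply matrix M (a list of rows) by the column vector V."""
    return [sum(m * v for m, v in zip(row, V)) for row in M]

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

S = [4, 3, 2, 1]       # a made-up signal
T = mat_vec(R4, S)     # signal -> transform
print(T)               # [1, 0, 2, 1]
print(mat_vec(M4, T))  # transform -> signal again: [4, 3, 2, 1]
print(mat_mul(R4, M4)) # the 4 x 4 identity matrix
```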
SUMMARY: If I gave you a series of numbers, you could compute its Walsh transform, and vice versa. The most important thing to note is that the transform is a frequency analysis of the signal. It can be read as saying that the signal has 1 volt of B0 (=DC) and -2 volts of B3 (which was the component with frequency 1/2.)
What do we mean by "frequency 1/2"? Well, if we knew what units the time axis was in, we could calibrate that. Assume we're taking 1000 samples per second, so the samples are one millisecond apart. B3 repeats every two samples, so its frequency is 1/2 cycle per millisecond, or 500 cycles/second (Hz).
Now what about the DCT? It's the same deal, only with a different matrix.
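To make "a different matrix" concrete, here is a sketch that builds a one dimensional DCT matrix and applies it to a signal. It assumes the common orthonormal DCT-II scaling, which may differ by constant factors from the formula in the text. With this scaling the matrix that goes from signal to transform is C, and the matrix that goes back is simply C turned on its side (its transpose). The names `dct_matrix`, `mat_vec` and `transpose` are made up for illustration.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row u holds the cosine base of frequency u."""
    C = []
    for u in range(n):
        scale = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        C.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x in range(n)])
    return C

def mat_vec(M, V):
    return [sum(m * v for m, v in zip(row, V)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

C = dct_matrix(8)
S = [5] * 8                      # a constant signal: pure "DC"
T = mat_vec(C, S)                # signal -> transform
back = mat_vec(transpose(C), T)  # transform -> signal
print([round(t, 6) for t in T])
print([round(s, 6) for s in back])
```

For the constant signal, only the first coefficient (the DC component) comes out non-zero, and the inverse step recovers the original eight samples up to rounding.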
Spatial frequencies.
BUT NOW we stop thinking about cycles per second, and start thinking about cycles per inch. What if the sample S consisted of pixel values along a scan line? We could still talk about its frequency. A pattern like 1 0 1 0 1 0 which repeats itself 50 times in 100 pixels, could be said to have a frequency of 1/2 cycle per pixel.
A pattern can have different frequencies in the X and Y directions.
What are spatial frequencies of this pattern?
1 1 0 0 1 1 0 0
1 1 0 0 1 1 0 0
1 1 0 0 1 1 0 0
0 0 1 1 0 0 1 1
0 0 1 1 0 0 1 1
0 0 1 1 0 0 1 1
1 1 0 0 1 1 0 0
1 1 0 0 1 1 0 0
In the horizontal direction, it repeats every 4 pixels. In the vertical direction, it repeats every 6 pixels. SO... we can consider the idea of the Walsh transform of an image, too! The Walsh transform of a one dimensional signal with N samples consisted of N coefficients. The transform of an n x n image will, by analogy, consist of an array of n x n coefficients. Let's call the pixels of the image Syx. (Why not xy? Because the text uses yx, on page 138, and we gotta get re-connected to the text now.)
The text refers to the transform as being Svu, but I wish they had called it Tvu. The two dimensional sum on page 138 contains two terms with cos in them. Together, these two terms define the DCT transformation matrix, which works just like the Walsh transformation matrix. Note that these terms depend on u and v, x and y, but not on the signal. The signal is all out front, in the factor Syx. So in essence, the transform matrix is being multiplied by the data matrix.
Key idea: The two dimensional matrix which the authors refer to as Svu, is the spatial-frequency representation of the corresponding image. Its DC component is called S00 and is found in the upper left corner, etc.
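Here is a sketch of the two dimensional version, written as the double sum over y and x with one cosine term for each direction. It assumes the orthonormal DCT-II scaling, so compare the constant factors against page 138 before trusting the exact numbers. The names `dct_matrix` and `dct_2d` and the test block are made up for illustration.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row u holds the cosine base of frequency u."""
    C = []
    for u in range(n):
        scale = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        C.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x in range(n)])
    return C

def dct_2d(block):
    """Transform an n x n block of pixels (indexed [y][x]) into coefficients [v][u]."""
    n = len(block)
    C = dct_matrix(n)
    return [[sum(C[v][y] * block[y][x] * C[u][x]
                 for y in range(n) for x in range(n))
             for u in range(n)]
            for v in range(n)]

# Made-up block: every row is the ramp 0..7, so nothing changes vertically.
block = [list(range(8)) for _ in range(8)]
coeffs = dct_2d(block)
print(round(coeffs[0][0], 6))                 # the DC component, upper left
print([round(c, 3) for c in coeffs[0]])       # horizontal frequencies
print([round(row[0], 3) for row in coeffs])   # vertical frequencies
```

Because this block only varies from left to right, everything below the top row of the coefficient array comes out as (nearly) zero.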
Differential coding. This rather simple idea says that you store the difference between two successive samples, rather than the sample values. It's almost the equivalent of throwing away the DC component. If a value stays the same for a long time, the differential code would be zero and could be run length encoded. But of course the repeated value could also be RL encoded, couldn't it?
A particularly cheap trick is to only use about 4 bits (or sometimes just one bit) to represent the amount of change between samples. This can produce a coarse effect. Also, the truth may drift on you. Adding up hundreds of approximate changes won't keep your net value close to the truth. So you have to transmit the actual signal value periodically to reset the process.
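A small sketch shows both the exact scheme and the cheap one. It assumes a non-empty list of integer samples, and it models the "4 bit" trick by simply clamping each difference to the range -8 to 7; real coders are cleverer than that. The names `diff_encode`, `diff_decode` and `clamp_deltas` are made up for illustration.

```python
def diff_encode(samples):
    """First sample as-is, then each sample's difference from the one before."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def diff_decode(codes):
    """Rebuild the samples by adding up the differences."""
    out = [codes[0]]
    for d in codes[1:]:
        out.append(out[-1] + d)
    return out

def clamp_deltas(codes, bits=4):
    """Squeeze each difference into a signed field of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [codes[0]] + [max(lo, min(hi, d)) for d in codes[1:]]

samples = [10, 10, 10, 12, 30, 30]
codes = diff_encode(samples)
print(codes)                             # [10, 0, 0, 2, 18, 0]
print(diff_decode(codes))                # [10, 10, 10, 12, 30, 30]
print(diff_decode(clamp_deltas(codes)))  # [10, 10, 10, 12, 19, 19]
```

The last line shows the truth getting away from you: the jump of 18 does not fit in 4 bits, and the decoded values stay wrong until the actual signal value is sent again.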
Now for JPEG.
Query 5.8: On page 130, in the list of requirements for JPEG, the expression "compression factor" is used. What does it mean?
Remind me to discuss the RGB and YUV color systems in class. When you understand those ideas, you can then understand the sentence on page 133 where it says "For JPEG, YUV color image processing uses Y1=4Y2=4Y3", etc. In essence it's saying that we have to put more data into intensity variations than we do into color because the human eye can distinguish smaller variations in intensity, than in color.
Interleaving of components. Should we send all the red in an image, then all the green, then all the blue? That would make it hard to do "on the fly" reconstruction, so we normally send the R, G and B of a given small screen area, then repeat our way across the image. This is called component interleaving.
Image Processing. We're studying the lossy baseline JPEG here. (There are three other kinds.) 8 x 8 blocks of data are transformed by the DCT into 8 x 8 coefficient arrays. The data in the upper left quadrant is the most important data, because it's about the whole picture. But all the data is necessary to achieve near-lossless reproduction. (It's still lossy because real numbers lose accuracy in rounding.)
Quantization (p. 139) is confusing but important. In essence, for each of the 64 cells in a transform, the question is asked: "how important is this data?" A default table of weights is provided for JPEG. It is NOT symmetrical, by the way. It reflects the fact that most photographs taken by humans have strong horizontal structure to them, and so assumes more redundancy (i.e. assumes greater energy in the low frequency components) in the horizontal direction. Highly sophisticated JPEG users can also make up their own quantization tables.
So.. the cell for S44 might contain the number 30. This means, "divide the value in this cell by 30 before storing it." Or, equivalently: "30 units of energy at frequency S44 is worth one point. Sixty units is worth 2 points," etc.
If you think about it, the net effect is to group similar but not identical features into a single level. A small quantization number means "store lots of detail at this frequency" and a large one says "ignore this component unless it's really large."
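Here is the "divide by 30" idea as a sketch. It assumes the quotient is rounded to the nearest whole number before it is stored, which is the usual practice but is not spelled out above. The names `quantize` and `dequantize` and the sample coefficient values are made up for illustration.

```python
def quantize(coeff, q):
    """Divide a coefficient by its quantization number and round (assumed)."""
    return round(coeff / q)

def dequantize(level, q):
    """The decoder can only multiply back; the remainder is lost for good."""
    return level * q

q = 30  # the example quantization number from the notes
for coeff in (14, 29, 44, 61, -100):
    level = quantize(coeff, q)
    print(coeff, "->", level, "->", dequantize(level, q))
```

Notice that 29 and 44 land on the same level, and that 14 disappears entirely. That is the grouping of similar but not identical features into a single level.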
Since the higher frequencies are quantized into fewer levels, a lot of them in fact become zeroes. The next step is to put them into a list, according to the zig zag sequence chart on page 141. All those zeroes tend to fall toward the end of the list. The zeroes are run length encoded. Then Huffman encoding crunches this data very nicely.
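To see the zeroes pile up at the end, here is a sketch of a zig zag scan. It assumes the usual JPEG ordering (start at the upper left, then sweep back and forth along the anti-diagonals); check it against the chart on page 141. A 4 x 4 block of made-up quantized values is used to keep the example short, and the names `zigzag_order` and `trailing_zeros` are hypothetical.

```python
def zigzag_order(n=8):
    """List of (row, col) positions of an n x n block in zig zag order."""
    order = []
    for s in range(2 * n - 1):            # s = row + col picks one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)         # even diagonals run from bottom to top
        order.extend((r, s - r) for r in rows)
    return order

def trailing_zeros(values):
    """How many zeroes sit at the very end of the list."""
    count = 0
    for v in reversed(values):
        if v != 0:
            break
        count += 1
    return count

# Made-up quantized coefficients; the big values are near the upper left.
block = [[12, 3, 0, 0],
         [2, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
scan = [block[r][c] for r, c in zigzag_order(4)]
print(scan)                  # [12, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(trailing_zeros(scan))  # 11
```

One long run of eleven zeroes is exactly the kind of data that run length encoding handles well.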
That seems like enough information for one day, don't you think? We'll continue from page 142 on Thursday.
JMM