Some Video Details
VHS is a physical tape format. It can be recorded using the NTSC, PAL or SECAM techniques, but they all use the same kinds of actual tape and mechanical parts. You will sometimes see the authors use "VHS" as shorthand for all three techniques in one sentence.
Wrapping up JPEG
JPEG comes in six main flavors.
Flat (non-hierarchical):
Lossless JPEG replaces the DCT (a source of much roundoff and quantization error) with a differential technique: it encodes one of eight possible kinds of relationships between already-transmitted neighboring pixels and the new one. If the new pixel is called X, the neighbors are arranged like this:
C B
A X
Pixels A, B and C can be used to predict a value for X by eight methods. All these methods assume that a sequence of bits, which we could call EF, is transmitted. Both E and F consist of multiple bits, and the whole thing is entropy-encoded afterward.
Three possible methods:
0: Ignore A, B, C. Just store X's difference from the previous value of X, in variable F.
1: X = A + F
...
7: X = ((A+B)/2) + F
Since F is often near zero, it can be represented with just a few bits. So we may be able to code a whole new pixel value with an average of about 2 bits for the code E, plus 2 or 3 bits for F.
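A minimal Python sketch of the predictive scheme, assuming the predictor table of the lossless JPEG standard (ITU-T T.81): 1 = A, 2 = B, 3 = C, 4 = A + B - C, 5 = A + (B - C)/2, 6 = B + (A - C)/2, 7 = (A + B)/2, with integer halving. The edge rules (first pixel predicted as mid-grey, rest of the first row from A, first column from B) are also assumed from the standard, the entropy-coding step is left out entirely, and the function names are hypothetical. Because the decoder rebuilds each pixel from neighbors it has already rebuilt exactly, nothing is lost.

```python
# Lossless-JPEG-style prediction (sketch). Neighborhood of the new pixel X:
#   C B
#   A X
PREDICTORS = {
    1: lambda a, b, c: a,
    2: lambda a, b, c: b,
    3: lambda a, b, c: c,
    4: lambda a, b, c: a + b - c,
    5: lambda a, b, c: a + ((b - c) >> 1),
    6: lambda a, b, c: b + ((a - c) >> 1),
    7: lambda a, b, c: (a + b) >> 1,
}


def predict(img, r, c, method, bits=8):
    """Predicted value for pixel (r, c), using only already-coded neighbors."""
    if r == 0 and c == 0:
        return 1 << (bits - 1)        # no neighbors yet: assume mid-grey
    if r == 0:
        return img[r][c - 1]          # first row: only A (left) exists
    if c == 0:
        return img[r - 1][c]          # first column: only B (above) exists
    a, b, cc = img[r][c - 1], img[r - 1][c], img[r - 1][c - 1]
    return PREDICTORS[method](a, b, cc)


def encode(img, method):
    """Residuals F = X - prediction, row by row (entropy coding omitted)."""
    return [[img[r][c] - predict(img, r, c, method)
             for c in range(len(img[0]))] for r in range(len(img))]


def decode(residuals, method):
    """Rebuild the image; each pixel depends only on pixels already rebuilt."""
    h, w = len(residuals), len(residuals[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = residuals[r][c] + predict(out, r, c, method)
    return out
```

For a small gradient such as `[[100, 102, 104], [101, 103, 105], [102, 104, 106]]`, `encode(img, 7)` gives `[[-28, 2, 2], [1, 2, 2], [1, 2, 2]]`: apart from the first pixel, every F is tiny, which is exactly why few bits are needed.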
Attacking MPEG
(We won't discuss H.261 except to say that it was developed for teleconferencing and video telephones, and MPEG has largely replaced it.)
MPEG's goals are to take advantage of temporal redundancy (similarities between successive frames of a video image), and to provide methods that support both symmetrical and asymmetrical coding and decoding. Symmetrical means that both recording and playback time requirements are important, whereas asymmetrical means that you can spend lots of time compressing the image in order to make playback fast. The discussion in the text does not explain how symmetrical processing is possible; in fact it requires specialized hardware. No CPU now in existence can encode MPEG video in real time without special hardware help.
The text is fairly clear on the following topic, so I'll just ask you to explain it to me.
Query 6.1: Explain the roles of I, P and B frames.
The hardest idea in MPEG is the motion detection required to compute P (predictive) frames. They're not really predictive, because at the time you encode them, you know exactly what the original data is. But at playback time you don't have that data, of course.
A key idea is to realize that macro blocks (16 x 16 pixels of luminance, covering 8 x 8 samples of each subsampled chroma component) are quite small in a 640 x 480 image, which holds 40 x 30 = 1,200 of them. Thus, if a car is moving across a scene, a macro block might represent a portion of the driver's door.
So, to compute a P frame, the encoder looks at two source images: the current frame C of raw data, and the previous reference frame R (the most recent I or P frame). Considering a particular macro block in C, the system examines the corresponding macro block and a few adjacent ones in R. The one that is most similar (found by subtracting pixel by pixel and averaging the absolute differences) is designated as the "base", and a motion vector like (1,0) is used to represent the mapping from base to P frame.
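The block search just described can be sketched in Python as an exhaustive ("full search") block matcher that scores candidates by mean absolute difference. The block size `n`, the search `radius`, the (rows, columns) order of the returned vector and the function names are illustrative assumptions rather than values from the MPEG standard, and real encoders use much faster search strategies. Frames are plain lists of rows of luminance values.

```python
def mean_abs_diff(cur, ref, top, left, dy, dx, n):
    """Average |difference| between the n x n block of `cur` at (top, left)
    and the block of `ref` displaced by the candidate vector (dy, dx)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
    return total / (n * n)


def best_motion_vector(cur, ref, top, left, n=16, radius=2):
    """Try every displacement within +/- radius pixels and return
    ((dy, dx), score) for the most similar block in the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that would hang off the edge of the reference frame.
            if top + dy < 0 or left + dx < 0 or top + dy + n > h or left + dx + n > w:
                continue
            score = mean_abs_diff(cur, ref, top, left, dy, dx, n)
            if best is None or score < best[1]:
                best = ((dy, dx), score)
    return best
```

If the current frame is the reference shifted one pixel to the left, the block at (8, 8) is found at offset (0, 1) in the reference with a score of 0, so the correction block for it would be all zeroes.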
Now, this mapping is not exact, and so we still have to store a correction frame. But since the differences are small and likely to affect the whole block, the DCT of this difference will be almost all zeroes. It'll compress very nicely.
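To see why a correction whose error is spread evenly over the block compresses so well, here is a direct (slow) orthonormal 2-D DCT-II in Python. The 8 x 8 size matches the JPEG/MPEG transform block; the example residual, a uniform error of 3 at every pixel, is an assumed best case. Only the DC coefficient comes out nonzero; the other 63 are zero up to floating-point roundoff.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block, computed straight from the definition."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


residual = [[3] * 8 for _ in range(8)]      # assumed: uniform error over the block
coeffs = dct2(residual)
nonzero = sum(1 for row in coeffs for c in row if abs(c) > 1e-9)
print(round(coeffs[0][0], 6), nonzero)      # DC term 24.0, one nonzero coefficient
```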
B frames are just "between" frames, computed by a kind of averaging from the I and P frames on either side. Each macro block can use motion compensation forward (from the earlier reference frame), backward (from the later one), or both averaged together, whichever gives the best result. So when decoding, you have to decode both reference frames, the one before and the one after, before you can construct the B frames in between. B frames are "dumb and fast."
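Because a B frame needs the reference frame that follows it on screen, MPEG sends frames out of display order. A small Python sketch of that reordering, assuming a group of pictures written as a string of frame types; the labels (type plus display index), the function name and the handling of trailing B frames are illustrative assumptions.

```python
def coded_order(display):
    """Reorder display-order frame types ('I', 'P', 'B') so that every
    B frame comes after both reference frames it is predicted from."""
    out, pending_b = [], []
    for i, t in enumerate(display):
        label = f"{t}{i}"
        if t == "B":
            pending_b.append(label)   # can't decode yet: wait for the next anchor
        else:
            out.append(label)         # the anchor (I or P) is sent first...
            out.extend(pending_b)     # ...then the B frames shown before it
            pending_b = []
    return out + pending_b            # trailing Bs would really need the next group's I
```

For example, `coded_order("IBBPBBP")` returns `['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']`.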
Query 6.2: The "motion" encoded in MPEG only maps macro blocks into other macro blocks. But the world doesn't actually move in convenient 16 pixel jumps. How can such a technique possibly work out in practice?
MPEG-2 is a refinement of MPEG-1. No major new techniques are introduced, but a variety of levels of service (the standard's "profiles" and "levels") are provided. Due to various improvements, MPEG-2 achieves higher compression ratios than MPEG-1, but I believe the improvement is at most 2:1. Anyone with other data should let me know, and I'll check other sources before Thursday's class.
Data Rates
The text overwhelms us with information about data rates, but leaves us confused. For instance, on page 173 in the summary, the discussion is about VHS with a data rate of 1.2 mb/sec, but it doesn't mention that this is the rate for the compressed rather than the uncompressed data. Let's figure it out for ourselves.
It's tricky to decide how many bits of information are really needed to store an analog video image. Let's try, approximately. Remember that a previous query (which was mis-typed in these notes!) asked about the number of bytes in a 640 x 480 image with 8 bits per pixel. Answer: 640 x 480 = 307,200 bytes. Of course, if it were a 24-bit image, this would be about 0.9 megabytes. If we wanted 60 frames a second, that's about 54 megabytes/second (lots of data!)
On the other hand, we know that NTSC composite video is transmitted in about 4.5 MHz of radio bandwidth, which (if we sample at twice the bandwidth, with 8-bit resolution) could be stored with about 9 megabytes/second of samples. There is a big gap between 54 megabytes/sec and 9 megabytes/sec. What's up?
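Both figures can be checked with a little arithmetic. This sketch assumes the analog signal is sampled at the Nyquist rate of twice its bandwidth with one byte per sample, which appears to be how the 9 figure arises; the function names are hypothetical.

```python
def raw_video_bytes_per_sec(width, height, bytes_per_pixel, fps):
    """Uncompressed digital video rate, in bytes per second."""
    return width * height * bytes_per_pixel * fps


def sampled_bytes_per_sec(bandwidth_hz, bits_per_sample=8):
    """Bytes/sec needed to sample an analog signal at twice its bandwidth (Nyquist)."""
    return 2 * bandwidth_hz * bits_per_sample // 8


digital = raw_video_bytes_per_sec(640, 480, 3, 60)   # 55,296,000 bytes/sec
analog = sampled_bytes_per_sec(4_500_000)            # 9,000,000 bytes/sec
print(digital, analog, round(digital / analog, 1))   # a gap of roughly 6x
```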
Well, the analog image actually isn't nearly as good as 640 x 480. The two successive fields are nearly identical, so you have an effective vertical resolution of about 260 lines. NTSC also carries its color with much less bandwidth than its luminance, so the color changes far more slowly across a horizontal line; the 640 x 480 image's "24 bits of color per pixel" is therefore much richer in detail than the NTSC image.
MPEG-1 was designed to work with CDs back when they were single-speed and could deliver at most about 1.5 megabits/sec. The 9 figure above counts bytes (one 8-bit sample each), so it is really about 72 megabits/sec, and MPEG-1 needed a compression ratio of roughly 72/1.5, or about 48:1, to succeed. Indeed it did... but when you play back MPEG-1 from a single-speed CD, you only get NTSC quality. To render this on a PC screen, you use a little bitty window of about 320 x 240 pixels. Of course, if you used special hardware to blow it up and put it on a TV, it would look like a TV signal.
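When comparing a source rate with a channel rate, both must be in the same unit. A minimal sketch, assuming the 9 megabytes/sec estimate from the NTSC calculation and a channel rate quoted in bits per second, as CD and MPEG-1 rates usually are; the helper name is hypothetical.

```python
def required_ratio(source_bytes_per_sec, channel_bits_per_sec):
    """Compression ratio needed to fit a source into a channel.
    The source rate is converted from bytes to bits before dividing."""
    return source_bytes_per_sec * 8 / channel_bits_per_sec


print(required_ratio(9_000_000, 1_500_000))   # 48.0
```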
Query 6.3: How long a movie could you show, at NTSC quality, using a 650 mbyte CD and an assumed MPEG-1 compression ratio of 6:1?
Query 6.4: Assume MPEG-2 could support a 12:1 compression ratio. A DVD disk can store 17 GB of data, and is being used to deliver full two-hour movies. How many pixels per frame (at 30 frames per second) will be available, in decompressed video? Will this look better, worse or about the same as NTSC video?
DVI: Skip it. It's an obsolete technology. We'll hear a nice student lecture on DVD, which is the New Thing, next Tuesday.