This lecture is based on Chapter 2 of the Wu text. It explores the basic issues of analog and digital signals and lossless and lossy transmission of information. It sets up some basic ideas about multimedia computers for detailed exploration in later chapters.
This lecture covers material from the beginning of Chapter 2 of Wu,
down through 2.1.5 (Telephony.) The next lecture from Wu will concentrate
on video (moving images) and the rest of Chapter 2.
Until 600 BC the only way to pay for anything was by barter. You bartered some number of ounces of silver for a sheep, or some number of sheep for a wife. A king named Croesus in Lydia (now part of Turkey) is credited with inventing coins, so you could COUNT money rather than WEIGHING it. An analog operation (one in which an infinite range of possibilities exists between any two fixed values, like 0.0 and 1.0) was replaced by a discrete operation (one in which all measurements correspond to integers.)
Electronically we do the same thing. It's the DIGITAL in Digital Media. Voltages or currents are analog values, which can range between maybe 0.0 and 3.7 volts. If you use voltage to represent numbers, you get an analog computer. It's the electronic equivalent of a slide rule, which you may have heard of in Ancient History. Typical accuracy was around 1% on good days or 5% when changing temperatures started changing all the resistances in the circuits. The more circuits an analog signal goes through, the more noise or error it accumulates. Consider a copy-of-a-copy-of... a videotape. Pretty soon it's all snow.
We can represent analog signals in magnetic media as well as in electronic circuits. The tiny magnets on a cassette tape or a diskette or a hard drive are reoriented by hitting them with a magnetic field. The result is that when the magnetic medium moves past a coil of wire, it induces a voltage bump - whose height is proportional to the magnetization, and thus proportional to the voltage when the stuff was recorded. Plus or minus some analog noise, as always.
A key decision made in about 1948 was to use a kind of round-up round-down technology to "discretize" computing. That is, any voltage over (for instance) 2.5 is taken as a logical "1" (TRUE), and rounded back up to full value (like 3.7 volts.) Any voltage below 2.5 is rounded down to 0. So this elementary error correction process can preserve and restore the logical contents of a signal, even through thousands of stages of replication.
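Here is a small sketch of that round-up round-down idea. The 2.5 and 3.7 volt figures come from the paragraph above; the size of the noise, the uniform noise model and the function names are assumptions made up for illustration, not a description of any real circuit.

```python
import random

THRESHOLD = 2.5   # volts; decision level from the text
FULL = 3.7        # volts; "full value" for a logical 1

def regenerate(voltage):
    """Round a noisy voltage back to a clean logic level."""
    return FULL if voltage > THRESHOLD else 0.0

def noisy_stage(voltage, rng, noise=0.4):
    """One analog stage: adds a small random error (assumed uniform)."""
    return voltage + rng.uniform(-noise, noise)

def run_chain(bits, stages, restore, seed=1):
    """Pass bits through many noisy stages, optionally restoring each time."""
    rng = random.Random(seed)
    levels = [FULL if b else 0.0 for b in bits]
    for _ in range(stages):
        levels = [noisy_stage(v, rng) for v in levels]
        if restore:
            levels = [regenerate(v) for v in levels]
    return [1 if v > THRESHOLD else 0 for v in levels]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
digital = run_chain(bits, stages=1000, restore=True)
analog = run_chain(bits, stages=1000, restore=False)
print("restored at every stage:", digital == bits)
print("never restored:         ", analog == bits)
```

With restoration the bits always survive here, because the assumed noise (0.4 volts) is never big enough to push a level across the threshold in a single stage. Without restoration the errors pile up stage after stage, like the copy-of-a-copy videotape.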
Digital lasts. Analog doesn't.
AM and FM.
Final topic about analog signals. Sometimes the best way to encode data is not with the voltage level at all (which is called amplitude modulation), because the medium may not actually like sustained voltages. For instance the POTS (Plain Old Telephone System) was designed to pay attention to changes in value rather than value. So, sometimes we use a system called frequency modulation. In this case we use one tone (a whistling sort of sound, a sine wave) of a particular frequency to represent 0, and another to represent 1. This was the standard practice with low-speed modems, and FM is also used to carry the audio portion of a broadcast TV show. FM has better noise resistance than AM, because a spike of noise doesn't really (usually) change the ratio of the amount of energy in the two bands which the receiver is listening for, to make 1 or 0 happen.
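A rough sketch of the two-tone idea follows. The sample rate, the two tone frequencies and the bit length are all assumed values chosen to keep the arithmetic tidy; they are not taken from any modem standard. The receiver simply compares the energy it finds at each of the two frequencies.

```python
import math

RATE = 8000          # samples per second (assumed)
F0, F1 = 1000, 2000  # tones standing for 0 and 1 (assumed values)
N = 80               # samples per bit (assumed), i.e. 10 ms per bit

def modulate(bits):
    """Turn each bit into a short burst of one of the two tones."""
    samples = []
    for b in bits:
        f = F1 if b else F0
        samples += [math.sin(2 * math.pi * f * n / RATE) for n in range(N)]
    return samples

def energy_at(chunk, f):
    """Energy of a chunk at frequency f (correlate with sine and cosine)."""
    s = sum(x * math.sin(2 * math.pi * f * n / RATE) for n, x in enumerate(chunk))
    c = sum(x * math.cos(2 * math.pi * f * n / RATE) for n, x in enumerate(chunk))
    return s * s + c * c

def demodulate(samples):
    """Decide each bit by asking which band holds more energy."""
    bits = []
    for i in range(0, len(samples), N):
        chunk = samples[i:i + N]
        bits.append(1 if energy_at(chunk, F1) > energy_at(chunk, F0) else 0)
    return bits

message = [1, 0, 1, 1, 0]
signal = modulate(message)
signal[100] += 5.0            # a spike of noise in the middle of the second bit
print(demodulate(signal) == message)
```

The spike is five times taller than the tone itself, yet the bit still decodes, because one sample adds only a little energy to either band compared with a full burst of the right tone.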
BUT digital isn't perfect, either. Analog noise, being random, will sometimes rise up like a rogue wave at sea and zap one bit in a message. So we have to use checksums in various places in our storage of digital information. Consider 8 bits like 10111100. Count the number of one-bits (there are five of them.) If the result is ODD, add a ninth digit which is zero. If the result is EVEN, add a ninth digit which is a one. This is called an odd parity bit.
If you did a whole word at once, you'd have an odd parity checksum. There are actually many more tricks to this subject of error detecting and error correcting codes; we'll come back to this topic in later lectures.
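The odd parity rule above fits in a few lines. The function names are made up for this sketch; bits are kept as a string of "0" and "1" characters so the example from the text can be typed in directly.

```python
def odd_parity_bit(bits):
    """Return the extra bit that makes the total number of ones odd."""
    ones = bits.count("1")
    return "0" if ones % 2 == 1 else "1"

def add_parity(bits):
    """Append the odd parity bit to a string of bits."""
    return bits + odd_parity_bit(bits)

def looks_ok(word):
    """A received word passes the check if its count of ones is odd."""
    return word.count("1") % 2 == 1

word = add_parity("10111100")   # five ones already, so the ninth bit is 0
damaged = "0" + word[1:]        # noise zaps the first bit

print(word, looks_ok(word))
print(damaged, looks_ok(damaged))
```

Note that a single parity bit only detects an odd number of flipped bits. If two bits get zapped, the check passes and nobody is the wiser.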
What's a non-Multimedia Computer?
Imagine a plain old computer with a 486 processor in it. There exist commands to set and clear individual pixels on its SVGA screen, at resolutions of 640 x 480, 800 x 600 and 1024 x 768, in various color depths. There is a speaker built into the computer whose state can be controlled by an 8 bit digital to analog converter (DAC.) How much media can you do with this machine?
Answer: a lot, if you're not in a hurry. You can set the audio value by successive writings to the DAC and produce anything that a .WAV file can store. You can bring up beautiful imagery in pixels, just by dumping disk files to the video buffer. But it's all SLOOOOW.
This chapter is about what goodies have been added to computers in the last 4 years to speed up these processes.
And the opposite?
When CD-ROM arrived, a flood of data was available for the first time and needed displaying. The multimedia computer was born about 3 years ago. The text defines a multimedia (MM) computer as
- a computer which supports playing, editing and recording multimedia
including digital audio and video;
- a computer which supports communication in these media via phone
or network (e. g. ethernet)
To control such things, standard means of starting and stopping programs, synchronizing sound and video, and of compressing and decompressing stuff are needed.
The MPC Level 3 specification, at www.spa.org/mpc/mpc3spec.htm, purports to tell you what you need in a multimedia PC, if it's 1996. Things have changed, as they always do, but that PC with its 75 MHz CPU was what it took to deliver CD-ROM games like Myst. What's new? Mostly faster CPUs and 3D graphics cards like Voodoo, Real 3D and Diamond.
Query 3.1 - and this is a big one. Read the spec for mpc3, at the above web site, and pick one area (like CD, or audio). Find out what all the acronyms mean - become an "expert" on this one section of the spec. Look stuff up if you don't know. (How do you look stuff up? The library will teach you if you don't know.) We'll give you a chance to flaunt your knowledge at some point in the course, like maybe a midterm exam.
Or, you can think of your checklist of things to learn as a mission statement for part of this course. Snag ideas and facts as they go by, and fit them into your checklist of missing ideas. Then do research before the midterm to fill the rest in.
Multimedia Software. Your Windows machine will reveal its media
secrets to you if you open up the Multimedia Control Panel. You will see
some of the things spoken of, in figure 2.2.
Human eyes have red, green and blue-sensitive sensors called CONES, as well as a family of cute little blue-green-sensitive sensors called RODS. The rods work mostly at night (ever notice how everything is blue at night?) A cone sends out signals like beep --- beep --- beep --- and if light of a color it likes is shining on it, the beeps come more frequently --beep--beep--beep until, at saturation the brain is getting beepbeepbeepbeep on that color channel, at that spot in the retina.
Now the world isn't really composed of red, green and blue stuff. Light has a wavelength (or a mix of wavelengths). Red cones are really "maximal-response-to-red" cones, with lesser but nonzero responses to neighboring wavelengths. Ditto blue and green cones. So the brain figures out a color like yellow, because it stimulates the red and green cones about equally. That's how we get millions of colors from three sensors - a given color is defined by the ratio of beepbeeps coming in on three sensor systems.
Even though the eye is composed of discrete (separate) cones and rods, they're so close together that we perceive the world as a continuously varying sequence of colors. Photographic film is the same - it uses three colors of dyes, formed into crystals too small to see individually. So, let's define natural images as consisting of a continuously varying sequence of colors. For any (x,y) in the scene, you can find an appropriate (RGB) value.
Electronic images such as TV or computer or newspaper images, are composed of colored dots. The dots can be considered as SAMPLES from a continuous natural image. However, TV and computer images are actually not the same. The TV image is composed of dots but the signal that painted the dots, was not broken up into individual color representations (numbers.) It was an ANALOG signal - think of a garden hose, being turned up and down by a maniacal kid at the faucet while Dad tries to water the carrot patch, row by row. Some rows get lots, some get nothing. Next month, some carrots come up, some don't. (This family lives in Las Vegas, where it rains maybe once a year.) That carrot patch is your TV screen, and the carrots are called color triads (red green blue.) But the water in the hose didn't know about the carrots. It was a continuously varying stream.
Making images out of a string of separate numbers, e. g. a computer screen, offers possibilities beyond what analog video can do. Analog video can never change instantly from one color to another, so it's always a bit fuzzy. But discrete video certainly can. So computer screens can be super-duper sharp. This, however, is actually a disadvantage for some uses, such as showing off photos. It sometimes produces "jaggies" or aliasing, where discretization has chopped a continuous image into separate steps of brightness. The eye is very good at finding such things, just when we wish it wasn't.
There are three levels of representation we wish to discuss:
analog, or continuous image
discrete, or digital image
geometric object
So let's discuss the third option.
Geometric Objects are the mainstay of computer graphics courses such as CAP5725 - Introduction to Computer Graphics, or CAP4021 - Building Virtual Worlds. Geometric objects are composed of vertices (points in 3d space), represented by triples of real numbers, and made of polygons (triangles, squares, etc.) The process of transforming geometric objects into an image is called rendering.
Discrete Images. Computer screens nowadays usually have the potential to represent 256 different levels of red, green and blue in each color triad. This takes 3 bytes of data. A fourth byte is often used to represent something called Alpha, which stands for transparency and is used when images are mixed.
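To make the Alpha byte a little more concrete, here is one common way of mixing two pixels. It assumes the convention that alpha 255 means a fully opaque foreground and 0 means fully transparent; other conventions exist, and the function name is made up for this sketch.

```python
def blend(fg, bg, alpha):
    """Mix foreground over background; each channel is 0..255.

    alpha is 0..255, where 255 is assumed to mean fully opaque.
    """
    return tuple((f * alpha + b * (255 - alpha)) // 255 for f, b in zip(fg, bg))

red = (255, 0, 0)
blue = (0, 0, 255)

print(blend(red, blue, 255))   # all foreground
print(blend(red, blue, 0))     # all background
print(blend(red, blue, 128))   # roughly half and half
```

Each color channel is mixed separately, so the job is three small multiply-and-add steps per pixel.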
An image of 1280 x 1024 x 3 bytes would yield nearly 4 megabytes of data. Not something one wants to send over the Internet, so we have got to seek ways of packing it considerably. One of the oldest ways is called the Color Palette. Actually this arose long before the Web; at that time it was disk space that was the killer constraint.
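The arithmetic is worth doing once by hand. The sketch below works out the raw size of that image, and then how long it would take over a 56 kilobit per second modem. The modem speed is an assumption used for illustration, and real modems rarely reach their rated speed, so treat the time as a best case.

```python
width, height = 1280, 1024
bytes_per_pixel = 3                      # one byte each for red, green, blue

true_color_bytes = width * height * bytes_per_pixel
mebibytes = true_color_bytes / (1024 * 1024)

modem_bits_per_second = 56_000           # assumed best-case modem speed
seconds = true_color_bytes * 8 / modem_bits_per_second

print(true_color_bytes, "bytes")
print(round(mebibytes, 2), "MiB")
print(round(seconds / 60, 1), "minutes over the modem")
```

That is over nine minutes for a single uncompressed screenful, which is why packing matters.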
VGA and Color Palettes. Figure 2.3 in the Wu text explains how 8 bit color is used to select a 24 bit color from a lookup table called the Palette, which is stored with each 8 bit image. In class I will explain how palettes are automatically built when a 24 bit or 32 bit color image is converted to an 8 bit image.
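The lookup itself is simple, and can be sketched as follows. The palette here is a plain gray ramp invented for the example; it is not the palette from the Wu figure, and the sketch says nothing about how a good palette gets chosen.

```python
# A 256-entry palette: each entry is a 24 bit (r, g, b) color.
# This gray ramp is a made-up example palette.
palette = [(i, i, i) for i in range(256)]

# The 8 bit image stores one palette index per pixel.
indexed_pixels = bytes([0, 255, 128, 128])

def expand(pixels, table):
    """Look up each 8 bit index to recover a 24 bit color."""
    return [table[i] for i in pixels]

print(expand(indexed_pixels, palette))

# Storage for a 1280 x 1024 image, with and without a palette.
width, height = 1280, 1024
indexed_size = width * height * 1 + 256 * 3   # pixels plus the table
true_color_size = width * height * 3
print(indexed_size, "bytes versus", true_color_size, "bytes")
```

The table costs only 768 bytes, so the indexed image comes out at about a third of the size. The price is that the whole picture has to make do with 256 distinct colors.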
Query 3.2: Explain how palettes are automatically built when a 24 bit or 32 bit color image is converted to an 8 bit image. In other words, were you paying attention in class?
We'll say much more about video imagery in a while. For now, let's switch to audio.
Humans can hear sounds from about 20 hertz (cycles per second) up to 20,000, when they're 6 years old. After that it starts going to pot at both ends (but mostly the high end), and faster still if you listen to loud rock music as a teenager. If you record these pressure variations with a microphone and an analog medium (like a wax disk, or a magnetic tape recorder), you have a continuous sound image - just like the continuous visual image in a photograph. (In both cases it's really discrete, down at the magnetic grains in the magtape or the silver grains in the film. But once the pieces are too small to perceive and their contributions are formed by analog addition (one bit is worth as much as any other), you have an analog (continuous) signal.)
Audio Objects could be represented many different ways; consider the following symbols:
"This is a string of sounds in the English language."
Your mouth renders them into an analog audio signal. Another set of audio object symbols are represented by musical notation. There is an electronic form of musical notation called MIDI (Musical Instrument Digital Interface). So we have a parallel between audio and visual information:
Objects                   ---> Digital Signals ---> Analog Signals
Geometry                  ---> Digital Image   ---> Analog image
MIDI, writing, notation   ---> Digital Sound   ---> Analog sound
Generally, to reverse one of these arrows involves the potential for
either substantial difficulty, or loss of information. Conversion of analog
images to digital images, or of analog sound to digital sound, requires
sampling. Conversion of digital images or sounds into objects requires
much more than that. In the case of visual information, that pathway is
called machine vision. In the case of audio, it might be called
speech recognition if we're working with human speech, or some specialized
term I don't know if we're trying to convert digital sound to MIDI.
Query 3.3. Describe some of the problems associated with converting a CD recording of, say, a woodwind quartet, into a MIDI file for that performance.
Discrete Sound. Continuous signals of any type can be sampled in such a way that they can be put back together again. This is based on the Nyquist sampling theorem, which says that samples must be taken at a rate of more than 2 times the maximum frequency in the signal. For audio to be realistically reproduced for 6 year olds, the sampling rate must be at least 40,000 samples/second, or 40 kilohertz (kHz). The standard that was set for CD was 44.1 kHz.
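What goes wrong if you sample too slowly? The sketch below samples a 30,000 Hz tone at 40,000 samples per second, which is well short of twice the tone's frequency. The particular frequencies are chosen for illustration. The samples come out identical to those of a 10,000 Hz tone, so after sampling there is no way to tell the two apart.

```python
import math

def sample_tone(freq_hz, rate_hz, count):
    """Take `count` samples of a cosine wave at the given sample rate."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE = 40_000                            # samples per second

high = sample_tone(30_000, RATE, 16)     # above half the sample rate
low = sample_tone(10_000, RATE, 16)      # safely below half the sample rate

same = all(abs(a - b) < 1e-9 for a, b in zip(high, low))
print("30 kHz and 10 kHz give the same samples:", same)
```

This is the audio cousin of the "jaggies" in images: a frequency the sampler cannot represent shows up disguised as a lower one.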
Figure 2.4 in the Wu text shows that there are several
pathways from the PC's internal storage (using a .WAV file) to the outside
world, including MIDI, Speaker and Microphone, and CD.
WAV files are discrete representations of continuous sounds. The "Pulse Code Modulation" (PCM) technique is used. A CODEC is a specific set of software (and sometimes hardware) which embodies a particular set of rules for encoding data.
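As a sketch of what PCM means in practice, the code below measures a sine wave at regular intervals, rounds each measurement to one of 256 levels, and wraps the resulting bytes in a WAV container using Python's standard wave module. The sample rate, the tone and the helper name are assumptions chosen for the example; the file is built in memory rather than written to disk.

```python
import io
import math
import wave

RATE = 8000   # samples per second (assumed for this example)

def pcm_8bit(freq_hz, count):
    """Sample a sine wave and quantize each sample to an unsigned byte."""
    out = bytearray()
    for n in range(count):
        x = math.sin(2 * math.pi * freq_hz * n / RATE)   # between -1 and 1
        out.append(round((x + 1.0) * 127.5))             # between 0 and 255
    return bytes(out)

samples = pcm_8bit(440, 800)     # a tenth of a second of a 440 Hz tone

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)            # mono
    w.setsampwidth(1)            # 1 byte per sample
    w.setframerate(RATE)
    w.writeframes(samples)
wav_bytes = buf.getvalue()

print(len(samples), "samples,", len(wav_bytes), "bytes of WAV data")
```

The rounding step is where information is lost: with only 256 levels, each sample can be off by up to half a level, and that error is heard as a faint hiss.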
Telephony is the most mature part of audio engineering, because telephones (together with radio) were the driving force behind the invention of much of acoustical science. Phonographic recording was almost a byproduct, since it was done mechanically until vacuum tubes came along from radio to help out.
Telephones used analog audio almost exclusively until the 1970's, when digital systems started appearing. The integrated circuit DAC (Digital to analog converter) and ADC (opposite, of course) made digital telephones practical. Even today they're mostly used in PBX (Private Branch Exchange) setups, like UCF or a large business. All the long distance traffic has been digital for more than 30 years.
TAPI is the way that software connects various kinds of telephonic service together. Read all about it in the book...
Query 3.4: I take my digital camera out and I shoot a picture of my granddaughter Emily. I put the image (as a JPEG file) on a diskette, slip it into my PC, read the file onto my hard drive, then send e-mail to Emily's other grandfather in England. I attach the image. I use a 56kb modem over a POTS phone line to my internet service provider. His system uses a fiber optic T3 link to the Internet backbone run by Sprint, who transmits the image via fibre to England, and ultimately to my fellow grandpa's ISP. He pulls the image down over his modem. He opens his e-mail and Emily's face appears on the screen.
Trace every step of this process, reporting the data formats that were involved. Which steps were analog and which were digital? Which if any steps were lossy and which were lossless?