This set of notes will mostly consist of queries rather than summaries,
due to time pressure. Read the chapter! Take notes during the
lecture! Because I'm late giving you these queries, your answers
aren't due until next Tuesday.
Queries: General Audio
2.1. Explain the relationship between frequency, wavelength, period
and the speed of sound. If you double the frequency of a sound,
do you double or halve its wavelength?
2.2. Why was the sampling rate of 44,100 hertz (samples per second)
chosen as the standard for CD audio recording?
2.3. Quantization with six bits per sample would result in how
many distinct amplitude levels?
An important concept: Physical and Logical Representations.
Any audio signal (whether a sine, triangle or square wave or
some irregular but periodic wave form) with fundamental frequency
of 440 hertz is called the musical note "A-440", but
in fact the different waveforms would sound very different.
The concept of a musical pitch is a single logical element, with
several different physical realizations.
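To make the idea concrete, here is a minimal sketch in Python (my choice of language, not the book's). It generates the first few samples of a sine, a square and a triangle wave, all with a 440 hertz fundamental. The function names, the 44,100 samples-per-second rate and the phase convention are assumptions made for illustration only.

```python
import math

SAMPLE_RATE = 44100   # samples per second (assumed; the CD audio rate)
FREQ = 440.0          # fundamental frequency in hertz ("A-440")

# Each waveform maps a phase in [0, 1) (the position within one cycle)
# to an amplitude in [-1, 1].
def sine(phase):
    return math.sin(2 * math.pi * phase)

def square(phase):
    # High for the first half of the cycle, low for the second half.
    return 1.0 if phase < 0.5 else -1.0

def triangle(phase):
    # Rises from -1 to 1 over the first half, falls back over the second.
    return 4 * phase - 1 if phase < 0.5 else 3 - 4 * phase

def render(wave, n_samples):
    """Return n_samples amplitudes of `wave` at the shared fundamental."""
    return [wave((FREQ * n / SAMPLE_RATE) % 1.0) for n in range(n_samples)]

# One logical pitch, three physical realizations.
samples = {w.__name__: render(w, 8) for w in (sine, square, triangle)}

for name, values in samples.items():
    print(name, [round(v, 3) for v in values])
```

All three lists repeat with the same period, so they are the same note, yet the numbers in them (and hence the sound) differ.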
MIDI (Musical Instrument Digital Interface) is a logical representation
of music, as is the notation used in sheet music. In fact,
you can think of MIDI as "electronic sheet music." Wave
files (often called .wav files from their DOS file extension)
are a physical representation of specific sounds, in specific
voices. A MIDI file can be played back using any voice
that the synthesizer has available, just as sheet music can be
performed by a trumpet, flute or piano.
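The sheet-music analogy can be sketched in a few lines of Python. This is NOT the real MIDI format: the score below is a hypothetical list of (pitch, duration) pairs, and the two "voices" are toy waveforms I made up for illustration. The point is only that one logical score can be performed in more than one voice.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed; kept low so lists stay small)

# Logical representation: (pitch in hertz, duration in seconds) pairs.
# A hypothetical stand-in for sheet music or MIDI note events.
SCORE = [(440.0, 0.01), (494.0, 0.01)]

# Two hypothetical voices: each maps a phase in [0, 1) to an amplitude.
VOICES = {
    "sine": lambda p: math.sin(2 * math.pi * p),
    "square": lambda p: 1.0 if p < 0.5 else -1.0,
}

def perform(score, voice_name):
    """Turn the logical score into physical samples using one voice."""
    voice = VOICES[voice_name]
    samples = []
    for freq, duration in score:
        n = round(duration * SAMPLE_RATE)  # samples needed for this note
        samples.extend(voice((freq * i / SAMPLE_RATE) % 1.0) for i in range(n))
    return samples

as_sine = perform(SCORE, "sine")      # one physical realization
as_square = perform(SCORE, "square")  # same score, different voice

print(len(as_sine), len(as_square), as_sine == as_square)
```

The score never changes; only the choice of voice does, and each choice yields a different list of samples.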
Queries: MIDI
2.4. Define a musical event, in a MIDI sense. (A definition is not an example!)
2.5. MIDI is transmitted on a serial link rather than a parallel one. What does this tell you about the number of wires in a MIDI cable?
2.6. Is MIDI data asynchronous, synchronous or isochronous? (I believe that you won't find this answer in the book. You have to THINK about it.)
2.7. What is a voice, in MIDI terminology?
2.8. Could MIDI be used to represent human speech?
2.9. Why is MIDI far more compactly stored than an equivalent
.wav file?
Queries: Speech Synthesis
2.10. Human speech can be physically described as quasi-stationary, and as consisting of formants. Explain these concepts in your own terms. Illustrate these ideas by describing how the pronunciation (shape of the mouth) and the sound (actual audio signal produced) differ when you speak the letter names O, E and S.
2.11. Define a phoneme (the authors call it a phone) and a morpheme (the authors call it a morph). (The translator didn't get it right in coming from German to English.)
2.12. Actual sounds of speech vary greatly between individuals;
James Earl Jones doesn't sound like Pee Wee Herman. Make an analogy
between MIDI events and .wav files, and elements of language.
What linguistic elements correspond to logical, and what to physical,
representations?
Queries: Speech Analysis and Recognition
2.13. The authors assert on page 48 that if the probability of recognizing each word correctly is 95%, then the probability of recognizing a three-word sentence correctly is (0.95)^3 = 0.857. Criticise this conclusion if you can. (If you can't, work on it!)
2.14. Define syntax, and give one example of a syntactic rule in English.
2.15. What would be the payoffs and problems associated with a
recognition/synthesis system for compressing and transmitting
speech?
Whew! That's a pretty serious set of queries for one lecture,
folks. They won't all be this intensive, but since you don't have
any programming to do yet, I thought I'd put you to work over
the weekend. Don't all thank me at once....