CODING 5-3
finalized in November 1992 with three related algorithms, called Layers, defined to take
advantage of psychoacoustic effects when coding audio. Layers 1 and 2 are intended for
compression factors of about 4:1 and 6:1 to 8:1, respectively, and these algorithms have
become popular in satellite and hard-disk systems. Layer 3 achieves compression of up to
12.5:1 (8% of the original size), making it well suited to ISDN.
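These compression factors translate directly into bit rates. The Python sketch below assumes a CD-quality stereo source (44.1 kHz, 16 bits, two channels; the text does not state the input format) and divides its PCM rate by each factor quoted above.

```python
# Assumed source format (not stated in the text): CD-quality stereo PCM.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Uncompressed PCM rate in kbit/s: 44100 * 16 * 2 / 1000 = 1411.2
pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000


def coded_kbps(ratio: float) -> float:
    """Bit rate after compressing the assumed PCM source by `ratio`:1."""
    return pcm_kbps / ratio


for layer, ratio in [("Layer 1", 4.0), ("Layer 2", 6.0), ("Layer 2", 8.0), ("Layer 3", 12.5)]:
    share = 100 / ratio  # coded size as a percentage of the original
    print(f"{layer} at {ratio}:1 -> {coded_kbps(ratio):.1f} kbit/s ({share:.1f}% of original)")
```

Under that assumption, 12.5:1 gives about 112.9 kbit/s, which fits within the 128 kbit/s carried by the two 64 kbit/s B channels of a basic-rate ISDN line.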
Basic Principles of Perceptual Coding
With perceptual coding, only information that can be perceived by the human auditory
system is retained.
Lossless coding (which, for audio, means noiseless coding) with perfect reconstruction
would be an optimum system, since no information would be lost or altered. It might
seem that lossless, redundancy-reducing methods (such as PKZIP, StuffIt, and others
used for computer hard-disk compression) would be applicable to audio. Unfortunately,
no constant compression rate is possible because redundancy varies with the signal:
there are highly redundant signals like constant sine tones (where the only
information necessary is the frequency, phase, amplitude, and duration of the tone),
while other signals, such as those that approach broadband noise, may be completely
unpredictable and contain no redundancy at all. Furthermore, looking for redundancy
can take time: while a popular song might have three choruses with identical audio data
that would need to be coded only once, you’d have to store and analyze the entire song in
order to find them. Any system intended for real-time use over telephone channels
must have a consistent output rate and be able to accommodate the worst case, so
effective audio compression is impossible with redundancy reduction alone.
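The signal dependence of lossless compression is easy to demonstrate. The Python sketch below feeds one second of 16-bit samples to zlib, whose DEFLATE algorithm is the same kind used by ZIP tools, once for a steady 441 Hz sine tone and once for uniformly random samples standing in for broadband noise. The 44.1 kHz sample rate, the tone frequency, the amplitude, and the random seed are assumptions chosen for illustration.

```python
import math
import random
import zlib
from array import array

SAMPLE_RATE = 44_100  # assumed sample rate (Hz)
N = SAMPLE_RATE       # one second of mono audio


def pcm16(samples):
    """Pack integer samples into native-endian 16-bit PCM bytes."""
    return array("h", samples).tobytes()


def lossless_ratio(raw: bytes) -> float:
    """Original size divided by the DEFLATE-compressed size (zlib, level 9)."""
    return len(raw) / len(zlib.compress(raw, 9))


# Highly redundant: at 44.1 kHz a 441 Hz tone repeats every 100 samples,
# so one quantized cycle is computed and tiled to fill the second.
period = SAMPLE_RATE // 441
cycle = [round(16_000 * math.sin(2 * math.pi * k / period)) for k in range(period)]
tone = pcm16(cycle * (N // period))

# No redundancy: uniformly random 16-bit samples as a stand-in for broadband noise.
rng = random.Random(1)
noise = pcm16([rng.randint(-32768, 32767) for _ in range(N)])

tone_ratio = lossless_ratio(tone)
noise_ratio = lossless_ratio(noise)
print(f"sine tone: {tone_ratio:.1f}:1")
print(f"noise:     {noise_ratio:.2f}:1")
```

The tone shrinks dramatically because every cycle repeats exactly, while the noise barely changes in size, so the output rate of a purely lossless coder swings with the program material.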
Fortunately, psychoacoustics permits a clever solution! Effects called “masking” have
been discovered in the human auditory system. These masking effects (which suggest
that the brain itself performs something akin to coding) have been found to occur in
both the frequency and time domains and can be exploited for audio data reduction.
Most important for audio coding are the effects in the frequency domain. Research into
perception has revealed that a tone or narrow-band noise at a certain frequency inhibits
the audibility of other signals that fall below a threshold curve centered on the masking
signal.
The figure below shows two threshold-of-audibility curves. The lower one is the typical
frequency sensitivity of the human ear when presented with a single swept tone. When a
single, constant tone is added, the threshold of audibility changes, as shown in the upper
curve. The ear’s sensitivity to signals near the constant tone is greatly reduced. Tones that
were previously audible become “masked” in the presence of a “masking tone,” in this
case the one at 300 Hz.
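To make the two curves concrete, here is a minimal Python sketch that decides whether a probe tone is audible, with and without a masker. It assumes two widely used approximations that the text does not name: a Zwicker-style Hz-to-Bark mapping and Terhardt's formula for the threshold in quiet. The triangular spreading slopes, the 10 dB masking offset, and the 60 dB level of the 300 Hz masker are hypothetical illustration values, not figures read from the plot.

```python
import math
from typing import Optional, Tuple


def bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz (Zwicker-style formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def threshold_in_quiet(f_hz: float) -> float:
    """Terhardt's approximation of the threshold of audibility in quiet, in dB SPL (f_hz > 0)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


# Hypothetical spreading parameters, for illustration only.
LOWER_SLOPE_DB_PER_BARK = 27.0  # masking falls off steeply below the masker
UPPER_SLOPE_DB_PER_BARK = 10.0  # and more gently above it
MASKING_OFFSET_DB = 10.0        # peak of the masking curve sits this far below the masker


def masked_threshold(f_hz: float, masker: Optional[Tuple[float, float]] = None) -> float:
    """Threshold at f_hz, optionally raised by a (frequency_hz, level_db) masker."""
    quiet = threshold_in_quiet(f_hz)
    if masker is None:
        return quiet  # lower curve: no masker present
    masker_hz, masker_db = masker
    dz = bark(f_hz) - bark(masker_hz)  # distance from the masker in Bark
    slope = UPPER_SLOPE_DB_PER_BARK if dz > 0 else LOWER_SLOPE_DB_PER_BARK
    spread = masker_db - MASKING_OFFSET_DB - slope * abs(dz)
    return max(quiet, spread)  # upper curve: whichever threshold is higher


def audible(f_hz: float, level_db: float, masker: Optional[Tuple[float, float]] = None) -> bool:
    """True if a tone at f_hz and level_db lies above the applicable threshold."""
    return level_db > masked_threshold(f_hz, masker)


masker = (300.0, 60.0)  # assumed: a 300 Hz masking tone at 60 dB SPL
print(audible(400.0, 30.0))           # True: heard in quiet
print(audible(400.0, 30.0, masker))   # False: masked by the 300 Hz tone
print(audible(4000.0, 30.0, masker))  # True: far from the masker, still heard
```

With these assumed values, the threshold at 400 Hz rises from roughly 7.5 dB in quiet to roughly 41 dB next to the masker, so a 30 dB tone that was audible becomes masked, while a 30 dB tone at 4 kHz is unaffected.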
All signals below the upper threshold-of-audibility curve, or masking threshold, are
inaudible, so they can be dropped or quantized crudely with the fewest possible
bits. Any noise that results from crude quantization will not be audible if it stays
below the masking threshold. The masking depends on the frequency, the level,
and the spectral distribution of both the masker and the masked sounds.
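The consequence for bit allocation can be sketched with the familiar rule of thumb that each bit of a uniform quantizer improves the signal-to-noise ratio by about 6.02 dB. The Python sketch below divides a component's signal-to-mask ratio (its level minus the masking threshold at its frequency) by that figure to find the fewest bits that keep the quantization noise under the threshold, and gives zero bits to components already below it. This is a simplified illustration, not the bit-allocation procedure defined by the MPEG standard, and the component levels are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # each bit of a uniform quantizer adds roughly 6 dB of SNR


def bits_needed(signal_db: float, mask_db: float) -> int:
    """Fewest bits keeping quantization noise below the masking threshold (rule of thumb)."""
    smr_db = signal_db - mask_db  # signal-to-mask ratio
    if smr_db <= 0:
        return 0  # below the masking threshold: inaudible, so drop it
    return math.ceil(smr_db / DB_PER_BIT)


# Hypothetical spectral components: (label, level in dB, masking threshold in dB)
components = [
    ("masked neighbor", 30.0, 40.7),
    ("moderate partial", 60.0, 40.7),
    ("loud, unmasked tone", 80.0, 10.0),
]
for label, level, mask in components:
    print(f"{label}: {bits_needed(level, mask)} bits")
```

Here the three components need 0, 4, and 12 bits, compared with the 16 bits per sample a linear PCM coder would spend on each regardless of masking.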