5-2 CODING
OVERVIEW
Introduction to Audio Coding
Audio takes up a lot of data.
Without data reduction, CD-quality audio (16 bits at a 44.1kHz sample rate)
requires a transmission capacity of about 705 thousand bits per second (kbps) for each
audio channel. But the wires we use for remote broadcasting are on a telephone system
designed for voice-grade communications: 7 usable bits at 8kHz sample rate, or 56kbps per
channel. That’s less than 8% of what we need.
CURIOSITY NOTE
You can arrive at these same numbers with nothing more complicated than
grade-school math. Just multiply the sample rate by the sample depth: 44,100
samples per second * 16 bits per sample = 705,600 bits per second for CD-quality
mono audio.
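The arithmetic in this note can be checked with a few lines of Python. This is only an illustrative sketch; the function name `pcm_bitrate` and the variable names are assumptions made for the example, not part of any product or standard.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD-quality mono: 44,100 samples per second, 16 bits per sample.
cd_mono_bps = pcm_bitrate(44_100, 16)

# Capacity of one voice-grade telephone data channel, as quoted in the text.
phone_channel_bps = 56_000

# Fraction of the required capacity that one telephone channel provides.
fraction_available = phone_channel_bps / cd_mono_bps

print(cd_mono_bps)                  # 705600
print(f"{fraction_available:.1%}")  # 7.9%
```

Passing `channels=2` gives the stereo figure, which is simply double the mono rate.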
You can reduce the data requirements by lowering the quality somewhat. 13 bits would
yield a respectable 78 dB dynamic range, certainly adequate for home listening. And a
32kHz sample rate — with careful equipment design — will give you flat response to
15kHz, the practical limit for analog FM broadcasting in North America. Unfortunately,
that still leaves us with telephone data channels about 86% too small to do the job.
Besides, 13 bits is an awkward bit depth for computers to deal with, and the audio it
produces isn’t clean enough to survive today’s transmitter processors.
CURIOSITY NOTE
Bit depth and sample rate translate easily into audio specifications. Digital audio
must have a sample rate of at least twice the desired bandwidth, so 15kHz audio
requires (after a safety margin) 32kHz sampling. Each bit of sample depth
represents slightly more than 6dB of dynamic range.
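The two rules of thumb in this note, and the reduced-quality example before it, can be worked through as follows. This is a rough sketch under the assumptions stated in the text (13 bits, 32kHz sampling, a 56kbps channel); the function names are made up for illustration.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Approximate dynamic range: each bit is worth 20*log10(2), about 6.02 dB."""
    return 20 * math.log10(2) * bits


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Theoretical maximum audio bandwidth: half the sample rate."""
    return sample_rate_hz / 2


print(f"{dynamic_range_db(13):.1f} dB")  # 78.3 dB
print(f"{dynamic_range_db(16):.1f} dB")  # 96.3 dB

# 32kHz sampling allows 16kHz in theory, leaving a 1kHz margin above 15kHz.
print(nyquist_limit_hz(32_000))          # 16000.0

# Reduced-quality bitrate and how far a 56kbps channel falls short of it.
reduced_bps = 32_000 * 13
shortfall = 1 - 56_000 / reduced_bps
print(reduced_bps)                       # 416000
print(f"{shortfall:.1%}")                # 86.5%
```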
The first practical coding methods used a principle called ADPCM, Adaptive Differential Pulse
Code Modulation. This takes advantage of the fact that it takes fewer bits to code the
difference, or delta, between successive audio samples compared to using the individual
values. Further efficiency is gained by adaptively varying the difference comparator
according to the nature of the program material. G.722 and APT-X are examples of
ADPCM schemes. They achieve around a factor of 4 reduction in bitrate.
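The idea behind ADPCM can be illustrated with a toy coder. The sketch below is not G.722 or APT-X (both are considerably more elaborate); it only shows the two ingredients described above: coding the difference from a predicted value, and adapting the step size to the material. The 4-bit code range, the step limits, the adaptation thresholds, and all function names are assumptions chosen for the example.

```python
import math

STEP_MIN, STEP_MAX = 16, 4096   # hypothetical limits on the quantizer step
CODE_MIN, CODE_MAX = -8, 7      # range of a signed 4-bit code


def _clamp(value, low, high):
    return max(low, min(high, value))


def _adapt(step, code):
    """Grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        step = step * 3 // 2
    elif abs(code) <= 1:
        step = step * 3 // 4
    return _clamp(step, STEP_MIN, STEP_MAX)


def encode(samples):
    """Turn 16-bit samples into 4-bit difference codes."""
    predicted, step, codes = 0, 64, []
    for sample in samples:
        # Quantize the difference between the input and the prediction.
        code = _clamp(round((sample - predicted) / step), CODE_MIN, CODE_MAX)
        codes.append(code)
        # Track exactly what the decoder will reconstruct.
        predicted = _clamp(predicted + code * step, -32768, 32767)
        step = _adapt(step, code)
    return codes


def decode(codes):
    """Rebuild samples by accumulating the scaled differences."""
    predicted, step, samples = 0, 64, []
    for code in codes:
        predicted = _clamp(predicted + code * step, -32768, 32767)
        samples.append(predicted)
        step = _adapt(step, code)
    return samples


# Usage: a 440Hz tone sampled at 32kHz, 16-bit range.
amplitude = 10_000
tone = [round(amplitude * math.sin(2 * math.pi * 440 * n / 32_000))
        for n in range(320)]
codes = encode(tone)
decoded = decode(codes)
errors = [abs(a - b) for a, b in zip(tone, decoded)]

print(16 / 4)                      # 4.0, the bitrate reduction factor
print(max(errors), sum(errors) / len(errors))
```

Because the decoder repeats the encoder's step adaptation from the codes alone, no step-size information has to be transmitted. Every sample costs 4 bits instead of 16, which is where the factor of 4 comes from.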
G.722 achieves additional efficiency by allocating its bits to match the patterns in the
human voice, and it’s considered adequate for news and talk programming over ISDN.
But for high-fidelity transmission, algorithms with more power are required. These are
based on psychoacoustics, where the coding process is adapted to the way we hear
sounds. There are several algorithms available, with varying complexity and
performance levels.
Some years ago, the international standards group ISO/IEC established ISO/MPEG
(the Moving Picture Experts Group) to develop a universal standard for encoding moving
pictures and sound for digital storage and transmission media. The standard was