About MPEG Compression
MPEG encoding is based on eliminating redundant video information, not only within
a frame but over a period of time. In a shot where there is little motion, such as an
interview, most of the video content does not change from frame to frame, and MPEG
encoding can achieve a very high compression ratio with little or no perceptible quality loss.
MPEG compression reduces video data rates in two ways:
• Spatial (intraframe) compression: Compresses individual frames.
• Temporal (interframe) compression: Compresses groups of frames together by
eliminating redundant visual data across multiple frames.
Intraframe Compression
Within a single frame, areas of similar color and texture can be coded with fewer bits
than the original, thus reducing the data rate with minimal loss in noticeable visual
quality. JPEG compression works in a similar way to compress still images. Intraframe
compression is used to create standalone video frames called I-frames (short
for intraframe).
Part V
Appendixes
Interframe Compression
Instead of storing complete frames, temporal compression stores only what has
changed from one frame to the next, which dramatically reduces the amount of data
that needs to be stored while still achieving high-quality images.
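As a rough illustration of the idea, the following Python sketch stores the first frame whole and then records only the pixels that differ in each later frame. It is a toy model rather than a description of how MPEG encoders work (real encoders predict blocks of pixels with motion compensation), and the function names and the flat list-of-integers frame representation are assumptions made for this example.

```python
def encode_deltas(frames):
    """Keep the first frame whole, then only (index, value) pairs that changed."""
    if not frames:
        return None, []
    key_frame = list(frames[0])
    deltas = []
    previous = key_frame
    for frame in frames[1:]:
        # Record only the positions whose value differs from the previous frame.
        changed = [(i, v) for i, (p, v) in enumerate(zip(previous, frame)) if p != v]
        deltas.append(changed)
        previous = frame
    return key_frame, deltas


def decode_deltas(key_frame, deltas):
    """Rebuild every frame by applying each change list to the frame before it."""
    frames = [list(key_frame)]
    for changed in deltas:
        frame = list(frames[-1])
        for i, v in changed:
            frame[i] = v
        frames.append(frame)
    return frames


# Three hypothetical six-pixel frames with very little motion between them.
frames = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 12, 10, 10, 10],
    [10, 10, 12, 10, 10, 11],
]
key_frame, deltas = encode_deltas(frames)
print(deltas)
```

In this example each later frame needs a single (index, value) pair instead of six pixel values, which is the kind of saving a low-motion shot makes possible.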
Groups of Pictures
MPEG formats use three types of compressed frames, organized in a group of pictures,
or GOP, to achieve interframe compression:
• I-frames: Intra (I) frames, also known as reference or key frames, contain all the necessary
data to re-create a complete image. An I-frame stands by itself without requiring data
from other frames in the GOP. Every GOP contains one I-frame, although it does not
have to be the first frame of the GOP. I-frames are the largest type of MPEG frame, but
they are faster to decompress than other kinds of MPEG frames.
• P-frames: Predicted (P) frames are encoded from a “predicted” picture based on the
closest preceding I- or P-frame. P-frames are also known as reference frames, because
neighboring B- and P-frames can refer to them. P-frames are typically much smaller
than I-frames.
• B-frames: Bi-directional (B) frames are encoded based on an interpolation from I- and
P-frames that come before and after them. B-frames require very little space, but
they can take longer to decompress because they are reliant on frames that may be
reliant on other frames. A GOP can begin with a B-frame, but it cannot end with one.
GOPs are defined by three factors: their pattern of I-, P-, and B-frames, their length, and
whether the GOP is “open” or “closed.”
GOP Pattern
A GOP pattern is defined by the ratio of P- to B-frames within a GOP. Common
patterns used for DVD are IBP and IBBP. A pattern does not have to use all three frame
types. For example, an IP pattern can be used. IBP and IBBP GOP patterns, in
conjunction with longer GOP lengths, encode video very efficiently. Smaller GOP
patterns with shorter GOP lengths work better with video that has quick movements,
but they don’t reduce the data rate as much.
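To make the relationship between pattern and frame mix concrete, here is a minimal Python sketch that expands a pattern such as IBBP to a given GOP length and counts the resulting frame types. The function name `expand_pattern` and the rule it applies (one leading I-frame, then the rest of the pattern repeated) are assumptions for illustration. The sketch also ignores the rules for how a GOP may begin and end, which depend on whether the GOP is open or closed.

```python
from collections import Counter


def expand_pattern(pattern, length):
    """Return `length` frame types: one I-frame, then the rest of the
    pattern repeated (for "IBBP" the repeating unit is "BBP")."""
    if not pattern.startswith("I") or length < 1:
        raise ValueError("pattern must start with 'I' and length must be >= 1")
    unit = pattern[1:]
    if not unit and length != 1:
        raise ValueError("an I-only pattern has a GOP length of 1")
    body = [unit[i % len(unit)] for i in range(length - 1)] if unit else []
    return ["I"] + body


for pattern, length in [("IBBP", 15), ("IBP", 12), ("IP", 2)]:
    frames = expand_pattern(pattern, length)
    counts = Counter(frames)
    print(pattern, length, "".join(frames), dict(counts))
```

A longer IBBP expansion is dominated by B-frames, the smallest frame type, which is consistent with the efficiency claim above.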
Some encoders can force I-frames to be added sporadically throughout a stream’s GOPs.
These I-frames can be placed manually during editing or automatically by an encoder
detecting abrupt visual changes such as cuts, transitions, and fast camera movements.
Appendix A
Video Formats
GOP Length
Longer GOP lengths encode video more efficiently by reducing the number of I-frames
but are less desirable during short-duration effects such as fast transitions or quick
camera pans. MPEG video may be classified as long-GOP or short-GOP. The term
long-GOP means that several P- and B-frames are used between successive
I-frames. At the other end of the spectrum, short-GOP MPEG is synonymous with
I-frame–only MPEG. Formats such as IMX use I-frame–only MPEG-2, which reduces
temporal artifacts and improves editing performance. However, I-frame–only formats
have a significantly higher data rate because each frame must store enough data to be
completely self-contained. Therefore, although the decoding demands on your
computer are decreased, there is a greater demand for scratch disk speed and capacity.
Maximum GOP length depends on the specifications of the playback device. The
minimum GOP length depends on the GOP pattern. For example, an IP pattern can
have a length as short as two frames.
Here are several examples of GOP length used in common MPEG formats:
• MPEG-2 for DVD: Maximum GOP length is 18 frames for NTSC or 15 frames for PAL.
These GOP lengths can be doubled for progressive footage.
• 1080-line HDV: Uses a long-GOP structure that is 15 frames in length.
• 720-line HDV: Uses a six-frame GOP structure.
• IMX: Uses only I-frames.
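The DVD limits listed above can be expressed as a small lookup. The following Python sketch is only an illustration: the names `DVD_MAX_GOP`, `max_dvd_gop_length`, `fits_dvd`, and `i_frames_per_second` are hypothetical, and the last function relies on the statement that every GOP contains one I-frame.

```python
# Maximum GOP lengths for MPEG-2 on DVD, in frames, as listed above.
DVD_MAX_GOP = {"NTSC": 18, "PAL": 15}


def max_dvd_gop_length(standard, progressive=False):
    """Return the longest allowed GOP; the limit doubles for progressive footage."""
    base = DVD_MAX_GOP[standard.upper()]
    return base * 2 if progressive else base


def fits_dvd(gop_length, standard, progressive=False):
    """Check a proposed GOP length against the DVD limit."""
    return 1 <= gop_length <= max_dvd_gop_length(standard, progressive)


def i_frames_per_second(frame_rate, gop_length):
    """With one I-frame per GOP, a shorter GOP means more I-frames per second."""
    return frame_rate / gop_length


print(fits_dvd(15, "PAL"), fits_dvd(18, "PAL"), fits_dvd(18, "NTSC"))
print(i_frames_per_second(30, 15))
```

The second function shows why GOP length affects data rate: halving the GOP length doubles the number of I-frames, the largest frame type, in every second of video.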
Open and Closed GOPs
An open GOP allows the B-frames from one GOP to refer to an I- or P-frame in an
adjacent GOP. Open GOPs are very efficient but cannot be used for features such as
multiplexed multi-angle DVD video. A closed GOP format uses only self-contained
GOPs that do not rely on frames outside the GOP.
The same GOP pattern can produce different results when used with an open or closed
GOP. For example, a closed GOP would start an IBBP pattern with an I-frame, whereas
an open GOP with the same pattern might start with a B-frame. In this example,
starting with a B-frame is a little more efficient because starting with an I-frame means
that an extra P-frame must be added to the end (a GOP cannot end with a B-frame).
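The difference can be sketched in Python by building both variants of a 15-frame IBBP GOP in display order. This is a simplified model under stated assumptions: the function names and the `b_per_anchor` parameter are hypothetical, the closed GOP satisfies the "cannot end with a B-frame" rule by turning its final frame into a P-frame, and the open GOP is assumed to have a length that is a multiple of the pattern's repeating unit.

```python
def closed_gop(b_per_anchor, length):
    """Closed GOP: starts with an I-frame and must not end with a B-frame."""
    frames = ["I"]
    while len(frames) < length:
        offset = len(frames)  # distance from the I-frame
        frames.append("P" if offset % (b_per_anchor + 1) == 0 else "B")
    if length > 1 and frames[-1] == "B":
        frames[-1] = "P"  # the extra P-frame that closes the GOP
    return frames


def open_gop(b_per_anchor, length):
    """Open GOP: leading B-frames may refer to a frame in the previous GOP."""
    step = b_per_anchor + 1
    if length < step or length % step != 0:
        raise ValueError("length must be a positive multiple of b_per_anchor + 1")
    frames = []
    for position in range(length):
        if position % step != b_per_anchor:
            frames.append("B")
        elif position == b_per_anchor:
            frames.append("I")  # first anchor frame in the GOP
        else:
            frames.append("P")
    return frames


closed = closed_gop(2, 15)
opened = open_gop(2, 15)
print("closed:", "".join(closed), "P-frames:", closed.count("P"))
print("open:  ", "".join(opened), "P-frames:", opened.count("P"))
```

Under these assumptions the closed GOP carries five P-frames and the open GOP four, with the difference made up by a smaller B-frame, which is the small efficiency gain described above.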
MPEG Containers and Streams
MPEG video and audio data are packaged into discrete data containers known as streams.
Keeping video and audio streams discrete makes it possible for playback applications to
easily switch between streams on the fly. For example, DVDs that use MPEG-2 video can
switch between multiple audio tracks and video angles as the DVD plays.
Each MPEG standard has variations, but in general, MPEG formats support two basic
kinds of streams:
• Elementary streams: These are individual video and audio data streams.
• System streams: These streams combine, or multiplex, video and audio elementary
streams together. They are also known as multiplexed streams. To play back these
streams, applications must be able to demultiplex the streams back into their
elementary streams. Some applications only have the ability to play elementary streams.
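The relationship between the two kinds of streams can be sketched as follows. This Python example is a conceptual model only: it tags each packet with a stream name and interleaves the packets round-robin, whereas real MPEG system streams use packet headers and timestamps that are not modeled here. The function names and stream names are assumptions for illustration.

```python
from itertools import zip_longest

_MISSING = object()  # marks a stream that has run out of packets


def multiplex(elementary_streams):
    """Interleave packets from named elementary streams into one system stream."""
    system_stream = []
    for group in zip_longest(*elementary_streams.values(), fillvalue=_MISSING):
        for stream_id, payload in zip(elementary_streams, group):
            if payload is not _MISSING:
                system_stream.append((stream_id, payload))
    return system_stream


def demultiplex(system_stream):
    """Split a system stream back into its elementary streams."""
    streams = {}
    for stream_id, payload in system_stream:
        streams.setdefault(stream_id, []).append(payload)
    return streams


# Hypothetical elementary streams: one video track and two audio tracks.
elementary = {
    "video": ["v0", "v1", "v2"],
    "audio_en": ["a0", "a1"],
    "audio_fr": ["f0", "f1"],
}
system = multiplex(elementary)
print(system)
print(demultiplex(system) == elementary)
```

Because every packet keeps its stream name, a player can select one audio track and ignore the others without decoding them, which is the kind of on-the-fly switching described above.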
[Figure: frame sequences, shown against a timecode ruler, for an open GOP (IBBP, 15 frames) and a closed GOP (IBBP, 15 frames).]