COMPANY CONTRIBUTION

Is Your IPTV Service Picture Perfect?

Eve Danel, senior product manager, VeEX Inc, explains why it is vitally important to monitor quality of experience in the age of IP-delivered video.

IPTV delivers video services via the packet-switched network using the IP protocol. It is therefore subject to a completely new set of challenges and impairments compared to traditional cable or satellite broadcast television, whether it is a live TV event broadcast via IP multicast streaming or a stored Video on Demand service delivered via an IP unicast stream.

While IPTV providers generally deliver the content over their own managed network, the broadband ‘last mile’ connection to the home, as well as the stream decoding in the customer’s set-top box (STB), are all subject to IP and physical layer related impairments that can greatly degrade the customer experience. Issues such as video encoding quality, bandwidth availability, packet loss and jitter all play a role in the viewer’s experience.

While it is difficult to have viewers ‘rate’ their quality of experience in real time, test equipment with dedicated algorithms is able to measure and report QoE as experienced by the viewer. A short background on video encoding and transport will help in understanding the different quality metrics and why it is so important to monitor quality.

Background on video encoding

SD and HD video streams are compressed to reduce the amount of bandwidth necessary for transport. The compression techniques take advantage of the fact that there are a lot of temporal and spatial redundancies in a video signal.

To compress pictures, spatial redundancy compression takes advantage of the fact that neighbouring pixels are alike, similar to the way JPEG compression works. IPTV video codecs like MPEG-2 and H.264 (MPEG-4 AVC) use a spatial compression algorithm that divides the picture into blocks. Pixel data (luminance and chrominance) in each block is then transformed so that only the corresponding quantised coefficients need to be transmitted through the network. On the receiving end, the original block can be recreated by applying the inverse transformation, although the quantisation step can create subtle loss of colour shades or brightness. This process is called lossy compression, because the recreated image is not the same as the original image. Various encoding and quantisation techniques can produce different results, as there is a trade-off between image quality and bandwidth efficiency. Impairments generated by spatial compression are perceived by the viewer as large visible blocks.

Temporal redundancy compression takes advantage of the fact that adjacent video frames are very similar, especially if the scenes have slow movements. Therefore, compression of sequences of frames can be achieved by only storing the differences between them. Certain frames are designated as ‘reference’ frames. In between the reference frames, only ‘difference’ frames are transmitted. The difference frames only store the changes, such as the elements in motion, between the current frame and the preceding or most recent reference frame.

Figure: GOP Frame Error Propagation

Video codecs use three types of frames for temporal compression, called I, P and B-Frames. The I-Frame, or Intra Frame, carries full picture information, with only spatial compression applied (as described previously). This frame is independent from other frames and can be decoded without reference to other frames. The P-Frame, or Predicted Frame, contains only the changes from the preceding frame and uses delta encoding. The P-Frame is predicted from the closest I or P-Frame. The B-Frame, or Interpolated Frame, is a bi-directionally predictive picture. It uses information of preceding or following I or P-Frames for its encoding. The B-Frame takes the least amount of bandwidth.
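
The delta encoding used by P-Frames can be illustrated with a toy sketch. It is not how MPEG-2 or H.264 actually code pictures (real codecs use block-based, motion-compensated prediction, and B-Frames are left out here). It assumes each frame is a flat list of pixel values, and the names `encode_sequence` and `decode_sequence` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Frame = List[int]  # toy model: one frame as a flat list of pixel values


def encode_sequence(frames: List[Frame]) -> List[Tuple[str, list]]:
    """Encode the first frame in full ('I') and each later frame as a delta ('P')."""
    if not frames:
        return []
    encoded = [("I", list(frames[0]))]
    previous = frames[0]
    for frame in frames[1:]:
        # Keep only (pixel_index, new_value) pairs for pixels that changed.
        changes = [
            (i, value)
            for i, (value, old) in enumerate(zip(frame, previous))
            if value != old
        ]
        encoded.append(("P", changes))
        previous = frame
    return encoded


def decode_sequence(encoded: List[Tuple[str, list]]) -> List[Frame]:
    """Rebuild every frame by applying each delta to the previously decoded frame."""
    frames: List[Frame] = []
    current: Frame = []
    for kind, payload in encoded:
        if kind == "I":
            current = list(payload)      # full picture, no dependency
        else:
            current = list(current)      # start from the previous picture
            for i, value in payload:
                current[i] = value       # apply only the stored changes
        frames.append(current)
    return frames


# A slow-moving 'scene': only one pixel changes per frame.
frames = [[10, 10, 10, 10], [10, 10, 12, 10], [10, 10, 12, 13]]
encoded = encode_sequence(frames)
decoded = decode_sequence(encoded)
```

In this sketch the I-Frame carries all four values, while each P-Frame carries a single change, which is the bandwidth saving the text describes. It also shows the dependency: a P-Frame cannot be decoded without the picture before it.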
The I, P, and B-Frames are assembled in a Group of Pictures, or GOP. The GOP starts with an I-Frame and ends with an I-Frame. A mix of P and B-Frames is inserted between the consecutive I-Frames. The number and order of frames in the GOP can vary, but generally the I-Frames are separated by about ½ second, with a GOP size of 12 to 15 frames. While being a bandwidth-efficient encoding mechanism, temporal encoding opens the door to a snowball effect for individual frame errors. If an error or loss happens in an I-Frame, all the remaining pictures in the GOP will be affected, which could mean ½ second or more of errors. An error in a P-Frame will propagate to all the remaining P and B-Frames in the GOP, while an error in a B-Frame is self-contained to that frame. Understanding the GOP structure is a key element in understanding why monitoring the packet network impairments alone is not a good predictor of video quality, since not all frames carry the same importance in the decoding process.
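
The propagation rules above can be turned into a rough estimate of how long a single damaged frame stays visible. The following is a minimal sketch under stated assumptions: the GOP is given as a string in decode (transmission) order, so that "remaining" frames are simply those that follow the damaged one; the 12-frame pattern and the 25 frames per second rate are illustrative values; and `affected_frames` and `impairment_seconds` are hypothetical helper names.

```python
from typing import List


def affected_frames(gop: str, damaged_index: int) -> List[int]:
    """Return the indices of frames impaired by an error in gop[damaged_index].

    gop is assumed to be in decode (transmission) order, e.g. "IPBBPBBPBBPB".
    """
    if not 0 <= damaged_index < len(gop):
        raise ValueError("damaged_index is outside the GOP")
    kind = gop[damaged_index]
    if kind == "B":
        # A B-Frame error is self-contained to that frame.
        return [damaged_index]
    if kind in ("I", "P"):
        # An I or P-Frame error affects itself and all remaining frames.
        return list(range(damaged_index, len(gop)))
    raise ValueError(f"unknown frame type: {kind!r}")


def impairment_seconds(gop: str, damaged_index: int, fps: float) -> float:
    """Approximate duration of the visible impairment for one damaged frame."""
    return len(affected_frames(gop, damaged_index)) / fps


gop = "IPBBPBBPBBPB"   # assumed 12-frame GOP
fps = 25.0             # assumed frame rate

i_frame_hit = impairment_seconds(gop, 0, fps)   # error in the I-Frame
p_frame_hit = impairment_seconds(gop, 4, fps)   # error in the second P-Frame
b_frame_hit = impairment_seconds(gop, 2, fps)   # error in the first B-Frame
```

With these assumed values, losing the I-Frame impairs all 12 frames, or roughly ½ second of video, while losing a B-Frame impairs a single frame. The same packet loss rate can therefore produce very different viewer experiences depending on which frame type the lost packets belonged to.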
The I, P, and B-Frames are then
encapsulated to be carried over the packet