IP Television 10.1 2014 | Page 28

COMPANY CONTRIBUTION

Is Your IPTV Service Picture Perfect?

Eve Danel, senior product manager, VeEX Inc, explains why it is vitally important to monitor quality of experience in the age of IP-delivered video.

IPTV delivers video services via the packet switched network using IP protocol. Therefore, it is subject to a completely new set of challenges and impairments compared to traditional Cable or Satellite broadcast television, whether it is a live TV event broadcast via IP multicast streaming or a stored Video on Demand service delivered via an IP unicast stream. While IPTV providers generally deliver the content over their own managed network, the broadband ‘last mile’ connection to the home, as well as the stream decoding in the customer’s set-top box (STB), are all subject to IP and physical layer related impairments that can greatly degrade the customer experience. Issues such as video encoding quality, bandwidth availability, packet loss and jitter all play a role in the viewer’s experience. While it is difficult to have viewers ‘rate’ their quality of experience in real time, test equipment with dedicated algorithms is able to measure and report QoE as experienced by the viewer.

A short background on video encoding and transport will help to understand the different quality metrics and why it is so important to monitor quality.

Background on video encoding

SD and HD video streams are compressed to reduce the amount of bandwidth necessary for transport. The compression techniques take advantage of the fact that there are a lot of temporal and spatial redundancies in a video signal.

To compress pictures, spatial redundancy compression takes advantage of the fact that neighbouring pixels are alike, similar to the way JPEG compression works. IPTV video codecs like MPEG-2 and H.264 (MPEG-4 AVC) use a spatial compression algorithm that divides the picture into blocks. Pixel data (luminance and chrominance) in each block is then transformed so that only the corresponding quantised coefficients need to be transmitted through the network. On the receiving end, the original block can be recreated by applying the inverse transformation, although the quantisation step can create subtle loss of colour shades or brightness. This process is called lossy compression, because the recreated image is not the same as the original image. Various encoding and quantisation techniques can produce different results, as there is a trade-off between image quality and bandwidth efficiency. Impairments generated by spatial compression are perceived by the viewer as large visible blocks.

Temporal redundancy compression takes advantage of the fact that adjacent video frames are very similar, especially if the scenes have slow movements. Therefore, compression of sequences of frames can be achieved by only storing the differences between them. Certain frames are designated as ‘reference’ frames. In between the reference frames, only ‘difference’ frames are transmitted. The difference frames only store the changes, such as the elements in motion between the current frame and the preceding, or most recent, frame.

Video codecs use three types of frames for temporal compression, called I, P and B-Frames. The I-Frame, or Intra Frame, contains full picture information, with only spatial compression applied (as described previously). This frame is independent from other frames and can be decoded without information from other frames. The P-Frame, or Predicted Frame, contains only the changes from the preceding frame and uses delta encoding. The P-Frame is predicted from the closest I or P-Frame. The B-Frame, or Interpolated Frame, is a bi-directionally predictive picture. It uses preceding or following I or P-Frames for its encoding. The B-Frame takes the least amount of bandwidth.

Figure: GOP Frame Error Propagation
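The idea of storing only the differences between adjacent frames can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not how MPEG-2 or H.264 actually encode: a frame is modelled as a flat list of pixel values, and the function names (encode_sequence, decode_sequence, changed_pixels) are hypothetical names chosen for this example.

```python
from typing import List, Tuple

Frame = List[int]  # assumption: one frame = a flat list of pixel values


def encode_sequence(frames: List[Frame]) -> Tuple[Frame, List[Frame]]:
    """Keep the first frame whole and store every later frame as a difference."""
    reference = list(frames[0])  # plays the role of the 'reference' frame
    differences: List[Frame] = []
    previous = reference
    for frame in frames[1:]:
        # Store only the per-pixel change from the preceding frame.
        differences.append([cur - prev for cur, prev in zip(frame, previous)])
        previous = frame
    return reference, differences


def decode_sequence(reference: Frame, differences: List[Frame]) -> List[Frame]:
    """Rebuild every frame by applying the differences in order."""
    frames = [list(reference)]
    for diff in differences:
        frames.append([prev + d for prev, d in zip(frames[-1], diff)])
    return frames


def changed_pixels(diff: Frame) -> int:
    """Count the non-zero entries, i.e. the pixels that actually changed."""
    return sum(1 for d in diff if d != 0)


# A slow-moving scene: a single bright pixel drifts one position per frame.
scene = [
    [0, 9, 0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0, 0, 0, 0],
]
reference, differences = encode_sequence(scene)
rebuilt = decode_sequence(reference, differences)

print(differences[0])                            # difference for the second frame
print([changed_pixels(d) for d in differences])  # changed pixels per difference frame
print(rebuilt == scene)                          # decoding restores the scene
```

In this toy scene each difference frame has only two non-zero values out of eight, which is why difference frames need far less bandwidth than a full picture. Real codecs predict blocks using motion compensation rather than subtracting raw pixels, so this sketch shows only the principle.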
The I, P, and B-Frames are assembled in a Group of Pictures, or GOP. The GOP starts with an I-Frame and runs until the next I-Frame. A mix of P and B-Frames is inserted between the consecutive I-Frames. The number and order of frames in the GOP can vary, but generally the I-Frames are separated by about ½ second, with a GOP size of 12 to 15 frames.

While it is a bandwidth-efficient encoding mechanism, temporal encoding opens the door to a snowball effect for individual frame errors. If an error or loss happens in an I-Frame, all the remaining pictures in the GOP will be affected, which could mean ½ second or more of errors. An error in a P-Frame will propagate to all the remaining P and B-Frames, while an error in a B-Frame is confined to that frame. Understanding the GOP structure is key to understanding why monitoring the packet network impairments alone is not a good predictor of video quality: not all frames carry the same importance in the decoding process.

The I, P, and B-Frames are then encapsulated to be carried over the packet