MPEG-TS is still used extensively, even in “modern” transport mechanisms such as SRT, RIST and Zixi. Why though?
There is a tendency in broadcast to reminisce about “the good old days”, and it would be easy to put the existence of MPEG-TS in this category. But there are sound technical reasons why MPEG-TS still exists, especially in B2B contribution workflows; it’s not just history. There have been numerous attempts to replace MPEG-TS, but all have been flawed. This table summarises some of the key feature differences between common containers and protocols (which may have an underlying container like RTMP or no container like WebRTC)*:
|Common Clock for Audio and Video||✅||✅||❌||✅||✅|
|Legacy data transport||✅||❌||❌||❌||✅|
Comparison of Compressed Transport Formats/Protocols (*Yes, this table mixes containers and protocols but it’s presented for simplicity)
As the name suggests, MPEG Transport Streams are for transporting video signals from A to B, be that via IP networks, satellite, cable etc. These kinds of transport mechanisms all incur some form of packet loss or bitflips. A container or transport protocol needs a means to recover from transport errors quickly, so that the viewer is disturbed as little as possible by a lost packet or a bitflip. For example, a single bit error should not lead to the loss of seconds of video. The container needs to provide a way for the decoder to recover quickly, and if done well, a viewer will barely notice. MPEG-TS has a very fast way of recovering using sync bytes and startcodes, at the cost of a high container overhead. Whilst some of this overhead exists for error resilience, some of it also exists for historical reasons: in particular, the TS packet size of 188 bytes was chosen to be compatible with old ATM networks.
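To make the sync byte idea concrete, here is a minimal Python sketch of how a receiver might find a packet boundary again after corruption. It relies only on two facts about MPEG-TS: packets are 188 bytes long and each starts with the sync byte 0x47. The function name `find_sync` and the `confirm` parameter are made up for illustration; real demuxers are more careful than this.

```python
TS_PACKET_SIZE = 188  # bytes per transport stream packet
SYNC_BYTE = 0x47      # first byte of every TS packet


def find_sync(buf: bytes, confirm: int = 3) -> int:
    """Return the offset of the first plausible packet boundary, or -1.

    A lone 0x47 can also occur inside a payload, so a candidate offset is
    only accepted if `confirm` consecutive packets start with the sync byte.
    (`confirm` is an illustrative parameter, not something from a spec.)
    """
    last_start = len(buf) - TS_PACKET_SIZE * (confirm - 1)
    for offset in range(max(last_start, 0)):
        if all(buf[offset + i * TS_PACKET_SIZE] == SYNC_BYTE
               for i in range(confirm)):
            return offset
    return -1


# Usage: five dummy packets preceded by 7 stray bytes from a damaged packet.
packet = bytes([SYNC_BYTE]) + bytes([0xFF]) * (TS_PACKET_SIZE - 1)
stream = bytes(7) + packet * 5
print(find_sync(stream))
```

The point is that the decoder never has to search further than a packet or so to get back in step, which is why a single error costs very little.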
In contrast, TCP-based protocols such as RTMP don’t have such behaviour, but instead increase their latency to wait for packets to be recovered (the famous “buffering” swirl). This is problematic in itself in B2B contribution applications (see Defined Latency below).
One of the key requirements of a decoder in B2B contribution is to recover the source clock (i.e. the rate at which raw video frames are generated at source). B2B contribution services may be operating for years on end, so it’s important for a decoder to be able to recover the clock of the source in order to know when to drop or duplicate video frames cleanly and with as little viewer disruption as possible. If a decoder does this badly it will eventually buffer too many frames and run out of memory, or advance too quickly and run out of data to process. In contrast, this is unlikely to be a problem for B2C streaming as viewers are unlikely to watch these streams for extended periods of time.
In MPEG-TS the Program Clock Reference (PCR) is used for source clock recovery. A receiver can measure the drift between the PCR and the local clock and act accordingly, either adjusting the local clock to match the source, or dropping and duplicating frames (usually slowly, one or two frames dropped or duplicated every hour) to match the local clock of a facility. This is difficult to do well (a story for another blog post). Consumer devices have very basic clock functionality: their clocks are not adjustable and there is no simple way to measure their output clocks (and audio and video may even have a different clock!), making the process of clock recovery difficult and imprecise.
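As a rough illustration of the drift measurement, the Python sketch below compares how far the PCR has advanced with how far the local clock has advanced over the same interval. It uses the PCR layout from the MPEG-TS specification (a 33-bit base at 90 kHz plus a 9-bit extension counting 0 to 299, giving a 27 MHz clock). The function names are hypothetical, and a real receiver would also have to filter out network jitter, which this deliberately ignores.

```python
PCR_HZ = 27_000_000          # PCR ticks per second
PCR_WRAP = (1 << 33) * 300   # the 33-bit base wraps, so the full PCR wraps here


def pcr_ticks(base: int, ext: int) -> int:
    """Combine the 90 kHz base and the 0..299 extension into 27 MHz ticks."""
    return base * 300 + ext


def drift_ppm(pcr_first: int, pcr_last: int,
              local_first_s: float, local_last_s: float) -> float:
    """Source clock error relative to the local clock, in parts per million.

    Positive means the source runs fast compared with the receiver.
    Assumes the PCR wrapped at most once between the two samples.
    """
    source_elapsed = ((pcr_last - pcr_first) % PCR_WRAP) / PCR_HZ
    local_elapsed = local_last_s - local_first_s
    return (source_elapsed - local_elapsed) / local_elapsed * 1e6


# Usage: over 100 s of local time the PCR advanced by 100.001 s.
start = pcr_ticks(base=1_000_000, ext=0)
end = start + 2_700_027_000
print(drift_ppm(start, end, 0.0, 100.0))
```

With a drift figure like this in hand, the receiver can either steer its own clock or work out how often it will need to drop or duplicate a frame.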
This is often handwaved away by saying “Use NTP” (take a drink) but in the real world cameras or graphics cards are not synced to NTP (there is no such thing as an NTP camera, they have their own onboard oscillators, even your PC webcam). A clock isn’t necessarily stable: a camera in the desert has a clock that might drift differently during the day in the heat vs at night in the relative cold. Similarly, people claim everything is locked to GPS but fail to realise a lot of broadcast gear sits underneath a football stadium with no view of GPS satellites.
Historically, the use of the PCR had a secondary benefit: early receivers used it to generate the analogue clock. Incorrectly generating the analogue clock could lead to TVs outputting black and white, as the colour signal would not be generated correctly. In some countries, this might even still be necessary.
There is no other format apart from MPEG-TS that has an inbuilt clock recovery scheme with the precision required for long-term B2B contribution.
Common Clock for Audio and Video
There has been a trend in recent years to split audio and video into separate RTP flows in order to process them independently. This is done in WebRTC and ST-2110. However, the timestamp in RTP advances at different clock rates and is only 32-bit. Video is generally at 90 kHz and audio at 48 kHz, which is problematic as they wrap around (go back to zero like your car’s odometer) at different times: roughly every 13 hours for video and every 25 hours for audio. So video wraps first, then audio wraps later, and this can happen several times over. A decoder that joins a day into the stream is therefore unable to synchronise audio and video except by putting its (proverbial) finger in the air.
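The arithmetic behind the wraparound is simple enough to check in a few lines of Python. This is only a back-of-the-envelope sketch: the helper names are made up, and it assumes both timestamps started from zero at the same moment, which real RTP streams do not do (they start from random offsets, which only makes matters worse).

```python
RTP_WRAP = 1 << 32  # RTP timestamps are 32-bit


def wrap_period_hours(clock_hz: int) -> float:
    """How long a 32-bit timestamp at this clock rate lasts before wrapping."""
    return RTP_WRAP / clock_hz / 3600


def wraps_after(elapsed_s: int, clock_hz: int) -> int:
    """Number of wraps after `elapsed_s` seconds, starting from zero."""
    return (elapsed_s * clock_hz) // RTP_WRAP


VIDEO_HZ = 90_000
AUDIO_HZ = 48_000

print(wrap_period_hours(VIDEO_HZ), wrap_period_hours(AUDIO_HZ))

# A receiver joining after two days sees timestamps that have wrapped
# a different number of times for each flow.
two_days = 2 * 24 * 3600
print(wraps_after(two_days, VIDEO_HZ), wraps_after(two_days, AUDIO_HZ))
```

The receiver only ever sees the low 32 bits, so without extra information it cannot tell how many wraps each flow has been through.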
ST-2110 explains how to solve this problem. It bases the timestamps on the PTP Epoch (1st Jan 1970), so a receiver can calculate how many RTP timestamp wraps would have occurred up to the current time and synchronise audio and video correctly. WebRTC does not address this issue at all; it uses the classic handwave of “NTP” (take another drink). Most (all?) WebRTC implementations just guess lipsync based on packet arrival time, which isn’t acceptable in a professional environment.
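Here is a hedged Python sketch of the ST-2110 style approach: if the sender derives each RTP timestamp from time since the PTP epoch, a receiver with a PTP-locked clock can rebuild the full tick count by picking the value closest to "now" that matches the 32 bits it received. The function names and the example time value are illustrative assumptions; this shows the idea, not the wording of the standard.

```python
RTP_WRAP = 1 << 32


def unwrap(rtp_ts: int, now_s: float, clock_hz: int) -> int:
    """Full media-clock tick count since the epoch for a received timestamp.

    `now_s` is the receiver's current time in seconds since the PTP epoch
    (assumed accurate). Picks the unwrapped value nearest to now.
    """
    now_ticks = int(now_s * clock_hz)
    delta = (rtp_ts - now_ticks) % RTP_WRAP
    if delta >= RTP_WRAP // 2:
        delta -= RTP_WRAP  # the timestamp is slightly in the past
    return now_ticks + delta


def capture_time_s(rtp_ts: int, now_s: float, clock_hz: int) -> float:
    """Capture time in seconds since the epoch, comparable across flows."""
    return unwrap(rtp_ts, now_s, clock_hz) / clock_hz


# Usage: audio and video sampled at the same instant, 20 ms before "now".
now = 1_700_000_000.0      # hypothetical receiver time
captured = now - 0.02
video_ts = int(captured * 90_000) % RTP_WRAP
audio_ts = int(captured * 48_000) % RTP_WRAP
print(capture_time_s(video_ts, now, 90_000),
      capture_time_s(audio_ts, now, 48_000))
```

Both flows land on the same timeline regardless of how many times each 32-bit counter has wrapped, which is exactly what the finger-in-the-air approach cannot give you.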
MPEG-TS doesn’t have this problem: the clock is common and audio and video wrap at the same time, making sync easy.
One of the special things about B2B contribution video is that latency needs to be clearly defined, in particular the amount of time a decoder needs to wait before it can start playing. This is important because it allows for certain frames to be larger, which improves compression, just as long as the worst case is known. This is known as the VBV/HRD and whilst there isn’t time to go into this in detail (you could write a book on it) there is a good video explaining it here: https://www.youtube.com/watch?v=-Q7BuSXdO_8
Unlike web streams, which can burst above their maximum bitrate momentarily, an MPEG-TS has to output data at a maximum rate, meaning there is a wait time before a decoder can start decoding. MPEG-TS is the only container that clearly defines the wait time for a decoder, allowing a user to optimise the trade-off between latency and compression efficiency depending on the use-case. This latency (vbv-delay) is signalled clearly by the PTS (technically the DTS, but let’s leave reordering to one side) minus PCR value; this calculation tells the decoder to wait a given amount of time before presenting a frame. Both these values are needed; a PTS alone is not enough information (a PTS relative to what?).
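The DTS minus PCR calculation can be sketched in Python as follows. It assumes the usual MPEG-TS units (DTS in 33-bit 90 kHz ticks, PCR in 27 MHz ticks) and that the two values come from the same point in the stream; the function name `decoder_wait_ms` is made up for this example.

```python
DTS_WRAP = 1 << 33           # PTS/DTS are 33-bit values at 90 kHz
PCR_WRAP = DTS_WRAP * 300    # the same span expressed in 27 MHz ticks
TICKS_PER_MS = 27_000        # 27 MHz ticks per millisecond


def decoder_wait_ms(dts_90k: int, pcr_27m: int) -> float:
    """How long the decoder should hold a frame, in milliseconds.

    `pcr_27m` is the stream clock value at the moment the frame arrives.
    The modulo handles the case where the DTS has wrapped but the PCR has not.
    """
    return ((dts_90k * 300 - pcr_27m) % PCR_WRAP) / TICKS_PER_MS


# Usage: the frame is stamped half a second ahead of the current PCR.
pcr_now = 90_000 * 300            # stream clock reads 1 s
dts = 90_000 + 45_000             # frame due at 1.5 s
print(decoder_wait_ms(dts, pcr_now))
```

Note that the function needs both arguments; hand it a DTS on its own and there is nothing to subtract from.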
A defined latency is important so that if a decoder was decoding several camera angles in remote production, they will all be in sync. If latency was undefined, then at the point of a camera angle change, a viewer would potentially see a goal again, or a car overtake again. Undefined latency is prevalent with consumer protocols such as RTMP and HLS, but most of the time the viewer has nothing to compare it to, except when their neighbours cheer before them as they are watching on TV with a lower delay!
WebRTC has no method of signalling vbv-delay and so in practice is hardcoded to one frame, drastically reducing quality, though this might be tolerable for videoconferencing. It is therefore important to understand the use-case difference between encoding to MPEG-TS for B2B contribution and encoding to WebRTC for a videoconference.
Many private networks use multicast to transport signals so that streams can be received by multiple listeners at the same time. There is practically no support for multicast outside of MPEG-TS and ST-2110.
It may be legally required in some jurisdictions to transport legacy data such as Teletext Subtitles or DVB Subtitles and so the container needs a method to transport this data. In addition, WebRTC doesn’t support interlaced video, which is still widespread in broadcasting.
It is only by using MPEG-TS that all of these requirements can be met and high quality video transport can be done with a defined latency.