Multiformat Media Delivery
From Bjoern Hassler
IMPORTANT: This page assumes that Flash 9 has come out of beta, i.e. that H.264/AAC support is in the current non-beta release of the Flash player, see 'note 3' below. See Flash H.264.
News: Basic support for Ogg in Firefox and Opera would warrant inclusion of Ogg formats below, see HTML 5 video tag and Ogg.
The page on Encoding Timebased Media makes a case for H.264/AAC and mp3 as standard formats. It suggests that, if you encode to a single format, you use H.264/AAC and deliver this both for online viewing (Flash 9), for download and for podcasting. It suggests that if you encode to a 2nd format, you should add audio-only mp3 at 32kbps.
This choice will suit a considerable audience, but not your entire potential global audience. You're also not reaching that audience at the highest possible quality, or with the greatest accessibility. In particular, it will not work for viewers on low bandwidth connections (modems, certain geographic areas, handheld devices, ...). So how do you raise the quality, and how do you get additional people watching your media?
This page outlines multiformat media delivery options (for time-based media, i.e. linear video and audio). The thinking behind this was developed over the last few years (since around 2004), and has of course changed considerably over that time. In 2004, there was a true plurality of formats, and it wasn't clear (in my view) which formats would come to dominate. Three years on (2007), this plurality of formats persists, though to a lesser extent, see Encoding Timebased Media. However, you'll still gain accessibility, as well as new audiences, from providing a range of formats. So what should these formats be?
2 The media vs. channel paradigm
See some of my talks available on Podcasts.
3 What formats?
We're assuming that you want to encode into more formats rather than fewer, but that you also want to keep a lid on your CPU time to get good throughput. I suggest that you can cover most bases with nine video encodes and four audio encodes:
- MPEG4/Flash 9 as m4v with H.264/AAC, at four bandwidths (high/main/low/access), the same format used for streaming and download (4 formats, see comments about QTSS below)
- MPEG4 as 3gp with MPEG4P2/AAC at one bandwidth ('access', 1 format)
- WMV (single rate, one format)
- RV (single rate, one format)
- Flash 7 (Sorenson) for compatibility with older Flash installations (main/low, two formats)
- AMR-NB audio, and mp3 (two audio formats).
- WMA and RA (two audio formats).
(Note: In this document, we use 'MPEG4P2' to refer to the MPEG4 Part 2 video codec, as opposed to the H.264 video codec, MPEG 4 Part 10.) Not all of these nine plus four formats are considered essential, see below.
For audio-only materials, you'd use four encodes: 'main' and 'high' bitrate versions (mp3 and/or AAC), as well as the same low bandwidth mp3 and AMR settings.
You may of course argue with some of these settings, particularly perhaps with the absence of very high bandwidth high-definition formats, or with high quality audio settings within those formats, or stand-alone audio formats. If you specialise in high-definition content, or in high quality music content, then of course you may need to add a few more settings in. However, the above list is geared to cover a reasonable set of formats, that would cover most cases.
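As a sketch, the nine-plus-four encode matrix above can be written down as data. This is purely illustrative: the container/codec labels below are shorthand, not actual encoder presets.

```python
# Sketch of the encode matrix described above: nine video encodes
# plus four audio encodes. Labels are illustrative shorthand only.

VIDEO_ENCODES = [
    ("m4v", "h264/aac", tier) for tier in ("high", "main", "low", "access")
] + [
    ("3gp", "mpeg4p2/aac", "access"),   # older mobiles
    ("wmv", "wmv", "single"),           # Windows Media, single rate
    ("rv",  "realvideo", "single"),     # RealVideo, single rate
    ("flv", "sorenson", "main"),        # Flash 7 compatibility
    ("flv", "sorenson", "low"),
]

AUDIO_ENCODES = [
    ("amr", "amr-nb", "access"),
    ("mp3", "mp3", "baseline"),
    ("wma", "wma", "single"),
    ("ra",  "realaudio", "single"),
]
```

Counting the entries recovers the "nine plus four" figure used throughout this page.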
We are assuming that you wish to distribute your materials as widely as possible, so we pay no attention to DRM or authentication. If you needed authentication, RealVideo would become quite attractive, as it offers comprehensive authentication and protection mechanisms.
In the diagram, each box represents a video format (i.e. a separate encode). For most formats, you'd use the same files for streaming and download. There are nine yellow boxes for video and four boxes for audio, i.e. nine video formats and four audio formats. Boxes with thinner outlines represent less essential formats: for video, we've got six essential formats and three less important ones.
4 Update: August 2009
Media formats always change. And since this page was originally written, a lot of things have changed. What would we recommend now?
- The basic 'backbone' is unchanged, so we retain:
- (unchanged) MPEG4/Flash 9 as m4v with H.264/AAC, at four bandwidths (high/main/low/access), the same format used for streaming and download (4 formats, see comments about QTSS below)
- (unchanged) MPEG4 as 3gp with MPEG4P2/AAC at one bandwidth ('access', 1 format)
- We note that H.264 is now one of the dominant formats, and so legacy formats can be scaled back.
- (reduce to one low bitrate format for older players) Flash 7 (Sorenson) for compatibility with older Flash installations (main/low, two formats)
- We still have computers that only have WMV, so we keep:
- (keep) WMV (single rate, one format, e.g. WMV7 suffices, but see Silverlight 3 comment below)
- Real has substantially lost market share:
- (drop) RV (single rate, one format)
- Firefox 3.5 and Opera now have Ogg support:
- (add) Ogg Theora
In terms of audio formats:
- (keep) AMR-NB audio, and mp3 (two audio formats).
- (keep) WMA (but see Silverlight 3 comment below) and
- (drop) RA (two audio formats).
- (add) Ogg Vorbis
Remember that this list aims to provide a comprehensive list of formats, not a minimal one. See Encoding_Timebased_Media if you want to encode in as few formats as possible.
Here are some notes, corresponding to the blue circles in the diagram:
5.1 Note 1: Bandwidths and resolutions for video
Bandwidths, particularly at the higher end, are somewhat arbitrary. We have chosen 1000/512/128/64kbps and call these high/main/low/access. The corresponding video resolutions would be full resolution (Standard Definition) for 'high', half resolution (CIF) for main/low, and quarter resolution (QCIF) for access/modem/mobile. For each video, two audio-only versions are also produced.
- High, 1000kbps, full resolution (standard def, 4:3 at 640x480 or 768x576, or 16:9 equivalents)
- Main, 512kbps, half resolution (close to 'CIF', wrt. standard def, 4:3 at 320x240 or 384x288 or 16:9 equivalents)
- Low, 128kbps, half resolution, encoded with H.264
- Access, 64kbps, quarter resolution, encoded with the standard mp4 video (mp4-part2) (QCIF, 176x144, suitable for mobiles; minor crop from 4:3 to QCIF, more radical crop from 16:9). Potentially add an H.264/AAC version too, at 48kbps for streaming to modems.
- Audio only, 32kbps mp3
- Audio only, 8kbps AMR-NB
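The tier list above can be captured as a small table, together with a rule for picking the best tier a viewer's connection can sustain. This is a sketch: the dictionary below records only bitrate and a 4:3 resolution per tier, not full encoder settings.

```python
# Bandwidth/resolution tiers from the list above (4:3 resolutions shown).
# A sketch only; real encoder settings carry many more parameters.

TIERS = {
    "high":   {"kbps": 1000, "resolution": (640, 480)},   # full res (SD)
    "main":   {"kbps": 512,  "resolution": (320, 240)},   # half res (~CIF)
    "low":    {"kbps": 128,  "resolution": (320, 240)},   # half res, H.264
    "access": {"kbps": 64,   "resolution": (176, 144)},   # QCIF, mobiles
}

def tier_for_connection(kbps_available):
    """Pick the highest-bitrate tier that fits the viewer's bandwidth,
    or None if even 'access' doesn't fit (audio-only territory)."""
    fitting = [t for t, v in TIERS.items() if v["kbps"] <= kbps_available]
    return max(fitting, key=lambda t: TIERS[t]["kbps"]) if fitting else None
```

For example, a 600kbps connection gets 'main'; a 56k modem gets none of the video tiers, which is exactly why the audio-only mp3 and AMR-NB versions matter.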
'High' at full resolution is important so that full size scientific content can be watched; 'access' is important for accessibility. You might think that the least important format of these is the 'low' setting. However, it makes sense to provide both an mp4-part2-based format and an H.264 format at low bandwidth.
There is no 'high definition' in this list. We mostly shoot in HDV on a Z1 or equivalent, and the footage looks best when downgraded to SD 16:9; we rarely actually output in HDV. For the web, you'd lose the interlacing in 1080i anyway, and the 1920 horizontal pixels are interpolated from 1440. So a full frame output close to standard definition seems to be the best option. However, if you want high definition, and your production quality is XDCAM, etc., you could add a higher bitrate in a high definition format.
A common objection is this: "My footage is so well shot and lovely, I don't want people to see this at 'access' video rate." I'd say: Let the viewer choose how to access your materials. If they find the 'access' version worthwhile, then perhaps they'll wait until a better version has downloaded.
The same goes for audio-only versions: "My video relies extensively on visual materials, these need to be seen, and so the audio only version is no good." Same response as to the last point. For a lecture, it's of course worth publishing a pdf alongside the mp3, so that viewers can look at the pdf while listening.
We haven't offered AMR-NB for long yet. Objectively, there is a need for it, but we don't know whether that need has translated into demand yet, and whether AMR-NB meets that demand in terms of usability. In any case, including AMR-NB is a statement that you care about low bandwidth access, and it won't take much server space or CPU time to encode. If you generally have good audio quality, choose AMR-NB near the lower end of its bitrates (6-8kbps), otherwise a slightly higher bitrate (say 12kbps). Just in case you don't think there's an access issue in terms of low bandwidth, see Web Design 4 Low Bandwidth.
5.2 Note 1 ctd: Bandwidths for audio
For audio, you'd use
- 'access': AMR-NB, say at 5.90kbps
- 'baseline': mp3, mono, 32kbps
- 'main', e.g. m4a/AAC, mono, 64kbps
- 'high': e.g. mp3, (joint) stereo, 128kbps
- perhaps wma, and perhaps ra, see discussion in the WMV/RV section below
You want all your audio to be normalised, and some encoders can do this as part of the encoding process. AMR-NB and mp3 at 32kbps were discussed in the previous section, and the same comments hold. mp3 at 32kbps is very listenable if your audio has been recorded well. You might want to offer a version at 64kbps, which would give you very good quality for mono recordings. If you want to vary the format, you could make this m4a/AAC. You probably want to offer a higher quality audio version as well, for music.
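Peak normalisation, which some encoders perform as part of encoding, boils down to a simple gain calculation over the raw sample values. A minimal sketch (real encoders may normalise on RMS or perceived loudness instead of peak):

```python
def normalise(samples, target_peak=0.99):
    """Scale samples so the loudest one reaches target_peak.
    Simple peak normalisation on floating-point sample values in [-1, 1];
    a sketch, not a substitute for an encoder's loudness processing."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)          # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Applying a single gain keeps the dynamics intact; it only ensures quiet recordings don't end up even quieter after low-bitrate encoding.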
If you have a lot of music to encode, you might want to consider having some presets geared for this, perhaps including an even higher bitrate, or m4a/AAC at 128kbps as well. Within video formats, you might then want to reserve more of the bitrate for the audio.
Of course audio will be much faster to encode than video, so audio formats are easier to do.
5.3 Note 2: The H.264/AAC backbone
The formats. The backbone of the set of formats is H.264/AAC, used for downloading, as well as progressive and/or seekable-progressive into a flash player (say FlowPlayer), and potentially streaming from QTSS.
You might not agree that H.264/AAC is the best format to go for, and you might suggest WMV or RealVideo instead. I'd disagree, but this is quite a long discussion. In brief, my view is that WMV and RealVideo are being squeezed out between Flash for online delivery on the one hand, and podcasting/downloading on the other. RealVideo has some advantages, in terms of failover, security etc., but we're not considering these essential features in our scenario.
We encode to m4v or mp4 with H.264/AAC at the four bitrates (high/main/low/access) given above, and to mp4 or 3gp with mp4-part2/AAC for 'access' only. For all bitrates if you can, but certainly for the lower bitrates, you want to squeeze out the best image quality, so use high-quality two-pass settings. These five formats will take care of downloading (and syndication via RSS): the files will play back in iTunes, QuickTime Player, RealPlayer, as well as a range of other open source players like mplayer, VLC, etc. (Windows Media Player still needs checking.)
You might think that the least important format of these is the 'low' setting, or perhaps the duplicated 'access' setting. However, it makes sense to provide both an mp4-part2-based format and an H.264 format at low bandwidth. The H.264 'access' format will work for web-based modem delivery, while the MPEG4P2 'access' format will work for older mobiles (but not play in flash). Meanwhile, the 'low' setting still has a low-ish bandwidth, but much better image quality.
Delivery. As default delivery, the files are offered for download (obviously), and ideally downloads are syndicated, see section on podcasting below. However, the files (high-access) should also be delivered with http into a flash video player (say FlowPlayer).
As Flash is the main delivery route, this should be an http-seekable delivery. It's unclear whether http-based seeking for Flash 9 video in H.264/AAC already works reliably. Ideally you'd have http-seekable delivery of m4v files into the flash player. Flash/H.264 is the main format, so if this turns out not to work, one would have to rethink some of the above.
QuickTime Streaming Server. The files won't stream from QuickTime Streaming Server (QTSS) as they are. Would you want to stream from QTSS? QTSS is quite responsive, and works well for 'browsing' streamed files by dragging the playhead around (without buffering or letting go of the playhead), which is quite useful for scientific content. So suppose you wanted to use QTSS.
- The standard solution would be to encode four times, and then create another four hinted files. So you encode four times, but end up with eight files. Note that if you choose optimised hints, your hinted files will be twice the size of the unhinted files; hence you've tripled your disk space requirements (original file, plus hinted file at twice the size). The best way forward is probably to use 'unoptimised' hinting, or to do the following:
- One possible way forward is to add an 'unoptimised' hinting layer, which rests on the assumption that we can add a hinting layer that doesn't interfere with FlowPlayer/Flash, iPod, or iPhone playback. If 'unoptimised' hints are chosen, the filesize doesn't seem to increase too much (~10%), which might be ok for the higher bandwidth settings. However, for the lower quality formats, where filesize and bitrate are crucial, we'd have to create separate versions. In this scenario, you'd encode four times, then hint the two highest bandwidth formats 'in place', and also create hinted versions of the lower bandwidth formats, ending up with six files from four encodes.
5.4 Note 3: The Flash 9 Update 3 issue
This note was originally written from a Flash 9 beta point of view, when Flash with H.264 was still in beta and the H.264 'backbone' was not yet available for online viewing (though it was for downloading and QuickTime viewing). With the release of Flash 9 Update 3, the H.264 'backbone' is now available for online viewing as well. We thus encode into those formats, but need some extra formats until Flash 9 Update 3 has been widely adopted.
In the meantime, we could thus
- either also do On2 VP6, and drop it in favour of mp4 at some point after the Flash 9 player has been widely adopted, or
- just go with Flash 7 / Sorenson, as outlined above, and demote it to legacy format in favour of m4v as soon as the Flash 9 player has been widely adopted.
The second option is cheaper, and doesn't require additional encodes. So we encode all the m4v formats now; once Flash 9 Update 3 is widely adopted, those become the main formats. Initially many people will still be on older flash versions, so we check for Flash 9 automatically: if the user has it, they get the higher quality files, otherwise they get the legacy format, plus a note saying 'if you upgrade to Flash 9, you'll get much better quality'.
4th Dec 2007: Now that Flash 9 update 3 has come out, we just need to wait for Flash 9 Update 3 to become widespread.
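The automatic Flash 9 check described above reduces to a simple decision rule once the player version has been detected (detection itself happens in the page, e.g. via JavaScript). A hypothetical sketch of the server-side choice:

```python
def choose_flash_rendition(major_version):
    """Decide which encode family to serve, given the detected Flash
    player major version. The rendition names are illustrative labels
    for the format families discussed on this page, not real file names."""
    if major_version >= 9:
        return "m4v-h264"        # Flash 9 Update 3: H.264/AAC backbone
    if major_version >= 7:
        return "flv-sorenson"    # legacy Flash 7 (Sorenson) format
    return "download-only"       # Flash 6 and below: offer downloads only
```

The same rule also produces the upgrade prompt case: anyone served "flv-sorenson" gets the 'upgrade to Flash 9 for much better quality' note.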
5.5 Note 4: Flash 6 anybody?
If we do Flash 9, because of the H.264 advantage, then what about older versions of flash? It's probably worth not abandoning older Flash versions completely. For one thing, it will take a while before most people have Flash 9.
So which older flash version should we support? Answer: The oldest version that supports movies properly.
- Flash 6 didn't do progressive download, and only offered a limited number of frames for video within an swf (about 10 minutes; the exact frame limit escapes me). Your swf will only play once fully downloaded, which isn't workable. If you really wanted, you could provide 1 minute previews in Flash 6, but you won't be able to show videos of arbitrary length. Flash 6 wasn't really a video format, and isn't worth fiddling with.
- Flash 7: Supports progressive download as flv into an swf (e.g. FlowPlayer), with Sorenson codec for video.
- Flash 8: Better video quality for flvs through ON2 VP6 codec. Usually encoders for On2 VP6 cost extra money (particularly in automated workflows).
Flash player 7 is the sweet spot: It supports video properly, is a comparatively early flash format, with free encoders around. The video quality is worse than Flash 8, but for added compatibility that's a good bargain, and we'll have top video quality in Flash 9 anyway (see above).
5.6 Note 5: WMV and RV
So what about WMV and RV? RealVideo of course was a pioneer of audio and video delivery over the net. However, as a proprietary format, with considerable cost of associated tools, it has lost out to other formats. Windows Media Video has got a strong base through the Windows operating systems, but as a format also seems less relevant now that H.264 covers both online viewing through flash as well as downloads.
About RealVideo: If you support RealVideo, do a single rate stream, delivered (1) off the Helix server if you have one, otherwise off a webserver, and (2) as a download. Here's why:
- Encoding time. One could go for multirate RV for streaming, and single rate RV for download. However, because RV is losing popularity, and the streaming server is expensive, multi-rate delivery of RealVideo doesn't seem worth it.
- Cost. The Helix server is expensive (though less so for education). Multi-rate files will not be usable once you give up the expensive streaming server in favour of a web server.
- Similar quality of H.264 and RealVideo at low bitrate. The low-bitrate quality of H.264 and RealVideo seems similar. Server-based multirate RealVideo may handle congestion better than H.264 streaming, but in many ways the best way around congestion is to offer a download.
- RealPlayer does seekable progressive on RealVideo. If you do a single rate file for progressive download off a webserver, it will be seekable. You can offer the same file for download. If you were going to support RealVideo, that seems the best option.
- RealPlayer plays H264 progressively (not seekable), but you can stream from QTSS into RealPlayer. So you can support RealPlayer without having to support RealVideo.
Windows Media: Because of the Windows operating system, windows media player is widespread. So there's a case for providing Windows Media. I would suggest that you don't want to do multirate server-based delivery, but do single rate delivery, delivered (1) off a windows media server if you have one, or the Helix server if you have one, otherwise off a webserver, and (2) as a download. By the same token, you might want to make a windows media audio file available.
For windows media, as for RV, you don't want to use multi-rate: It uses too much CPU, and multi-rate files will not be usable once you move away from the streaming server. Seekable progressive downloading doesn't work for windows media files, so if you have long files, you might need to
- Run windows media server in a windows server environment
- Get a helix universal server to serve windows media in a linux environment
- Get progressive-seekable flash working (see above), and steer your viewers to flash.
In summary, for both WMV and RV: single rate files, delivered from a streaming server if one is available, otherwise by progressive download from a web server. A streaming server for wmv (which may still get some traffic) may still be desirable. However, if flash seeking works, and we steer people to flash, few people will probably use wmv.
About bandwidths: If you are going to use WMV and/or RV, you might want to vary the bandwidths compared to the flash/m4v versions. For WMV you'll probably want a bandwidth similar to 'main', perhaps slightly higher; RV you might want at a lower bandwidth, say 300kbps, in between 'main' and 'access'.
By the way: For live streaming, you need a streaming server. QTSS into QuickTime player and into RealPlayer seems like a good option. Otherwise Windows / Real via Helix Universal Server, or Windows via Windows Media Server. Simultaneous multi-format encoding for live streaming requires specialised software, see Live Streaming.
"I am unconvinced by this, and we'll stick with WMV/RV." Two years ago (2005) various proposals were discussed, and some views were put forward that we should just do mp4, and that this would be sufficient. At that time, I argued strongly that WMV/RV needed to be included: there was little support for mp4 in mainstream players. Of course you could always have downloaded an open source player to play the format, but experience showed that the vast majority of our potential media viewers would not go to the length of installing extra players just to watch our content, either because they could not be bothered, or because they were unable to. It was thus imperative to provide formats suitable for most people. Also, H.264 wasn't available, so the image quality of MPEG4P2 (in terms of quality for bandwidth) wasn't particularly good, especially at the low bandwidth end.
Over the last two years, the balance has shifted significantly away from WMV/RV, first towards flash video, and then towards mpeg4/H.264. Firstly, H.264 is a strong codec, at the very least rivalling WMV/RV. Also, player support is much more widely available. However, it is really the prospect of Flash/H.264 that swings the balance, and turns H.264 into a key format for both online viewing and downloading/podcasting. Still not convinced? Time will tell.
5.7 Note 6: Ogg Vorbis and Theora
We may need to add Ogg Vorbis and Theora to the list, see HTML 5 video tag and Ogg.
5.8 Total bitrate
Finally, a note on space requirements, which depend on total bitrate. The total bitrate is 3.5Mbps = 1.5GBph. If all QT versions are duplicated to give separate hinted versions, then about 5Mbps = 2.2GBph. So your server disk space requirements will be between 1.5GB and 2.2GB per hour of material.
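The bitrate-to-disk-space conversion behind these figures is straightforward (decimal units, 1 hour = 3600 seconds, 8 bits per byte):

```python
def gb_per_hour(mbps):
    """Convert a total bitrate in Mbit/s to GB per hour of material
    (decimal units: 1 GB = 1e9 bytes)."""
    return mbps * 1e6 * 3600 / 8 / 1e9

# 3.5 Mbps total  -> ~1.6 GB per hour
# 5 Mbps (with duplicated hinted files) -> ~2.2 GB per hour
```

This is why hinting strategy matters: optimised hints that double file sizes push the per-hour storage figure up accordingly.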
Note also that you should keep your source materials in the highest possible quality.
6 Additional Notes
6.1 How many settings?
One question is how many settings you'll need. Of course you'll need one setting per format above, but the number of settings needed also depends on the type of input formats, and to some extent on the automated adjustments your encoder may make.
Generally speaking, you might have to cope with the following input formats:
- interlaced input
- 4:3 interlaced, as PAL or NTSC
- 16:9 interlaced, as PAL-anamorphic or NTSC-anamorphic, or HDV 1080i50 or HDV 1080i60
- progressive input
- E.g. HDV 720p or HDV 1080p
- other progressive footage (e.g. screen recordings)
Use different settings for PAL and NTSC. If, say, 99% of your footage is PAL, you might want to apply the same settings to NTSC footage, to save on the extra set of settings. Otherwise (to get the highest quality), have separate settings for PAL and NTSC.
4:3 vs. 16:9. You will need separate settings for 16:9 and 4:3. At best, a 4:3 setting applied to 16:9 input will result in letter-boxed or cropped video.
HDV. Downsizing HDV footage to standard definition and just using the standard def settings is a possible option. You might want to have separate settings for HDV, to leverage the full quality of the format.
Interlaced vs. progressive input. You will (in most circumstances) need separate settings for interlaced or progressive input. Progressive input should not be interlaced, while interlaced input needs to be deinterlaced. (A circumstance where you could use the same settings is where interlaced footage can be processed without deinterlacing, for instance where you are taking PAL footage to half the number of lines ("poor man's deinterlace"). In that case, you may not need to apply a deinterlacing filter, as the encoder just drops every other field. The same setting would then work for progressive input. However, it's best to keep settings for interlaced and progressive input logically separate. More info on Interlacing.)
The groups. For each type of input format, we'll need a group of settings to generate the required formats. To get the most out of your input, you might thus need the following groups of settings:
- Two groups of settings for 4:3 interlaced, namely for 576i50 (PAL), 480i60 (NTSC)
- Four groups of settings for 16:9 interlaced, namely for 576i50 (PAL anamorphic), 480i60 (NTSC anamorphic), 1080i50 (HDV), 1080i60 (HDV)
- At least two groups of settings for progressive high definition, namely 720p and 1080p
- The framerates used are 720 with 60p, 30p, 50p, 25p, 24p, and 1080 with 60i, 50i, 25p, 30p, 24p.
- Three groups of settings for progressive standard definition, namely 480p24, 480p30, 576p25. (If your settings have fractional frame rate adjustments, you can cover 480p24 with the same settings as 480p30.)
- (In theory, but much less likely in practice, you might need groups of settings to cover 480p60 and 576p50.)
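The grouping above can be expressed as a small classifier over an input's basic properties. A sketch, assuming the height, scan type, field/frame rate, and aspect ratio have already been probed from the file (the group names follow the convention used in the list):

```python
def settings_group(height, interlaced, fps, aspect="4:3"):
    """Map an input format to a settings-group name like '576i50 (4:3)'
    or '1080i50'. Illustrative only; a real workflow would also probe
    pixel aspect ratio, colour space, etc."""
    if interlaced:
        if height == 1080:
            return f"1080i{fps}"                # HDV interlaced
        return f"{height}i{fps} ({aspect})"     # SD PAL/NTSC, 4:3 or 16:9
    return f"{height}p{fps}"                    # progressive SD or HD
```

Calling this for each incoming file routes it to the right group of encoder settings.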
To give a practical angle on this: The vast majority of footage we encounter is 576i50 (4:3), 576i50 (16:9) and 1080i50. But this is very much from the perspective of a single HE institution in the UK in 2007. Elsewhere, or at other times, you'd encounter other input formats you'd need to cater for.
6.2 Workflow scheduling
If you have more material coming in than your setup can deal with, it is advisable to encode 'formats in turn', rather than 'sources in turn'. I.e. you don't go source by source, but proceed format by format: first you generate mp3 for everything (if new sources come in, older sources wait until everything has mp3), then the 'main' format for H.264, then the others, least important formats last. This way the most important formats become available most quickly. As long as you don't run out of disk space, and you manage to process all sources within a few days, this should be fully acceptable.
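Format-major scheduling amounts to ordering the (format, source) job list by format priority first, source second. A sketch (the priority list is illustrative, following the "mp3 first, then main H.264, least important last" rule above):

```python
# Most important formats first; labels are illustrative shorthand.
FORMAT_PRIORITY = ["mp3", "h264-main", "h264-high", "h264-low",
                   "3gp-access", "wmv", "rv"]

def schedule(sources):
    """Order encode jobs 'formats in turn': every source gets the most
    important format before any source gets the next format."""
    return [(fmt, src) for fmt in FORMAT_PRIORITY for src in sources]
```

With two sources, the queue starts with mp3 for both, then 'main' H.264 for both, and so on, so the most useful outputs appear first.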
6.3 Corollary: Archiving of source materials
Clearly the formats described above make sense only at this moment in time (Nov 2007). The equivalent list two years ago looked different, and the equivalent list in two years' time will look different also. Future formats will have greater emphasis on HDV, and probably on Flash.
So how does one cope with future changes in formats? By archiving the source materials, ideally in source formats. Disk space is so cheap now, that there's hardly any reason to not keep your input materials.
If you keep your input materials, you will be able to re-encode these into more current formats.
You probably want to keep the source format, as this is the highest quality. You might also want to consider archiving a format that is fully open (e.g. MPEG2 I-frame only), but you probably want to do this as well as keeping the source format, not instead of it. There are also problems with this strategy around HDV, as HDV (unlike DV) is not an I-frame only format, and your disk space requirements would increase disproportionately.
Further information on Multimedia Encoding Workflow
6.4 Podcasting
As well as offering everything for download, you should syndicate your materials into podcasts. This is widely used, and also helps with low bandwidth accessibility. You should generate some of the following feeds:
- One feed for video high (for AppleTV) where video is available, audio high where only audio is available
- Podcast for iPods: Video main for video assets, with audio main where only audio is available.
- Podcast for mp3 players. Audio baseline: audio baseline (mp3 at 32kbps) for both video and audio assets.
- Mobile phones 3gp/mpeg4, AMR for audio
- Mobile phones 3gp/H264, AMR for audio
The most important of these are the iPod feed, the mp3 feed, and a mobile feed.
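The feed assembly rule above (video rendition where video exists, audio fallback where the asset is audio-only) can be sketched as a per-feed preference list. The rendition and feed names below are illustrative, not part of any real feed software:

```python
def feed_item(asset, feed):
    """Pick which file of an asset goes into a given feed.
    'asset' maps rendition names to files; each feed prefers a video
    rendition and falls back to an audio one. Names are illustrative."""
    preferences = {
        "appletv": ["video-high", "audio-high"],
        "ipod":    ["video-main", "audio-main"],
        "mp3":     ["audio-baseline"],          # mp3 at 32kbps for everything
        "mobile":  ["video-3gp", "audio-amr"],
    }
    for rendition in preferences[feed]:
        if rendition in asset:
            return asset[rendition]
    return None   # asset has nothing suitable for this feed
```

An audio-only lecture thus appears in the iPod feed via its 'main' audio version, while a video asset appears via its 'main' video version, matching the feed descriptions above.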
6.5 Silverlight
With the H.264/AAC compatibility of Silverlight, we close an important circle. It's now possible to play back the same media files across the technologies of all major vendors: Apple, Adobe Flash, Real, and finally Microsoft Silverlight:
Silverlight 3 was first announced at the IBC 2008 show in Amsterdam on September 12, 2008. It was unveiled at MIX09 in Las Vegas on March 18, 2009. A beta version was made available for download the same day. The final version was released July 9, 2009. 
Once Silverlight 3 gains currency, it will be possible to drop wmv support.
7 Related Pages
Encoding a large range of formats from each piece of source media works best when it's embedded in an overall workflow. Clearly you couldn't possibly expect individuals at your institution to encode many different formats by hand, nor should they have to. Implementation is now very feasible, due to products like Apple Podcast Producer, Episode Podcast, and open source efforts like Berkeley's OpenCast. More information on this page: Multimedia Encoding Workflow.
9 Please comment
I welcome comments on the above article - my email address is below!