
Using Netflix machine learning to analyze Twitch stream picture quality

Published August 10, 2018. Updated August 11, 2018.

Ever wonder what exactly makes some streams look great, and others not-so-great? Read on...

What is this?

This report contains an analysis of the picture quality of x264 and NVENC H.264 encoded video game footage, across different common resolutions, bitrates, and encoder settings. Analysis was performed using Netflix VMAF, a machine learning algorithm trained by Netflix to detect perceivable quality degradation in video footage, when compared to a high quality source video.

Methodology

Source footage

One minute of Heroes of the Storm gameplay was captured on a Windows 10 version 1803 machine running an Intel Core i7-8700K 6-core CPU and an nVidia GeForce GTX 980 Ti GPU. Source footage was recorded using OBS Studio at 1920x1080 @ 60fps, with the NVENC H.264 encoder on the Lossless rate control. This resulted in a 1.5GB source file with video content encoded at a variable bitrate, averaging roughly 200Mbps - far beyond what any Twitch broadcast would ever transmit.

It's also worth mentioning that the GPU was running BIOS version 84.00.32.00.01 and driver version 24.21.13.9882 (398.82). No overclocking or modification of any kind was used.

$ ffmpeg -i /input/1080p60_source.flv
Duration: 00:01:00.97, start: 0.000000, bitrate: 201313 kb/s
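As a quick sanity check, the average bitrate and duration that ffmpeg reports are enough to estimate the file size. This is only arithmetic: it assumes ffmpeg's "kb/s" means 1000 bits per second, reports decimal gigabytes, and `est_size_gb` is a hypothetical helper name.

```sh
# est_size_gb KBPS SECONDS: estimate file size from average bitrate and duration.
# kb/s * 1000 = bits/s; * seconds = bits; / 8 = bytes; / 1e9 = decimal GB.
est_size_gb() {
  awk -v kbps="$1" -v s="$2" \
    'BEGIN { printf "%.2f GB\n", kbps * 1000 * s / 8 / 1e9 }'
}
```

Running `est_size_gb 201313 60.97` prints `1.53 GB`, consistent with the roughly 1.5GB source file described above.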

For the source footage and all re-encodes (both x264 and NVENC), the High profile was used to enable the full capabilities of the encoder. Older, more limited profiles such as Baseline and Main aren't necessary anymore, unless your audience is watching on devices from 2010 or earlier (iPhone 4, original iPad).

Re-encoding

The source footage was then run through a series of re-encodes using ffmpeg at various resolutions, bitrates, and x264 presets. Encoding was performed on an AMD Ryzen 7 2700X 8-core CPU running Ubuntu 18.04.

If you're not familiar with x264 presets, this quote from the ffmpeg wiki sums it up nicely:

"A preset is a collection of options that will provide a certain encoding speed to compression ratio. A slower preset will provide better compression (compression is quality per filesize). This means that, for example, if you target a certain file size or constant bit rate, you will achieve better quality with a slower preset."

The resolutions and bitrates tested were selected based on Twitch's minimum and maximum recommended bitrates, plus one step above and one step below Twitch's recommendations. A re-encode for each of the seven x264 presets from slow to ultrafast was created at each of the bitrates listed below, giving 56 x264 encodings (28 per resolution). Together with 20 NVENC H.264 encodings at 1080p60, that makes a total of 76 different encodings to compare.

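| 1080p60 | 720p60 | Relative to Twitch's recommendations |
|---|---|---|
| 7500K | 6500K | Above recommended |
| 6000K | 5000K | Recommended maximum |
| 4500K | 3500K | Recommended minimum |
| 3000K | 2000K | Below recommended |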

Re-encodes were performed using the source footage as input, with constant bitrates and a keyframe interval of 2 seconds, as recommended by Twitch:

$ ffmpeg -i /input/1080p60_source.flv -c:v libx264 -profile:v high -preset ${preset} \
-b:v ${bitrate} -bufsize ${bitrate} -r 60 -g 120 -keyint_min 60 \
-x264opts "nal-hrd=cbr:force-cfr=1:no-scenecut" -sws_flags lanczos \
-pix_fmt yuv420p -f flv -strict normal -an -threads 0 \
-benchmark /output/1080p60_${bitrate}_${preset}.flv
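Producing all 28 1080p60 x264 files means running the command above once per bitrate and preset. Below is a minimal sketch of such a driver loop, not the author's actual script: the function names `encode_1080p60` and `run_matrix` and the `RUN` variable are hypothetical, the 720p60 runs (which also need a resize step not shown above) are left out, and by default it only prints each command so you can inspect it before spending hours of CPU time.

```sh
# Print (or, with RUN=1, execute) the x264 re-encode command for one bitrate/preset.
encode_1080p60() {
  bitrate=$1
  preset=$2
  set -- ffmpeg -i /input/1080p60_source.flv -c:v libx264 -profile:v high \
    -preset "$preset" -b:v "$bitrate" -bufsize "$bitrate" -r 60 -g 120 -keyint_min 60 \
    -x264opts "nal-hrd=cbr:force-cfr=1:no-scenecut" -sws_flags lanczos \
    -pix_fmt yuv420p -f flv -strict normal -an -threads 0 \
    -benchmark "/output/1080p60_${bitrate}_${preset}.flv"
  if [ "${RUN:-0}" = 1 ]; then
    "$@"          # run ffmpeg with the arguments exactly as listed
  else
    echo "$*"     # dry run: show the command on one line
  fi
}

# 4 bitrates x 7 presets = 28 encodes at 1080p60.
run_matrix() {
  for b in 7500K 6000K 4500K 3000K; do
    for p in slow medium fast faster veryfast superfast ultrafast; do
      encode_1080p60 "$b" "$p"
    done
  done
}
```

`run_matrix | wc -l` should report 28 commands; `RUN=1 run_matrix` would actually run them, assuming `/input` and `/output` exist.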

Analyzing picture quality

To analyze picture quality, we have to first extract the raw video data from the source, and from each of the 76 re-encodes:

$ ffmpeg -i /input/1080p60_${bitrate}_${preset}.flv \
-c:v rawvideo -pix_fmt yuv420p /output/1080p60_${bitrate}_${preset}.yuv
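Raw video is large, so it's worth checking disk space before extracting every file. In 8-bit yuv420p, each frame stores a full-resolution luma plane plus two chroma planes at a quarter of the resolution, i.e. 1.5 bytes per pixel. The helper names below are hypothetical, and the size estimate multiplies by the exact duration rather than counting whole frames.

```sh
# Bytes per 8-bit yuv420p frame: Y plane (w*h) + U and V planes (w*h/4 each).
yuv420p_frame_bytes() {
  echo $(( $1 * $2 * 3 / 2 ))
}

# yuv420p_file_gb WIDTH HEIGHT FPS SECONDS: approximate raw file size in decimal GB.
yuv420p_file_gb() {
  awk -v fb="$(yuv420p_frame_bytes "$1" "$2")" -v fps="$3" -v s="$4" \
    'BEGIN { printf "%.1f GB\n", fb * fps * s / 1e9 }'
}
```

`yuv420p_file_gb 1920 1080 60 60.97` prints `11.4 GB` per file; the 720p60 encodes end up the same size once they are scaled back to 1920x1080 for analysis.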

Then, we use Netflix VMAF to compare the re-encoded video against the source footage. The analysis compares each frame of the re-encoded video against the matching frame from the source footage, and scores that frame with a value between 0 and 100:

$ vmaf run_vmaf yuv420p 1920 1080 /input/1080p60_source.yuv \
/output/1080p60_${bitrate}_${preset}.yuv --pool perc5 --out-fmt json

The perc5 pool indicates that we'd like a single final score for the whole file to be displayed at the end of the analysis, calculated as the 5th percentile of all the per-frame scores. In other words, roughly 5% of the analyzed frames score lower than the final "perc5" score.
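To make the perc5 pooling concrete, here is a small sketch that computes a percentile from per-frame scores. It assumes one per-frame VMAF score per line on standard input (pulling those out of the JSON is not shown, and the layout varies across VMAF versions), and it uses linear interpolation between neighboring ranks, which is numpy's default percentile method; whether VMAF's own pooling matches it to the last decimal is an assumption. The name `percentile` is hypothetical.

```sh
# percentile P: read one score per line on stdin, print the P-th percentile.
# Linear interpolation between the two nearest ranks (numpy's default method).
percentile() {
  sort -n | awk -v p="$1" '
    { s[NR - 1] = $1 }                  # scores, sorted ascending, 0-indexed
    END {
      if (NR == 0) exit 1               # no input, no score
      r = (NR - 1) * p / 100            # fractional rank of the percentile
      lo = int(r)
      hi = (lo + 1 < NR) ? lo + 1 : lo
      printf "%.4f\n", s[lo] + (r - lo) * (s[hi] - s[lo])
    }'
}
```

For example, `printf '%s\n' 50 10 40 20 30 | percentile 5` prints `12.0000`: with five frames, the 5th percentile sits a fifth of the way between the lowest score (10) and the next one (20).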

Here's an example of VMAF output for 1080p60, at a bitrate of 7500K, using the slow preset, and receiving a perc5 score of 88.3:

"aggregate": {
"VMAF_feature_adm2_score": 0.96897966972166083,
"VMAF_feature_motion2_score": 0.0,
"VMAF_feature_vif_scale0_score": 0.53733515947126853,
"VMAF_feature_vif_scale1_score": 0.87603492999573351,
"VMAF_feature_vif_scale2_score": 0.93075291436047602,
"VMAF_feature_vif_scale3_score": 0.95639934702793172,
"VMAF_score": 88.292012745836317,
"method": "perc5"
}

For comparison, here's VMAF output for 1080p60, 3000K, ultrafast, receiving a perc5 score of 51.1:

"aggregate": {
"VMAF_feature_adm2_score": 0.89763512043077032,
"VMAF_feature_motion2_score": 0.0,
"VMAF_feature_vif_scale0_score": 0.21322585605949121,
"VMAF_feature_vif_scale1_score": 0.51169636349740821,
"VMAF_feature_vif_scale2_score": 0.61514236173436809,
"VMAF_feature_vif_scale3_score": 0.69211402424418644,
"VMAF_score": 51.135980335947274,
"method": "perc5"
}

Interpreting scores

In-depth analysis of cinematic footage has shown that, roughly speaking, a VMAF score of 93+ is essentially flawless cinematic footage, and a score of 20 is completely unwatchable.

However, to my knowledge, no proper scientific study has been performed to train VMAF specifically on video game footage. Compared with cinematic footage, video game footage has more fine detail (smaller text and graphics), and it is also typically viewed much closer to the screen.

For comparison, here is a sample source frame at full quality, and that same frame from several of the re-encodes at various bitrates and presets:

Disclaimer: Keep in mind that this frame was selected specifically because it was one of the more difficult frames for the encoder, due to a fast camera pan. The image quality in the screenshots below does not represent the picture quality of the entire encode, only the quality of one particularly difficult frame.

[Image: frame 3370, 1080p60 source]

[Images: frame 3370 from four 1080p60 x264 re-encodes]
  • 1080p60 7500K slow - VMAF frame score: 81.9
  • 1080p60 6000K fast - VMAF frame score: 76.9
  • 1080p60 4500K veryfast - VMAF frame score: 57.6
  • 1080p60 3000K ultrafast - VMAF frame score: 55.0

After subjectively analyzing these frames, and other select frames at various qualities with various VMAF scores, my personal scale when observing VMAF scores for video game footage is as follows:

  • 95+ is essentially flawless
  • 85-95 is excellent
  • 75-85 is good
  • 65-75 is fair
  • 55-65 is poor
  • below 55 is unwatchable
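The scale above can be written as a small lookup. This is a sketch of my personal scale only, not anything built into VMAF; the function name `vmaf_rating` is hypothetical, and treating each lower bound as inclusive (85.0 counts as excellent) is an assumption, since the ranges above share their endpoints.

```sh
# vmaf_rating SCORE: map a VMAF score to the subjective scale for game footage.
# Lower bounds are treated as inclusive (an assumption).
vmaf_rating() {
  awk -v s="$1" 'BEGIN {
    s += 0                                # force numeric comparison
    if      (s >= 95) r = "flawless"
    else if (s >= 85) r = "excellent"
    else if (s >= 75) r = "good"
    else if (s >= 65) r = "fair"
    else if (s >= 55) r = "poor"
    else              r = "unwatchable"
    print r
  }'
}
```

Applied to the perc5 scores quoted earlier, `vmaf_rating 88.3` prints `excellent` (7500K slow) and `vmaf_rating 51.1` prints `unwatchable` (3000K ultrafast).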

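Results

1080p60 x264 Picture Quality

With the scale above in mind, here are the VMAF quality results for the 28 different 1080p60 re-encodes:

[Chart: 1080p60 x264 VMAF quality by bitrate and preset]

| 1080p60 bitrate | slow | medium | fast | faster | veryfast | superfast | ultrafast |
|---|---|---|---|---|---|---|---|
| 7500K | excellent | excellent | excellent | excellent | good | good | fair |
| 6000K | excellent | excellent | good | good | good | fair | fair |
| 4500K | good | good | good | good | fair | fair | poor |
| 3000K | fair | fair | fair | fair | poor | unwatchable | unwatchable |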

Above we can see that picture quality is largely unchanged from the slow preset through the faster preset, but drops off quickly at veryfast and the easier presets. At 6000K, picture quality dips below the "excellent" threshold for all presets except slow and medium. This suggests that even at Twitch's maximum recommended bitrate, excellent picture quality is quite difficult to achieve.

In fact, many of the top Twitch streamers that stream at 1080p60 do so with an expensive multi-PC setup capable of using the slow or medium presets, and they do so at bitrates far above Twitch's recommended maximum, in the 7500-8000K range. These results demonstrate why such setups are necessary to maintain excellent picture quality at 1080p60.

Many smaller Twitch streamers that are broadcasting in 1080p60 resort to encoding using the veryfast preset, as more difficult presets are simply too demanding for typical hardware, especially single-PC setups. Unfortunately, the veryfast preset using Twitch's maximum recommended bitrate of 6000K results in a VMAF score of just 78.9, firmly in the "good" range.

Streamers that don't have the bandwidth to stream at 6000K suffer greatly, as the 4500K encode using the veryfast preset scores "fair" at 72.7. Dropping below Twitch's recommended minimum bitrate to 3000K, the veryfast preset scores "poor" at 61.8.

1080p60 NVENC H.264 Picture Quality

NVENC H.264 has become a popular alternative to x264, since NVENC uses hardware-accelerated encoding that greatly reduces the load on the CPU. However, NVENC has also been criticized for producing lower-quality output than x264 at the same bitrate.

Different generations of nVidia GPUs, as well as different SDK versions and drivers, support different features in NVENC. An overview of the differences is summarized in the Video Encode and Decode GPU Support Matrix, as well as in the NVENC Application Note and the NVENC Programming Guide. Most of the differences concern encoding speed, 4K H.264 support, and H.265/HEVC support (which is not currently relevant to Twitch streaming), but there are also some variations in H.264 optimizations across GPU generations and SDKs. Just remember that comparisons across GPU generations, models, and SDK versions are not necessarily apples-to-apples. As a reminder, these results were obtained from a GTX 980 Ti with 398.82 drivers.

One fantastic thing about NVENC is its encoding speed. As long as you are using a Maxwell GPU (GTX 750) or newer, NVENC is capable of encoding 1080p at 180+fps, even with 2-pass encoding enabled. Older Kepler cards may struggle with 2-pass enabled, but for the majority of streamers, enabling 2-pass should not be a problem. As such, all of the NVENC analysis below used CBR and 2-pass encoding.

It's also important to note that NVENC uses a different set of presets than x264:

  • Default
  • High Quality
  • High Performance
  • Bluray
  • Low Latency High Quality
  • Low Latency High Performance

Default supports CABAC, and should provide a good quality baseline.

High Quality uses CABAC, and also adds support for B-Frames, which should increase quality beyond Default.

High Performance doesn't use CABAC or B-Frames. Instead, it uses CAVLC, a simpler entropy coding method which, at a fixed bitrate, should lower quality relative to the CABAC presets.

Bluray is essentially High Quality, but with B-Frames capped at 3 for compatibility with Blu-ray standards. All of the presets we tested (both x264 and NVENC) also used 3 B-Frames (where B-Frames were supported). Thus, Bluray was excluded from this test, as the results did not differ from High Quality in any meaningful way.

Low Latency HQ / HP presets use CABAC, but they don't use B-Frames. They also reduce difficulty in other ways, such as reduced deblocking and smaller motion estimation search ranges. Note that these presets were designed for applications that are extremely sensitive to latency, such as GPU-streaming services (using a remote GPU over the internet) or video conferencing.
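If you want to try these NVENC presets outside of a streaming application, ffmpeg's h264_nvenc encoder exposes them under shorter names. The mapping below is an assumption based on ffmpeg 4.x-era preset names (default, hq, hp, bd, llhq, llhp) and is not how the results above were produced; option names change between ffmpeg and SDK versions, so check `ffmpeg -h encoder=h264_nvenc` on your build. Both functions are hypothetical, and `nvenc_cmd` only prints a CBR, 2-pass command rather than running it.

```sh
# nvenc_preset NAME: translate a preset name from the list above to h264_nvenc's name.
nvenc_preset() {
  case "$1" in
    "Default")                      echo default ;;
    "High Quality")                 echo hq ;;
    "High Performance")             echo hp ;;
    "Bluray")                       echo bd ;;
    "Low Latency High Quality")     echo llhq ;;
    "Low Latency High Performance") echo llhp ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# nvenc_cmd NAME BITRATE: print (not run) a hypothetical CBR, 2-pass NVENC re-encode.
nvenc_cmd() {
  preset=$(nvenc_preset "$1") || return 1
  echo "ffmpeg -i /input/1080p60_source.flv -c:v h264_nvenc -profile:v high" \
       "-preset $preset -rc cbr -2pass 1 -b:v $2 -bufsize $2 -g 120" \
       "-an /output/1080p60_$2_nvenc_$preset.flv"
}
```

For example, `nvenc_cmd "High Quality" 6000K` prints a single command line using `-preset hq -rc cbr -2pass 1 -b:v 6000K`.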

Here are the VMAF quality results for the 20 different 1080p60 NVENC H.264 re-encodes:

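[Chart: 1080p60 NVENC H.264 VMAF quality by bitrate and preset]

| 1080p60 bitrate | Default | High Quality | Low Latency HQ | Low Latency HP | High Performance |
|---|---|---|---|---|---|
| 7500K | good | good | good | good | good |
| 6000K | good | good | good | good | good |
| 4500K | fair | fair | fair | fair | fair |
| 3000K | poor | poor | poor | poor | poor |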

NVENC's results are interesting, because the preset selected has only a slight effect on picture quality. By a very small margin, High Quality is the best quality preset, and by a fair margin, High Performance is the worst.

Notably missing from the NVENC results are any configurations that can generate an excellent picture. This is likely where the critics of NVENC make their argument, since, with enough CPU power, x264 will certainly generate a better picture.

Since it's likely that anyone who uses NVENC will be able to use the High Quality preset without issue (again, unless you're on an older Kepler GPU), it's perhaps more useful to compare just the High Quality NVENC preset to the x264 presets across a range of bitrates.

[Chart: 1080p60 VMAF quality by bitrate, NVENC High Quality compared with x264 presets]

Compared to x264, NVENC HQ is approximately the same quality as x264 veryfast at the same bitrate. The break-even point is likely somewhere around 4000K, with higher bitrates slightly favoring NVENC and lower bitrates slightly favoring x264 veryfast.

Another interesting difference is in which specific frames NVENC has difficulty with. When pulling the same frame that was examined previously in x264 (frame 3370), the NVENC samples were higher quality, across the board. This indicates that NVENC and x264 will degrade in slightly different ways, and at slightly different times.

If we check the VMAF output for a better example of a difficult frame for NVENC, we can see that, although it did relatively well with frame 3370, it had a much harder time with frame 944:

[Image: frame 944, 1080p60 source]

[Images: frame 944 from four 1080p60 NVENC High Quality re-encodes]
  • 1080p60 7500K NVENC HQ - VMAF frame score: 77.8
  • 1080p60 6000K NVENC HQ - VMAF frame score: 72.6
  • 1080p60 4500K NVENC HQ - VMAF frame score: 62.6
  • 1080p60 3000K NVENC HQ - VMAF frame score: 50.9

720p60 x264 Picture Quality

For analyzing 720p60 picture quality, there was one additional step of re-scaling the 1280x720 output back up to 1920x1080 so a frame-by-frame analysis against the 1080p60 source footage could be performed:

$ ffmpeg -i /input/720p60_${bitrate}_${preset}.flv -vf scale=1920x1080:flags=lanczos \
-c:v libx264 -crf 0 /output/720p60_${bitrate}_${preset}_scaled.flv

After scaling output back up, here are the VMAF quality results for the 28 different 720p60 x264 re-encodes:

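[Chart: 720p60 x264 VMAF quality by bitrate and preset]

| 720p60 bitrate | slow | medium | fast | faster | veryfast | superfast | ultrafast |
|---|---|---|---|---|---|---|---|
| 6500K | excellent | excellent | excellent | good | good | good | fair |
| 5000K | good | good | good | good | good | fair | fair |
| 3500K | good | good | good | good | fair | fair | poor |
| 2000K | fair | fair | fair | poor | poor | unwatchable | unwatchable |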

Moving down to 720p60 brings a slight drop in perceivable quality across the board. One interesting difference is that Twitch's recommended maximum bitrate of 5000K at 720p60 is not capable of generating an excellent image, even with the most difficult presets.

Since the quality is lower across the board at 720p60, streamers with low bandwidth should consider dropping framerate to 30fps in order to maintain high image quality - especially since more difficult presets don't seem to make as much of a difference at 720p60 as they do at 1080p60.
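One rough way to see why a lower framerate helps at a fixed bitrate is the average number of bits available per pixel per frame: bitrate / (width * height * fps). This heuristic was not measured in this analysis, and it ignores how inter-frame prediction lets an encoder reuse data between frames, so treat it only as an illustration; `bits_per_pixel` is a hypothetical helper.

```sh
# bits_per_pixel KBPS WIDTH HEIGHT FPS: average bits per pixel per frame.
bits_per_pixel() {
  awk -v kbps="$1" -v w="$2" -v h="$3" -v fps="$4" \
    'BEGIN { printf "%.3f\n", kbps * 1000 / (w * h * fps) }'
}
```

At 3500K, `bits_per_pixel 3500 1280 720 60` prints `0.063` while `bits_per_pixel 3500 1280 720 30` prints `0.127`, roughly double, which is the intuition behind dropping to 30fps when bandwidth is tight.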

CPU Difficulty

In addition to measuring picture quality, the re-encodes were also benchmarked, and the user CPU time (utime) reported by ffmpeg's -benchmark option was recorded to assess the difficulty of each configuration. Differences in platforms and configurations make it difficult to use this data to estimate your own system's capabilities, but we can at least use it to compare how difficult one resolution + preset combination is relative to others.

Bitrate, while it does have some effect on CPU difficulty, is less impactful than resolution, framerate, or preset. To simplify the comparison, only Twitch's recommended minimum and maximum bitrates are displayed below.

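[Chart: CPU difficulty by resolution, bitrate, and preset]

| CPU difficulty (utime) | slow | medium | fast | faster | veryfast | superfast | ultrafast |
|---|---|---|---|---|---|---|---|
| 1080p60 6000K | 637 | 458 | 364 | 306 | 213 | 161 | 126 |
| 1080p60 4500K | 588 | 428 | 343 | 295 | 207 | 157 | 125 |
| 720p60 5000K | 395 | 280 | 229 | 194 | 147 | 118 | 103 |
| 720p60 3500K | 361 | 261 | 217 | 186 | 143 | 114 | 101 |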

The benchmark data allows us to make a few interesting observations. For example, 1080p60 4500K veryfast (207) is about as difficult as 720p60 5000K faster (194). If we compare the quality scores for those same two configurations, the former had a perc5 score of 72.7, and the latter had a perc5 score of 81.4. So there are certainly cases where dropping resolution in order to use a more difficult preset, along with slightly raising bitrate, would noticeably increase quality.
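The same kind of comparison can be automated: given a utime budget, find the most demanding preset each tested resolution and bitrate can afford. The sketch below hard-codes the CPU difficulty table above; `difficulty_table` and `within_budget` are hypothetical names, and since utime depends on the machine, a budget is only meaningful relative to this benchmark system.

```sh
# The CPU difficulty table: resolution, bitrate, then utime for
# slow, medium, fast, faster, veryfast, superfast, ultrafast.
difficulty_table() {
  cat <<'EOF'
1080p60 6000K 637 458 364 306 213 161 126
1080p60 4500K 588 428 343 295 207 157 125
720p60 5000K 395 280 229 194 147 118 103
720p60 3500K 361 261 217 186 143 114 101
EOF
}

# within_budget UTIME: for each row, print the slowest preset at or under budget.
within_budget() {
  difficulty_table | awk -v budget="$1" '
    BEGIN { split("slow medium fast faster veryfast superfast ultrafast", preset, " ") }
    {
      # Columns 3..9 run from slow to ultrafast, so the first one under budget
      # is the most demanding preset this row can afford.
      for (i = 3; i <= NF; i++)
        if ($i + 0 <= budget + 0) { print $1, $2, preset[i - 2], $i; next }
    }'
}
```

`within_budget 207` prints `1080p60 6000K superfast 161`, `1080p60 4500K veryfast 207`, `720p60 5000K faster 194`, and `720p60 3500K faster 186`, one per line: for the same CPU cost, the 720p60 configurations can afford a slower preset.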

If we look at the same difficult frame that we did previously, the quality difference is very apparent:

[Images: frame 3370 from two configurations of similar CPU difficulty]
  • 1080p60 4500K veryfast - VMAF frame score: 57.6
  • 720p60 5000K faster - VMAF frame score: 78.8

Conclusions

This analysis demonstrates that streaming excellent-quality video demands a great deal of processing power and bandwidth. Twitch's minimum suggested bitrates will result in "fair" quality video unless you have the computing power to use a demanding preset such as faster. Bitrate is clearly an important factor, and makes large differences in perceivable quality.

For streamers that just don't have a lot of bandwidth to work with, it may not make sense to stream at 1080p60, as the quality penalty is quite severe for 4500K and below. In cases such as this, it may be wise to cut framerate to 30fps, or to consider 720p60 or even 720p30 if bandwidth is extremely limited.

Chances are, however, that the bigger limiting factor for most streamers is going to be CPU power. While the veryfast preset has become very common on Twitch, this analysis shows that moving even one preset slower, from veryfast to faster, results in a noticeable increase in picture quality. In some cases, it may even be wise to move down from 1080p60 to 720p60 in order to use the faster preset or more difficult ones.

If you're considering what settings to use for your stream, my personal recommendation would be to start by measuring your upload bandwidth, assume that you'll be able to use 80% of that for streaming, and see which of Twitch's recommended maximum bitrates you'd be able to hit. If you don't have 5Mbps or faster upload bandwidth, you probably won't be able to get high picture quality without dropping to resolutions below the ones we tested.
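The 80% rule of thumb is simple arithmetic; here is a sketch that checks which of the tested Twitch recommended bitrates fit a measured upload speed. The function names are hypothetical, upload is taken in whole kilobits per second, and the budget ignores audio bitrate and protocol overhead.

```sh
# usable_kbps UPLOAD_KBPS: 80% of measured upload, reserved for the video stream.
usable_kbps() {
  echo $(( $1 * 80 / 100 ))
}

# which_fit UPLOAD_KBPS: print each tested Twitch recommended bitrate that fits.
which_fit() {
  budget=$(usable_kbps "$1")
  for entry in "1080p60 6000 maximum" "1080p60 4500 minimum" \
               "720p60 5000 maximum" "720p60 3500 minimum"; do
    set -- $entry            # unquoted on purpose: split into three fields
    if [ "$2" -le "$budget" ]; then
      echo "$1 ${2}K (Twitch $3)"
    fi
  done
}
```

With a measured upload of 6000 kbps, the 4800 kbps budget covers the 1080p60 and 720p60 minimums but neither maximum, so `which_fit 6000` prints only `1080p60 4500K (Twitch minimum)` and `720p60 3500K (Twitch minimum)`.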

The importance of CPU power cannot be overstated. If you don't have an extremely high-end setup, you may not want to stream at 1080p60. In fact, unless you can use very high bitrates combined with reasonably difficult presets, you're likely to get better picture quality by lowering the resolution, lowering the framerate, or both.

Update: August 11, 2018

NVENC becomes an interesting option for streamers that have a recent nVidia GPU and plenty of bandwidth, but not a lot of CPU power. If you stick to the High Quality preset and keep bitrate high, you'll get results roughly equal to x264 veryfast, or slightly better.

We also learned that NVENC and x264 have trouble with different types of frames. Even though x264 veryfast and NVENC High Quality are very similar on average, it's possible that certain games "just look better" on one or the other.

In general, if you are able to take advantage of more difficult x264 presets (faster and up), it still seems wise to do so.

Disclaimer: This analysis is not comprehensive. It covers only a specific game (Heroes of the Storm) and analyzes just one minute of gameplay footage. Results may not be applicable to other games, or even to the same game if there is significant variation in game settings, especially graphics and camera settings.