HYPE: HYbrid Parallel Encoder

Written by

Rubén González

November 2, 2023

Multimedia content keeps growing: VOD streaming, live broadcasting, videoconferencing, short-form content, and personal recording are everywhere. We all enjoy it, but it also brings challenges we should not forget. One that has received growing attention over the last few years is the carbon emissions caused by streaming.

Half an hour of ‘Netflix and Chill’ Emits The Same Amount of CO2 as a 6 km Drive.

We must work on multimedia standards, implementations, and processes that help overcome these challenges.

Hype in Fluendo R&D

With this challenge in mind, one of the first ideas that emerged from the Fluendo lab was to reduce video encoding time by using all the resources that new hardware can provide.

The Problem: Video encoder parallelization on machines with high core density and multiple GPUs.

The Fluendo lab team implemented a prototype in Python to show the potential of the idea.

Hype in Innovation Days

Fluendo has many Rust lovers, and we took advantage of the Innovation Days to propose porting Hype from Python to Rust. The goal was to test the language on a real idea and share its advantages with the team.

Rust. Fast, safe, and productive - pick three.

Another goal of the initiative was to publish Hype as an open-source solution. The Python implementation was excellent for the prototype, but a Rust implementation would be more aligned with the community.

The Problem

CPU core counts are increasing, dual-GPU machines are becoming mainstream (Intel iGPU/dGPU, AMD AAA), and multi-GPU setups are more common. However, an encoder’s own parallelization doesn’t scale linearly with a high density of cores or with several GPUs. At the same time, VOD encoding demands more and more speed, which means using all the available resources. More parallelism is available, and it can be put to use.

Encoder Parallelization types

Parallelism	Pros and Cons
Frame level	It can only be used for certain GOP structures (non-B-frames)
Slice level	Scaling is limited to a single frame.
Tile level	Similar parallelization level as slice level.
Wavefront Parallel Processing (WPP)	Only available in HEVC, VVC…
Time-slicing (fixed duration)	Sub-optimal rate control due to IDR frames forced at a given interval
Time-slicing (scene/GOP)	Extremely high parallelization, constant quality per scene, convex-hull encoding
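
The two time-slicing rows can be illustrated with a minimal sketch of how a video timeline might be cut into independently encodable chunks. This is not Hype's code: the function names are hypothetical, and the scene-cut timestamps are assumed to come from some external scene detector.

```python
def fixed_duration_segments(total, chunk):
    """Split [0, total) seconds into consecutive chunks of at most `chunk` seconds."""
    if total <= 0 or chunk <= 0:
        raise ValueError("total and chunk must be positive")
    total = float(total)
    segments = []
    start = 0.0
    while start < total:
        # The last chunk is shortened so it never runs past the end of the video.
        end = min(start + chunk, total)
        segments.append((start, end))
        start = end
    return segments


def scene_segments(total, scene_cuts):
    """Split [0, total) at scene-change timestamps so every chunk starts on a cut."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Ignore duplicate cuts and cuts that fall outside the video.
    cuts = sorted(t for t in set(scene_cuts) if 0 < t < total)
    bounds = [0.0, *cuts, float(total)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    print(fixed_duration_segments(25, 10))
    print(scene_segments(25, [4.0, 17.5]))
```

With fixed durations, a chunk boundary (and therefore a forced IDR frame) can land in the middle of a scene, which is the rate-control drawback noted in the table. Cutting at scene changes places boundaries where a keyframe would be natural anyway.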

Features

Hype is a meta-encoder.
It’s codec agnostic and can support a wide variety of codecs, such as H.264, H.265, and AV1.
It’s hybrid, supporting hardware and software encoders.
It can use all the resources of a machine, mixing GPUs and CPU.
It parallelizes VOD encoding, increasing encoding speeds.
It’s based on time-slicing parallelization (fixed duration and scene/GOP).
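
As a rough illustration of the hybrid idea, the sketch below lets several encoders pull chunks from a shared queue and then returns the results in chunk order. It is an assumption-laden toy, not Hype's implementation: `encode_chunk` is a stand-in that returns a label instead of a bitstream, and the encoder names are hypothetical.

```python
import queue
import threading


def encode_chunk(encoder_name, chunk_index):
    """Stand-in for a real encoder call; returns a label instead of a bitstream."""
    return f"chunk{chunk_index}@{encoder_name}"


def parallel_encode(num_chunks, encoder_names):
    """Encode chunks on a pool of encoders and return the results in chunk order."""
    pending = queue.Queue()
    for index in range(num_chunks):
        pending.put(index)

    # One slot per chunk, so the output order does not depend on which
    # encoder finished first.
    results = [None] * num_chunks

    def worker(encoder_name):
        # Each encoder keeps pulling work until the queue is empty, so a
        # faster encoder naturally ends up processing more chunks.
        while True:
            try:
                index = pending.get_nowait()
            except queue.Empty:
                return
            results[index] = encode_chunk(encoder_name, index)

    threads = [
        threading.Thread(target=worker, args=(name,)) for name in encoder_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Hypothetical mix of one software encoder and two GPU encoders.
    print(parallel_encode(6, ["cpu-h264", "gpu0-h264", "gpu1-h264"]))
```

Pulling from a shared queue, rather than assigning chunks up front, is one simple way to keep slow and fast encoders busy at the same time.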

Performance

For local testing, a single-threaded H.264 encoder is used. As seen in the following table, performance improves almost linearly as encoders are added, until it reaches a limit imposed by the large amount of memory required.

Encoders	Elapsed (wall clock) time	Percent of CPU this job got	Maximum resident set size (KB)
1	1:03.73	103%	80_612
2	0:40.89	202%	139_984
3	0:24.09	302%	199_552
4	0:20.34	395%	258_952
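
To make the scaling easier to read, the figures in the table can be turned into speedup, parallel efficiency, and memory growth per encoder. The snippet below only restates the table's measurements (times converted to seconds); the `scaling` helper is a hypothetical name introduced for this illustration.

```python
# Measurements copied from the table: encoders -> (wall-clock seconds, max RSS in KB).
runs = {
    1: (63.73, 80_612),
    2: (40.89, 139_984),
    3: (24.09, 199_552),
    4: (20.34, 258_952),
}

base_time, base_rss = runs[1]


def scaling(encoders):
    """Return (speedup, parallel efficiency, extra KB per added encoder)."""
    elapsed, rss = runs[encoders]
    speedup = base_time / elapsed
    efficiency = speedup / encoders
    if encoders > 1:
        extra_per_encoder = (rss - base_rss) / (encoders - 1)
    else:
        extra_per_encoder = 0.0
    return speedup, efficiency, extra_per_encoder


for n in runs:
    speedup, efficiency, extra = scaling(n)
    print(
        f"{n} encoders: speedup {speedup:.2f}x, efficiency {efficiency:.0%}, "
        f"+{extra:,.0f} KB per extra encoder"
    )
```

With these numbers, four encoders finish roughly 3.1 times faster than one, and each additional encoder adds a little over 59,000 KB of resident memory.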

GStreamer implementation

Take a look at the repository on GitHub.

Conclusions

The fact that the implementation is agnostic is a great success, and the same approach is valid for other processes with a high computational load. For example, implementing a transcoder instead of an encoder would have several advantages: it could use zero-copy, handing the decoder’s memory directly to an encoder on the same hardware, which would significantly reduce the memory required.

One of the biggest problems with this architecture is matching the output quality across different encoders, since not all hardware has the same multimedia capabilities. The project was presented at the GStreamer Conference held in A Coruña in September of this year. If you want to learn more about our capabilities, please do not hesitate to contact us here.