Bringing Fluendo hardware decoders closer to GStreamer community memories

Written by

Diego Nieto

June 29, 2023

Fluendo hardware decoders are widely used across our clients. To achieve direct rendering, these plugins output a custom Fluendo memory format compatible with our powerful sink elements. However, this custom format cannot be used for direct rendering with other GStreamer community elements. After carefully analyzing how community backends interact with community sinks, we decided to go further in our decoders and provide compatibility with the community. As a first product iteration, we target community OpenGL memory on Linux for VDPAU and VAAPI. Other backends, such as DXVA2, could also be added in the future.

FluHWVAUpload

At Fluendo, we are committed to delivering cutting-edge multimedia solutions that empower developers and users alike. That’s why we are excited to introduce the latest addition to our GStreamer codecs family: FluHWVAUpload. This new GStreamer element makes Fluendo hardware decoders compatible with GStreamer OpenGL elements through GStreamer GLMemory. With support for both the VAAPI and VDPAU backends, FluHWVAUpload covers the most widely used vendors, such as Intel, AMD, and NVIDIA. Moreover, by leveraging GStreamer GL elements, FluHWVAUpload paves the way for future integration with other community memories like DMABuf.

Plugin features

Unleashing the Potential of Fluendo Hardware Decoders: FluHWVAUpload takes VDPAU or VAAPI surfaces from GPU devices without system memory copies, providing a highly efficient connection between GStreamer elements.

VAAPI

The GStreamer community already provides plugins that support this backend: the VAAPI plugin and the newer VA plugin, which has been chosen to replace it. Both support different memory layouts on their source pads, but the common one is DMABuf. However, for some of our use cases, DMABuf by itself is only valuable for sink purposes. This is where the glupload element comes into play. It is responsible for exposing DMA buffers as OpenGL memory, which can later be rendered with a glimagesink element.

Detailed process:

In step one, we check that the buffer metadata is correct, ensuring we are handling an I420 VAAPI surface. That surface is located in the GPU.
After that, we create an RGBx VAAPI surface and take its corresponding DMA buffer pointer.
Once we have that, we are ready to render the incoming I420 surface into it.
The resulting surface is then wrapped into an EGLImage, with no memory movement.
An EGLImage can be bound to a texture, so its data can be handled as one; glEGLImageTargetTexture2DOES does that binding.
Now the texture is ready to be used by other OpenGL elements.
Once the texture object is no longer used, the texture is unmapped, and the VAAPI surface is destroyed.
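To make the layout change in the rendering step more concrete, here is a minimal CPU reference of what turning a three-plane I420 frame into a single-plane RGBx frame means. It is only an illustration: the element described above performs this conversion on the GPU, and the function name, the tightly packed planes, and the BT.601 limited-range coefficients are all assumptions of this sketch.

```python
def i420_to_rgbx(data: bytes, width: int, height: int) -> bytes:
    """Convert one tightly packed I420 frame into packed RGBx (CPU sketch).

    Assumptions (not taken from the article): even dimensions, no stride
    padding, BT.601 limited-range coefficients, padding byte set to 255.
    """
    if width % 2 or height % 2:
        raise ValueError("this sketch only handles even dimensions")

    y_size = width * height
    chroma_width = width // 2
    chroma_size = chroma_width * (height // 2)
    if len(data) != y_size + 2 * chroma_size:
        raise ValueError("unexpected I420 buffer size")

    # I420 stores three separate planes: full-size Y, then quarter-size U and V.
    y_plane = data[:y_size]
    u_plane = data[y_size:y_size + chroma_size]
    v_plane = data[y_size + chroma_size:]

    def clamp(value: float) -> int:
        return max(0, min(255, int(round(value))))

    # RGBx stores a single plane with four bytes per pixel.
    out = bytearray(width * height * 4)
    for row in range(height):
        for col in range(width):
            luma = 1.164 * (y_plane[row * width + col] - 16)
            # Each 2x2 block of pixels shares one U and one V sample.
            chroma_index = (row // 2) * chroma_width + (col // 2)
            cb = u_plane[chroma_index] - 128
            cr = v_plane[chroma_index] - 128
            offset = (row * width + col) * 4
            out[offset] = clamp(luma + 1.596 * cr)                   # R
            out[offset + 1] = clamp(luma - 0.392 * cb - 0.813 * cr)  # G
            out[offset + 2] = clamp(luma + 2.017 * cb)               # B
            out[offset + 3] = 255                                    # x (padding)
    return bytes(out)
```

A 2x2 frame takes 6 bytes in I420 (4 luma samples plus one U and one V) and 16 bytes in RGBx, which is why a single texture per frame is enough downstream.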

Both backends push one RGBx OpenGL texture per frame, which is 100% compatible with the glimagesink element. Converting the original surface, separated into several planes, into a single-plane one allowed us to overcome problems found in the community plugins (e.g., the NV12 format with the GLES API).

This is what the pipeline looks like:
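As a rough textual sketch, assuming an H.264 stream in an MP4 container, such a pipeline could be described as follows. The element names fluvadec and fluhwvaupload are placeholders assumed for illustration and are not confirmed by this article; filesrc, qtdemux, h264parse, and glimagesink are standard GStreamer elements.

```python
def build_pipeline_description(
    location: str,
    decoder: str = "fluvadec",        # assumed name of the Fluendo hardware decoder
    uploader: str = "fluhwvaupload",  # assumed element name for FluHWVAUpload
) -> str:
    """Return a gst-launch style description: decode, upload to GL, render."""
    elements = [
        f'filesrc location="{location}"',
        "qtdemux",      # assumes an MP4 container
        "h264parse",    # assumes an H.264 stream
        decoder,        # outputs the custom Fluendo hardware memory
        uploader,       # exposes each frame as one RGBx OpenGL texture
        "glimagesink",  # community OpenGL sink
    ]
    return " ! ".join(elements)


print(build_pipeline_description("video.mp4"))
```

The resulting string is in the syntax accepted by gst-launch-1.0 and by Gst.parse_launch, provided the installed elements actually carry those names.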

VDPAU

Taking advantage of the NVIDIA VDPAU-OpenGL interoperability API, we deliver compatibility support for the VDPAU backend. This memory mapping is based on the EGL platform, but instead of using the OpenGL API directly, it requires a mix of OpenGL texture allocation and VDPAU interoperability mapping.

Detailed process:

In step one, we check that buffer metadata is correct, ensuring we handle an I420 VDPAU surface. That surface is located in the GPU.
After that, we set up an OpenGL buffer that allocates a texture.
Then, we render the incoming VDPAU surface, separated into several planes, into a single-plane RGBx surface. That allows us to later register and map the new surface through VDPAURegisterOutputSurfaceNV and VDPAUMapSurfacesNV. The latter links the VDPAU surface with the OpenGL texture.
Other OpenGL elements, such as the glimagesink element, can then use the texture.
Once the texture object is no longer used, the texture is unmapped, and the VDPAU surface is destroyed.
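The ordering of the register, map, use, and unmap steps above can be modeled with a small sketch. It does not touch a GPU: RecordingInterop is a hypothetical stand-in that only records which NV_vdpau_interop entry point a real implementation would call at each point, and mapped_texture is an illustrative helper name, not part of the Fluendo element.

```python
from contextlib import contextmanager
from typing import Iterator, List


class RecordingInterop:
    """Hypothetical stand-in: records call names instead of driving a GPU."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def register_output_surface(self, vdp_surface: int, texture: int) -> int:
        self.calls.append("glVDPAURegisterOutputSurfaceNV")
        return 1  # opaque handle for the registered surface

    def map_surface(self, handle: int) -> None:
        self.calls.append("glVDPAUMapSurfacesNV")

    def unmap_surface(self, handle: int) -> None:
        self.calls.append("glVDPAUUnmapSurfacesNV")

    def unregister_surface(self, handle: int) -> None:
        self.calls.append("glVDPAUUnregisterSurfaceNV")


@contextmanager
def mapped_texture(
    interop: RecordingInterop, vdp_surface: int, texture: int
) -> Iterator[int]:
    """Register and map a surface, yield the texture, then release in reverse."""
    handle = interop.register_output_surface(vdp_surface, texture)
    try:
        interop.map_surface(handle)
        try:
            yield texture  # downstream OpenGL elements use the texture here
        finally:
            interop.unmap_surface(handle)
    finally:
        interop.unregister_surface(handle)


interop = RecordingInterop()
with mapped_texture(interop, vdp_surface=7, texture=42) as tex:
    print("rendering with texture", tex)
print(interop.calls)
```

Using a context manager keeps the release path symmetrical with the setup path, so the surface is unmapped and unregistered even if rendering fails. Destroying the VDPAU surface itself would follow as a separate step.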

Future

DMABuf is the top GStreamer memory for high-performance transactions. It avoids involving the CPU in memory transfers, letting the GPU access the memory by itself, which results in fast operations. It can later be used with other memories, as we saw here for VAAPI in our design. This kind of memory is also implemented by the Linux kernel; support depends on the vendor SoC, but the common ones provide drivers. On devices that share physical memory between CPU and GPU (UMA devices), there is no need to perform buffer copies.
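To get a feel for what avoiding a copy saves, the following sketch computes the size of one frame and the memory traffic a single per-frame copy would generate. The 1920x1080 resolution and 60 frames per second are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def frame_bytes(width: int, height: int, fmt: str) -> int:
    """Size in bytes of one tightly packed frame (even dimensions assumed)."""
    if fmt == "I420":
        # Full-resolution Y plane plus two quarter-resolution chroma planes.
        return width * height * 3 // 2
    if fmt == "RGBx":
        # One plane, four bytes per pixel.
        return width * height * 4
    raise ValueError(f"unsupported format: {fmt}")


def copy_traffic_per_second(width: int, height: int, fmt: str, fps: int) -> int:
    """Bytes per second moved if every frame were copied exactly once."""
    return frame_bytes(width, height, fmt) * fps


print(frame_bytes(1920, 1080, "I420"))                  # 3110400
print(frame_bytes(1920, 1080, "RGBx"))                  # 8294400
print(copy_traffic_per_second(1920, 1080, "RGBx", 60))  # 497664000
```

Under these assumptions, copying each RGBx frame once through system memory would move close to 500 MB every second, which is the kind of traffic a zero-copy path avoids.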

Final thoughts

In summary, we are proud to say that FluHWVAUpload has come to Fluendo to strengthen the features of Fluendo hardware decoders, following the standards of the GStreamer community and opening the door to an element that will keep evolving to provide new decoder functionalities. FluHWVAUpload seamlessly bridges the gap between Fluendo hardware decoders and OpenGL GStreamer memories, ensuring optimal performance and compatibility across various platforms.