Alpha channel support for VVC/H.266 in vvenc and GStreamer


Written by

Diego Nieto & Andoni Morales

September 2, 2025

Introduction

In July 2024, our STREAM – Scalable Telepresence with Real-time EnhAnced Multimedia project was one of the winners of the SPIRIT - Scalable Platform for Innovations on Real-time Immersive Telepresence Open Call 1. SPIRIT’s mission is to create Europe’s first multisite and interconnected framework capable of supporting a wide range of application features in collaborative telepresence.

The SPIRIT platform provides a toolkit for telepresence that captures volumetric data and generates videos with 3D avatars, which are streamed over the network and rendered on the client in a Virtual Reality environment to provide an immersive experience. The goal of our STREAM project was to reduce bandwidth usage without compromising 4K resolution and quality by leveraging GStreamer, Gst.WASM and VVC/H.266 as a video codec with its support for alpha channel.

[Image: SPIRIT]

As part of the project, we extended VVenc to create a VVC/H.266 encoder with support for alpha channel, and we developed its counterpart, a VVC/H.266 decoder with alpha support, using VVdec and GStreamer. In this blog post we will explain all the work done in detail, but before that, let’s start with a brief introduction to the alpha channel and how it’s supported in VVC/H.266.

Alpha Channel

What is an Alpha Channel?

The alpha channel is an essential tool in modern video workflows, enabling transparency and visual effects. It is an extra grayscale channel that describes per-pixel transparency and is used as an alpha mask when compositing. In video production and editing, it allows overlaying videos with transparent backgrounds over other videos; on the web, it allows overlaying transparent videos over the main content without visible borders.
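As a concrete illustration, the "over" compositing that an alpha mask enables can be sketched in a few lines (a minimal, hypothetical example; the function name and the 8-bit convention are ours, not part of any codec):

```python
def over(fg, bg, alpha):
    """Composite a foreground sample over a background sample.

    fg, bg: 8-bit color samples; alpha: 8-bit alpha sample (255 = opaque).
    """
    a = alpha / 255.0
    return round(a * fg + (1.0 - a) * bg)

# A half-transparent white pixel over a black background gives mid grey:
print(over(255, 0, 128))  # 128
```

A compositor applies this per pixel and per channel, which is exactly what happens downstream once the decoded alpha plane is available.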

[Image: composited video]

In VR applications, like the SPIRIT project one, the alpha channel is used to blend the 3D avatar video with a video from the real world in a VR environment.

[Image: SPIRIT VR blending]

How do video codecs support an alpha channel?

Video codecs usually work with YUV colorspaces using 3 channels (1 for luminance and 2 for chrominance), which enables the first compression stage, chroma subsampling. Additional channels, like the alpha channel or a depth channel, need to be encoded separately. Older video codecs like VP8 and VP9 have no native support for alpha; instead, it’s handled by the WebM container, where the alpha channel is encoded as a separate video track and muxed together with the main video track.

Modern video codecs like VVC/H.266 support alpha and depth channels encoded in the bitstream as auxiliary layers and signalled through specific SEI messages. This allows streaming videos with an alpha channel without the need for a muxer.

Alpha Channel in VVC/H.266

VVC/H.266 supports multi-layer profiles (e.g., Multilayer Main 10) where additional layers are added on top of the base layer. This is usually used to provide spatial and temporal scalability: a client can decode the base layer for lower resolutions or framerates, and enhancement layers can be added for higher resolutions, higher framerates or other enhancements. Alpha channel support uses the multi-layer profile, encoding the extra channel in an independent auxiliary layer.

[Image: VVC multi-layer with alpha]

Apart from encoding the alpha channel as an auxiliary layer, we also need to signal its presence and features in the bitstream, which is done through SEI messages. This SEI signalling is defined in ITU H.274, a standard that was designed for ITU H.266 but with the goal of being reusable by other codecs and applications. It consists of the following 2 SEI messages (abbreviated for clarity).

Scalability Dimension Info - SDI (8.19)

This SEI message provides information for each layer in the current CVS, such as when there are multiple views (the view ID of each layer) or when there is auxiliary information, like the alpha channel in our case, carried by one layer (the ID of the layer containing it):

  • sdi_auxiliary_info_flag: indicates whether auxiliary layers are present in the bitstream.
  • sdi_aux_id: the ID of the auxiliary layer (our alpha layer in this case).
  • sdi_associated_primary_layer_idx: an array that links this alpha layer with its primary layers.

Alpha Channel Info - ACI (8.23)

This SEI message provides information about the alpha channel properties, like the sample values and postprocessing applied to the decoded alpha planes:

  • alpha_channel_bit_depth_minus8: the bit depth of the samples of the luma sample array of the auxiliary picture. Must match its associated primary layers.
  • alpha_transparent_value: specifies the interpretation sample value of a decoded auxiliary picture luma sample for which the associated luma and chroma samples of the primary coded picture are considered transparent for purposes of alpha blending.
  • alpha_opaque_value: specifies the interpretation sample value of a decoded auxiliary picture luma sample for which the associated luma and chroma samples of the primary coded picture are considered opaque for purposes of alpha blending.
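As a sketch of how a consumer of this SEI might interpret a decoded alpha sample, the interpolation between the transparent and opaque values could look like this (the function name is ours; clamping samples outside the signalled range is an assumption for illustration):

```python
def alpha_fraction(sample, transparent_value, opaque_value):
    """Map a decoded auxiliary-layer luma sample to a [0, 1] opacity.

    Samples at transparent_value are fully transparent, samples at
    opaque_value fully opaque; values in between are interpolated,
    and the result is clamped to [0, 1].
    """
    frac = (sample - transparent_value) / (opaque_value - transparent_value)
    return min(max(frac, 0.0), 1.0)

# With common 8-bit full-range values (transparent=0, opaque=255):
print(alpha_fraction(128, 0, 255))  # ~0.502, i.e. roughly half opaque
```

The compositor then uses this fraction to blend the primary picture with the background.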

Encoder/Decoder implementation

Now that we have a basic knowledge of how an alpha channel is encoded and signalled in H.266/VVC, let’s jump into how we implemented an encoder to generate our first stream with an alpha channel and finally a decoder capable of decoding it.

VVC encoder with Alpha Channel

The first stage of the development was having an encoder capable of creating an H.266 bitstream with the alpha channel as an auxiliary layer, signalled using SDI and ACI SEI messages. For the STREAM project we used the open source VVC/H.266 software encoder VVenc, by Fraunhofer. This encoder implementation did not have support for the multi-layer profile or the alpha channel.

We needed to work on the following parts on the encoder side:

  • Add multi-layer support.
  • Add support to use an input sequence with the alpha channel and encode it as an independent layer.
  • Add alpha signalling support through SDI and ACI SEI messages.

Multi-layer support in VVenc

VVenc’s upstream repository does not have multi-layer support, but we were fortunate that NHK had already added support for multi-layer in their NHK VVenc Multi-layer fork and we could leverage their work without having to start from scratch. NHK’s results are documented in the JVET-AE0172 contribution.

Their work is based on the 1.6.1 release, but we needed to work with version 1.12.0, so we rebased their contribution on top of 1.12.0. Our work is available in the alpha-sei branch of our VVenc fork.

VVenc Alpha Channel encoding

The next step was being able to pass the alpha channel as input and encode it as an independent layer. We decided to reuse the existing multi-layer support used for temporal and spatial scalability and add the alpha layer as a new independent layer.

VVenc uses configuration files to configure different aspects of the encoder. The example multi-layer configuration used for spatial scalability defines 2 dependent layers and looks like this:

#======== Layers ===============
MaxLayers                     : 2
MaxSublayers                  : 7
DefaultPtlDpbHrdMaxTidFlag    : 0
AllIndependentLayersFlag      : 0
#======== OLSs ===============
EachLayerIsAnOlsFlag          : 0
OlsModeIdc                    : 0
NumOutputLayerSets            : 2
NumPTLsInVPS                  : 2
#======== Layer-0 ===============
LayerId0                      : 0
#======== Layer-1 ===============
LayerId1                      : 1
NumRefLayers1                 : 1
RefLayerIdx1                  : 0
#======== OLS-0 ===============
OlsPTLIdx0                    : 0
#======== OLS-1 ===============
LevelPTL1                     : 6.1
OlsPTLIdx1                    : 1

For the alpha channel, we need our extra layer to be independent from the main layer, ending up with the following configuration:

#======== Layers ===============
MaxLayers                     : 2
MaxSublayers                  : 7
AllIndependentLayersFlag      : 1
#======== OLSs =================
EachLayerIsAnOlsFlag          : 1
NumOutputLayerSets            : 2
NumPTLsInVPS                  : 1
#======== Layer-0 ===============
LayerId0                      : 0
#======== Layer-1 ===============
LayerId1                      : 1
#======== OLS-0 ===============
OlsPTLIdx0                    : 0
#======== OLS-1 ===============
OlsPTLIdx1                    : 1

VVenc’s multi-layer support requires all layers to use the same colorspace, since it assumes layers are dependent. This isn’t the case for the alpha channel, which is an independent layer that should be encoded as a single 8-bit grayscale channel. Due to a lack of time to fix this, we decided to encode the alpha channel as I420, storing the alpha samples in the luminance channel.
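This workaround amounts to wrapping the alpha plane in a grayscale I420 frame. A minimal sketch of that packing (our own helper for illustration, not VVenc code; 128 is the neutral chroma value for 8-bit samples):

```python
def alpha_to_i420(alpha_plane, width, height):
    """Wrap an 8-bit alpha plane as an I420 frame.

    The alpha samples become the luma (Y) plane; the chroma (U, V)
    planes are filled with the neutral value 128, so the frame
    encodes a pure grayscale image.
    """
    assert len(alpha_plane) == width * height
    chroma = bytes([128]) * ((width // 2) * (height // 2))
    return bytes(alpha_plane) + chroma + chroma  # Y + U + V

frame = alpha_to_i420(bytes([255]) * (4 * 4), 4, 4)
print(len(frame))  # I420 is 1.5 bytes per pixel: 24
```

The encoder then treats this frame like any other I420 layer, at the cost of compressing two constant chroma planes.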

Our plan for the future is to allow the encoder to correctly handle layers with different colorspaces when they are independent and to add support to pass the alpha channel with new properties to differentiate it from dependent layers used for temporal or spatial scalability.

Support for SDI and ACI SEI messages

The last piece of the puzzle for the encoder was adding the alpha channel signalling using the Scalability Dimension Info and Alpha Channel Info SEI messages. We implemented them in our VVenc fork, following the ITU H.274 recommendation.

We also added 2 configuration properties to control the injection of these messages in the encoder:

#======== Alpha ================
SEIAlphaChannel               : 1
SEIScalabityDimension         : 1

Generating the bitstream

With all that work in place, we are now able to generate an H.266 bitstream with an alpha channel, using Layer 0 for the main sequence and Layer 1 for the alpha channel:

./vvencFFapp \
  -c ../../cfg/two_layers_gray.cfg \
  -c ../../cfg/randomaccess_fast.cfg \
  -l0 -c sequence2K.cfg \
  -l1 -c sequenceAlpha.cfg \
  -l0 -q 22 \
  -l1 -q 22 \
  -l1 --Level=6.1 --VerCollocatedChroma=1 -v 6

VVC/H.266 decoder with Alpha Channel

The final stage of the work was creating a decoder capable of decoding VVC/H.266 bitstreams with alpha channel. This decoder would need the ability to:

  1. Parse SDI and ACI SEI messages to identify the presence of an alpha channel and its layer.
  2. Decode the main layer and the alpha channel layer independently.
  3. Blend the alpha channel mask from the alpha decoder’s output into the main decoder’s output stream to generate an AYUV stream.
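Step 3 can be illustrated with a toy combiner that interleaves the decoded I420 planes of the main layer with the luma plane of the alpha layer into packed AYUV (a simplified sketch with hypothetical names, not the actual GStreamer implementation, which operates on GstBuffers):

```python
def combine_ayuv(main_y, main_u, main_v, alpha_y, width, height):
    """Interleave decoded planes into packed AYUV (A, Y, U, V per pixel).

    main_y/main_u/main_v: I420 planes of the main layer (chroma is
    4:2:0 subsampled); alpha_y: luma plane of the decoded alpha layer.
    """
    out = bytearray()
    for y in range(height):
        for x in range(width):
            c = (y // 2) * (width // 2) + (x // 2)  # chroma sample index
            out += bytes([alpha_y[y * width + x],
                          main_y[y * width + x],
                          main_u[c], main_v[c]])
    return bytes(out)
```

Each output pixel carries its own alpha byte, which is what allows a downstream compositor to blend the stream without any extra metadata.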

We analyzed the following 2 implementation options:

  1. Add support directly in VVdec.
  2. Create a decoder wrapper that uses internally two H.266 decoders.

We decided to go for option 2, since it would allow us to reuse it with other decoder implementations, like hardware decoders, and not just VVdec’s software decoder. This choice was also easier to implement since we could reuse the same infrastructure created in GStreamer for VP8/VP9 alpha channel support.

GStreamer VP8/VP9 Alpha Channel decoders

GStreamer supports alpha channel in VP8/VP9 through the vp8alphadecodebin and vp9alphadecodebin elements, based on alphadecodebin. In VP8/VP9, the alpha channel is muxed as an extra video track in the WebM container. When such a stream exists, the GStreamer demuxer (matroskademux) does not expose a new pad for it; instead, it attaches the alpha access units from that track as GstVideoCodecAlphaMeta to the output buffers of the main stream. A stream with an alpha channel is identified in the caps by the codec-alpha field: when it’s set to true, decoders with alpha support are selected over the ones without it.

The VP8/VP9 alpha decoders are bins that use the GstVideoCodecAlphaMeta to split the stream back into 2 streams. Then, 2 decoders decode the main stream and the alpha channel stream independently. Finally, an element combines the 2 decoded raw buffers back into a single stream, so that a downstream compositor can blend the alpha channel.

[Image: VP9 alpha decoder pipeline]

The decoders are bins composed of the following elements:

  • codecalphademux: splits the incoming buffers with the GstVideoCodecAlphaMeta into 2 output streams, one with the original buffers from the main stream and another with the alpha channel buffers.
  • vp8dec or vp9dec: 2 decoder instances that decode the main video track and the alpha one.
  • alphacombine: combines the 2 decoded raw buffers back into a single stream.

GStreamer H.266/VVC Alpha Channel decoders

For alpha channel support in H.266/VVC, we decided to reuse the same infrastructure, using h266parse for the alpha channel parsing and processing together with a new h266alphadecodebin element. The pipeline for the decoder looks like this:

[Image: H.266 alpha decoder pipeline]

The first part of the work was adding support in h266parse to read ACI and SDI SEI messages in order to detect the alpha channel auxiliary layer. Once that was in place, the next part was to attach the alpha channel access units from the auxiliary layer as GstVideoCodecAlphaMeta to the output buffers of h266parse, so that an element based on GstAlphaDecodeBin could handle the decoding of the alpha channel. Finally, we created a new element named h266alphadecodebin, based on GstAlphaDecodeBin, with some modifications to the base class to support the VVC/H.266 use case. h266alphadecodebin uses the highest-ranked VVC/H.266 decoder; we are using Fluendo’s H.266 software decoders, but vvdec works as well. Hardware decoders could also work, but we didn’t test them.

All of our work for the decoder is now upstream in the “h266 - Parse and decode bitstreams with alpha layers” pull request.

Demo

Let’s now see a working demo in which we overlay a VVC/H.266 stream over another video stream, making use of the alpha channel.

Setup the streams

For our demo we will need 2 streams: the raw I420 video and a raw alpha channel, also in I420 format due to the encoder limitation mentioned earlier in this blog post.

The alpha channel stream can be simulated using a chroma keying technique. The results are not as good as when this information comes directly from the producer source, but it’s enough for demo purposes.
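A binary chroma key like the one used here can be sketched as follows (a deliberately crude, hypothetical example; real keyers produce soft edges rather than a hard threshold, and the key color and threshold are arbitrary demo values):

```python
def chroma_key_alpha(pixels, key=(0, 255, 0), threshold=100):
    """Derive an 8-bit alpha mask from RGB pixels via chroma keying.

    Pixels close to the key color (Euclidean distance below the
    threshold) are marked transparent (0), the rest opaque (255).
    """
    mask = bytearray()
    for r, g, b in pixels:
        dist = ((r - key[0]) ** 2 + (g - key[1]) ** 2
                + (b - key[2]) ** 2) ** 0.5
        mask.append(0 if dist < threshold else 255)
    return bytes(mask)

# Green pixels become transparent, everything else opaque:
print(list(chroma_key_alpha([(0, 255, 0), (200, 30, 40)])))  # [0, 255]
```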

This is the video that we will use as source, with a chroma key background.

[Image: original video with chroma key background]

For the source stream, we will start by storing it as raw I420 to feed it into our encoder, decoding and saving it to a file, h266.yuv:

gst-launch-1.0 filesrc location=source-1920x1080.mp4 ! decodebin3 ! videoconvert ! "video/x-raw,format=I420" ! filesink location=h266.yuv

From this video, we will extract the alpha channel, but instead of saving it as 8-bit grayscale, we will save it as I420, using the luma channel to store the alpha samples. We will use ffmpeg’s alphaextract filter for that:

ffmpeg -f rawvideo -pix_fmt rgba -s 1920x1080 -framerate 30 -i file.rgba -vf alphaextract -pix_fmt yuv420p output_alpha.yuv

[Image: extracted alpha channel]

We now have two streams that we will use as input for our encoding process.

Encoding process

Once we have the previous streams ready, we will use vvencFFapp, a CLI frontend for VVenc, to encode our VVC/H.266 stream with alpha channel.

We will use the following configuration files:

  • two_layers_alpha.cfg: configures the encoder to use a main layer and an auxiliary layer for the alpha channel
  • randomaccess_fast.cfg: configures the encoder for fast random access with a shorter GOP for a live streaming application.
  • sequence2K.cfg: configures the input for the main layer
  • sequenceAlpha.cfg: configures the input for the alpha layer

This is an example of the two_layers_alpha.cfg configuration:

#======== Layers ===============
MaxLayers                     : 2
MaxSublayers                  : 7
AllIndependentLayersFlag      : 1
#======== Alpha ================
SEIAlphaChannel               : 1
SEIScalabityDimension         : 1
#======== OLSs =================
EachLayerIsAnOlsFlag          : 1
NumOutputLayerSets            : 2
NumPTLsInVPS                  : 1
#======== Layer-0 ===============
LayerId0                      : 0
#======== Layer-1 ===============
LayerId1                      : 1
#======== OLS-0 ===============
OlsPTLIdx0                    : 0
#======== OLS-1 ===============
OlsPTLIdx1                    : 1

And this is how we run the application:

./vvencFFapp \
  -c ../../cfg/two_layers_alpha.cfg \
  -c ../../cfg/randomaccess_fast.cfg \
  -l0 -c sequence2K.cfg \
  -l1 -c sequenceAlpha.cfg \
  -l0 -q 22 \
  -l1 -q 22 \
  -l1 --Level=6.1 --VerCollocatedChroma=1 -v 6

In the encoder summary logs, we can see the stats of the 2 encoded layers, with 150 frames each:

LayerId  0

vvenc [info]: SUMMARY --------------------------------------------------------
vvenc [info]:   Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR   Y-Lossless  U-Lossless  V-Lossless
vvenc [info]:         150    a0     80.5984   62.6694      -nan      -nan   64.0628           0         150         150


vvenc [info]: I Slices--------------------------------------------------------
vvenc [info]:   Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR   Y-Lossless  U-Lossless  V-Lossless
vvenc [info]:           4    i0    749.8200   70.1888      -nan      -nan   71.9405           0           4           4


vvenc [info]: P Slices--------------------------------------------------------
vvenc [info]:   Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
vvenc [info]:           0    p0        -nan      -nan      -nan      -nan      -nan


vvenc [info]: B Slices--------------------------------------------------------
vvenc [info]:   Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR   Y-Lossless  U-Lossless  V-Lossless
vvenc [info]:         146    b0     62.2636   62.4634      -nan      -nan   63.9643           0         146         146
vvencFFapp [details]: Bytes written to file: 54402 (inf kbps)

LayerId  1

vvenc [info]: SUMMARY --------------------------------------------------------
vvenc [info]:   Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR   Y-Lossless  U-Lossless  V-Lossless
vvenc [info]:         150    a1     29.3376   71.6481      -nan      -nan   72.7819           0         150         150


vvenc [info]: I Slices--------------------------------------------------------
vvenc [info]:   Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR   Y-Lossless  U-Lossless  V-Lossless
vvenc [info]:           4    i1    203.7600   82.0603      -nan      -nan   83.8188           0           4           4


vvenc [info]: P Slices--------------------------------------------------------
vvenc [info]:   Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
vvenc [info]:           0    p1        -nan      -nan      -nan      -nan      -nan


vvenc [info]: B Slices--------------------------------------------------------
vvenc [info]:   Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR   Y-Lossless  U-Lossless  V-Lossless
vvenc [info]:         146    b1     24.5589   71.3628      -nan      -nan   72.6736           0         146         146
vvencFFapp [details]: Bytes written to file: 22369 (inf kbps)

vvencFFapp [info]: finished @ Fri Apr 25 14:49:49 2025
vvencFFapp [info]: Total Time:       96.580 sec. [user]       98.361 sec. [elapsed]

Decoding process

Let’s now use our newly created VVC/H.266 stream with alpha channel to overlay it on top of another video. Here is the GStreamer command-line used, with the pipeline representation:

gst-launch-1.0                                                           \
  filesrc location=alphastream.vvc !                                     \
  h266parse ! h266alphadecodebin ! queue ! mixer.sink_0                  \
  uridecodebin uri=file:///Path/To/fluendo.mp4 !                         \
  queue ! mixer.sink_1                                                   \
  compositor name=mixer sink_0::zorder=0 sink_1::zorder=1 !              \
  playsink

[Image: H.266 alpha decoding pipeline]

Our new h266alphadecodebin element decodes the main and the alpha layers and combines them into a single GstBuffer in AYUV format. We use the video stream fluendo.mp4 as a background and let the mixer combine it with our decoded VVC/H.266 stream with alpha:

[Image: composited result]

Acknowledgments

Fluendo was granted European funds to collaborate in developing a Real-time Enhanced Multimedia system. One of the project’s aims was to support alpha channel in VVC streams. We developed our solution based on NHK’s multi-layer work and with guidance from Fraunhofer HHI.

Get in Touch

At Fluendo, we specialize in developing high-performance video streaming solutions tailored to your specific needs. Whether you’re looking for low-latency streaming, multi-camera support, or scalable embedded solutions, our expert team is ready to help.

Want to learn more or discuss your project? Contact us today, and let’s build the future of video streaming together!