
Enhancing video quality assessment in GStreamer - Encoder statistics and VMAF integration

Written by Diego Nieto
March 3, 2026
Introduction
In the modern landscape of digital media, video quality assessment (VQA) is the cornerstone of effective encoding workflows. Understanding the quality degradation introduced by lossy compression helps developers optimize encoding parameters and ensure acceptable visual quality for end users.
At Fluendo, the Innovation Days represent a great opportunity to enhance GStreamer capabilities within a 3-day code marathon. We focused on measuring video quality using the new GStreamer VMAF (Video Multi-Method Assessment Fusion) element while reducing resource requirements. The team, composed of Carlos Falgueras García, Alexander Slobodeniuk, and Diego Nieto, worked together on this topic.
This blog post details our work on extending GStreamer’s GstVideoEncoder base class to expose reconstructed frames and original buffers, implementing this functionality in specific encoder plugins, and integrating it with the VMAF element for seamless quality assessment.
Targeting our problem
In order to measure video quality with VMAF, or any other full-reference video metric, we need to provide both the reference video and the distorted one simultaneously.

As shown in the previous diagram, the distorted pad of the VMAF element needs the just-encoded frame to be decoded again. This introduces additional resource consumption—a full decoder—that we want to eliminate when possible. Is that feasible? In most cases, yes. Here, we will showcase one approach.
The Challenge: Using reconstructed frames
What are reconstructed frames?
Video encoders work as follows:
- Take the original frame
- Apply various transformations (DCT, quantization, etc.)
- Reconstruct the frame from the compressed data
- Use this reconstructed frame as a reference for encoding subsequent frames
The reconstructed frame is what the decoder will eventually produce—it’s the compressed version that went through the encode/decode cycle. Comparing the original frame with the reconstructed frame allows us to measure the quality loss introduced by the encoding process.
Solution architecture
Our solution consists of three main components:
GstVideoEncoder base class extensions
We extended the GstVideoEncoder base class with new API to support:
- Extracting reconstructed frames from encoders
- Preserving original input buffers
- Attaching both as metadata to encoded frames
Encoder-specific implementations
We implemented reconstructed frame extraction for specific encoders:
- x264 encoder (x264enc)
VMAF integration
We enhanced the VMAF element to consume the attached metadata, eliminating the need for a separate decoding pipeline.
Implementation details
Task 1: GstVideoEncoder base class API
The first step was extending GstVideoEncoder with new properties and infrastructure to handle reconstructed frames and original buffers.
New video metas
We added GstReconFrameMeta, a new meta that holds the reconstructed frame provided by the encoder, along with GstOriginFrameMeta, which preserves the original input buffer.
typedef struct _GstReconFrameMeta
{
  GstMeta meta;
  GstVideoInfo vinfo;
  GstBuffer *frame;
} GstReconFrameMeta;
Base class reconstructed frame support
The GstVideoEncoder base class now exposes a property, enabled by the individual implementations, that controls whether reconstructed frame data is attached to the output buffers produced by gst_video_encoder_finish_frame(). When enabled, the reconstructed frame is attached to the encoded output buffer as a GstReconFrameMeta.
static void
gst_video_encoder_add_recon_frame (GstVideoEncoder * encoder,
    GstVideoCodecFrame * frame)
{
  GstBuffer *output_buffer = frame->output_buffer;
  GstReconFrameMeta *rfmeta;

  if (!frame->recon_frame)
    return;

  rfmeta = (GstReconFrameMeta *)
      gst_buffer_add_meta (output_buffer, GST_RECON_FRAME_META_INFO, NULL);
  gst_video_info_from_caps (&rfmeta->vinfo, encoder->priv->output_state->caps);
  rfmeta->frame = gst_buffer_ref (frame->recon_frame);

  GST_DEBUG_OBJECT (encoder,
      "Added GstReconFrameMeta %" GST_PTR_FORMAT " to buffer %" GST_PTR_FORMAT,
      rfmeta->frame, output_buffer);
}
You can find the implementation details of this step here.
Task 2: x264 reconstructed frame implementation
After establishing the base class API, we needed to implement reconstructed frame extraction in actual encoder plugins. We evaluated both OpenH264 and x264 encoders and chose to implement x264 because it is widely used and more efficient than OpenH264.
x264 implementation
The x264 encoder provides reconstructed frames through the x264_image_t img field of its output x264_picture_t structure.
We first need to enable full reconstruction on the encoder side:
encoder->x264param.b_full_recon = 1;
Once enabled, we can copy the reconstructed planes out during the encoding process:
/* Copy the Y plane row by row, honouring the encoder's stride.
 * nn[] holds the per-plane subsampling divisors (1 for luma, 2 for chroma). */
for (int p = 0; p < 1 /* Y only; chroma is handled below */; p++) {
  int w = width / nn[p];
  int h = height / nn[p];
  for (int y = 0; y < h; y++) {
    memcpy (dst, pic_out.img.plane[p] + pic_out.img.i_stride[p] * y, w);
    dst += w;
  }
}
/* Copy the interleaved UV plane, splitting it into separate U and V planes */
{
  int p = 1;
  int w = width / nn[p];
  int h = height / nn[p];
  fff_plane_copy_deinterleave_c (dst, w, dst + w * h, w,
      pic_out.img.plane[p], pic_out.img.i_stride[p], w, h);
}
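The deinterleave helper above is internal to the patch; the following self-contained sketch (function name and signature are illustrative, mirroring what fff_plane_copy_deinterleave_c does) shows how an interleaved UV plane (NV12 layout: U0 V0 U1 V1 ...) is split into separate, stride-aware U and V planes:

```c
#include <stddef.h>

/* Split an interleaved UV plane into separate U and V planes.
 * w and h are the chroma plane dimensions; strides are in bytes. */
void
deinterleave_uv (unsigned char *dst_u, int dst_u_stride,
    unsigned char *dst_v, int dst_v_stride,
    const unsigned char *src, int src_stride, int w, int h)
{
  for (int y = 0; y < h; y++) {
    const unsigned char *s = src + (size_t) y * src_stride;
    unsigned char *u = dst_u + (size_t) y * dst_u_stride;
    unsigned char *v = dst_v + (size_t) y * dst_v_stride;
    for (int x = 0; x < w; x++) {
      u[x] = s[2 * x];          /* even bytes are U samples */
      v[x] = s[2 * x + 1];      /* odd bytes are V samples */
    }
  }
}
```

This converts the semi-planar chroma into the fully planar layout (I420) that the raw-video consumers downstream expect.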
You can find the implementation details of this step here.
Task 3: VMAF integration
The final piece was modifying the VMAF element to consume the attached metadata instead of requiring separate input pads for reference and distorted videos.
Original VMAF element design
Previously, the VMAF element required two input pads:
# Old approach - required a separate decoding branch
gst-launch-1.0 \
  vmaf name=vmaf ! fakesink \
  filesrc location=original.yuv ! rawvideoparse ! vmaf.ref_sink \
  filesrc location=encoded.h264 ! h264parse ! avdec_h264 ! vmaf.dist_sink
New single-pad design
With our encoder metadata approach:
# New approach - single pipeline
gst-launch-1.0 \
  filesrc location=original.yuv ! rawvideoparse ! \
  x264enc enable-recon-frame=true enable-original-buffer=true ! \
  vmaf ! fakesink
VMAF element modifications
We modified the VMAF element to detect and use the metadata. First, we defined a new request pad, enc_sink, which receives encoded buffers carrying both the reconstructed frame metadata and the original buffer. Instead of separate reference and distorted pads, this single pad provides everything needed to perform the VMAF analysis.
static GstStaticPadTemplate enc_factory = GST_STATIC_PAD_TEMPLATE ("enc_sink",
    GST_PAD_SINK,
    GST_PAD_REQUEST,
    GST_STATIC_CAPS (ENC_FORMATS));
Then, we retrieve the meta buffers from the incoming encoded frames:
static gboolean
gst_vmaf_get_frames_from_enc_pad (GstVmafPad * vmaf_pad)
{
  GstVmaf *self = GST_VMAF (GST_PAD_PARENT (vmaf_pad));
  GstBuffer *buffer;
  GstReconFrameMeta *recon_meta;
  GstOriginFrameMeta *origin_meta;

  GST_DEBUG_OBJECT (self, "Processing encoded pad: %s",
      GST_PAD_NAME (GST_PAD (vmaf_pad)));

  buffer =
      gst_video_aggregator_pad_get_current_buffer (GST_VIDEO_AGGREGATOR_PAD
      (vmaf_pad));
  if (!buffer) {
    GST_WARNING_OBJECT (self, "No buffer available on encoded pad");
    return FALSE;
  }

  /* The distorted frame comes from the reconstructed frame meta */
  recon_meta = (GstReconFrameMeta *) gst_buffer_get_meta (buffer,
      GST_RECON_FRAME_META_API_TYPE);
  if (!recon_meta) {
    GST_ERROR_OBJECT (self, "No GstReconFrameMeta found in buffer");
    return FALSE;
  }
  if (!gst_video_frame_map (&self->dist_frame, &recon_meta->vinfo,
          recon_meta->frame, GST_MAP_READ)) {
    GST_ERROR_OBJECT (self, "Failed to map distorted frame from meta");
    return FALSE;
  }

  /* The reference frame comes from the original buffer meta */
  origin_meta = (GstOriginFrameMeta *) gst_buffer_get_meta (buffer,
      GST_ORIGIN_FRAME_META_API_TYPE);
  if (!origin_meta) {
    GST_ERROR_OBJECT (self, "No GstOriginFrameMeta found in buffer");
    gst_video_frame_unmap (&self->dist_frame);
    return FALSE;
  }
  if (!gst_video_frame_map (&self->ref_frame, &origin_meta->vinfo,
          origin_meta->frame, GST_MAP_READ)) {
    GST_ERROR_OBJECT (self, "Failed to map reference frame from meta");
    gst_video_frame_unmap (&self->dist_frame);
    return FALSE;
  }

  GST_DEBUG_OBJECT (self,
      "Successfully mapped reference and distorted frames from encoded pad");
  return TRUE;
}
In this way, we can run the VMAF computation without any additional decoding step.
You can find the implementation details of this step here.
Conclusion and next steps
Over these three days, we validated the project's feasibility and demonstrated its efficiency. The next steps will focus on polishing the current implementation and opening the corresponding merge requests in the GStreamer community. Additional encoders, such as openh264enc and the NVIDIA encoder implementations, could then be added to the list.

Get in touch
At Fluendo, we specialize in developing high-performance video streaming solutions and extending GStreamer’s capabilities for custom use cases. Whether you need encoder optimization, quality assessment integration, or custom GStreamer elements, our expert team is ready to help.
Interested in implementing advanced video quality metrics in your pipeline? Contact us today, and let’s discuss how we can optimize your video workflows!
