Image superresolution with GStreamer: Cross-platform Multimedia Edge AI with GStreamer

Written by

Izan Leal

November 13, 2024

The rise of Artificial Intelligence (AI) in multimedia has revolutionized tasks like facial recognition and object detection, making them essential features in modern applications. As these capabilities increasingly shift to the edge, achieving smooth, real-time execution is critical. The real challenge lies in delivering custom, accurate AI solutions that perform consistently across diverse edge environments, ensuring reliable results on varying hardware platforms.

As multimedia experts, Fluendo works with GStreamer, an open-source framework for efficient multimedia processing. It provides real-time streaming, synchronization, and integration of diverse media formats across platforms. Its pipeline-based architecture and support for hardware acceleration are well suited to vision and audio AI tasks, allowing media processing and deep learning inference to run in the same pipeline.

To ease the development of custom AI solutions based on GStreamer, at Fluendo we have created Fluendo AI Plugins, a set of GStreamer elements providing support for common vision and audio tasks such as face detection, tracking, or segmentation. Our plugins leverage our in-house AI engine (Raven AI Engine), which allows us to create custom, optimized, and user-friendly AI solutions.

Superresolution with AI

Functionality description

Image super-resolution (SR) is a technique designed to enhance the resolution of digital images, making it a crucial tool for medical imaging, satellite imagery, and entertainment applications. The process involves expanding the dimensions of an image and reconstructing new pixels to fill the gaps between the original pixels, resulting in sharper, higher-quality visuals.

Figure 1: Upscaling pixels diagram

Traditional upscaling methods estimate these new pixel values using basic mathematical models, such as nearest-neighbour, bilinear, and bicubic interpolation. While these approaches can expand image size, they often result in noticeable artifacts, blurred details, and pixelation, especially with higher upscaling factors.

Figure 2: Bicubic interpolation upscaling
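To make the traditional methods concrete, here is a minimal pure-Python sketch of nearest-neighbour and bilinear upscaling on a small grayscale grid. It is an illustration only and not part of the plugin; the function names and the pixel-centre alignment convention are assumptions chosen for this example.

```python
def upscale_nearest(image, factor):
    """Nearest-neighbour: repeat every pixel factor x factor times."""
    out = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide_row))
    return out


def upscale_bilinear(image, factor):
    """Bilinear: blend the four nearest input pixels by distance.

    Assumes pixel-centre alignment and clamps coordinates at the borders.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height * factor):
        # Map the output pixel centre back into input coordinates.
        src_y = min(max((y + 0.5) / factor - 0.5, 0.0), height - 1)
        y0 = int(src_y)
        y1 = min(y0 + 1, height - 1)
        fy = src_y - y0
        row = []
        for x in range(width * factor):
            src_x = min(max((x + 0.5) / factor - 0.5, 0.0), width - 1)
            x0 = int(src_x)
            x1 = min(x0 + 1, width - 1)
            fx = src_x - x0
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


tiny = [[0, 100],
        [100, 0]]
blocky = upscale_nearest(tiny, 2)   # hard-edged 4x4 blocks
smooth = upscale_bilinear(tiny, 2)  # 4x4 gradient between the same corners
```

Nearest-neighbour copies pixels and produces blocky edges, while bilinear averages neighbours and produces smooth but blurred transitions. Neither can invent detail that is absent from the input, which is the gap learned models try to close.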

AI-based image super-resolution techniques have improved visual quality over the years, thanks to deep learning architectures such as convolutional neural networks (CNNs) and generative adversarial networks (GANs). These models are trained on extensive datasets to learn complex patterns and textures, enabling them to generate high-resolution images with high fidelity and realism and to recover details that traditional upscaling methods often miss.

By accurately capturing intricate details and contextual information, AI-driven SR methods address the shortcomings of traditional techniques, delivering superior image quality and enhanced clarity.

Figure 3: AI upscaling vs. bicubic interpolation upscaling

The AI-based upscaling method consistently produces more visually appealing results than traditional mathematical approaches. When evaluated using the Natural Image Quality Evaluator (NIQE), a no-reference quality metric, the AI model scored 4.9, while the bicubic interpolation method scored 7.5. Since lower NIQE values indicate better quality, these results favor AI-driven upscaling over conventional techniques.

AI Plugin description

To harness the power of AI-based super-resolution (SR) technology, we have developed a GStreamer plugin that uses AI upscaling to transform low-resolution video into high-resolution video. Although SR technology is not new, this is the first GStreamer plugin* designed specifically to automate and customize image super-resolution.

Our plugin employs GAN-based models capable of performing x1 (deblur), x2, x3, or x4 upscaling on the input image. We offer a range of models tailored to different needs. Some are optimized for maximum precision, while others are more lightweight, delivering faster processing at the cost of a slight reduction in accuracy. This flexibility allows users to choose the model that best fits their specific requirements.

The GAN-based architecture in our plugin features two interconnected neural networks: the generator and the discriminator. The generator creates high-resolution images from low-resolution inputs, while the discriminator learns to distinguish the original high-resolution images (ground truth, GT) from those produced by the generator.

During the training phase, errors from both the generator and discriminator are backpropagated to enhance and fine-tune the models. Only the generator is employed in the final application, delivering efficient and high-quality video upscaling.

* To the best of our knowledge, as of August 20th, 2024.

Figure 4: Upscaling GAN architecture
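The interplay between the two networks during training can be sketched with the textbook adversarial losses. The article does not state which loss functions were used to train the plugin's models, so the standard non-saturating GAN formulation below is an assumption for illustration; super-resolution GANs typically combine it with additional pixel or perceptual terms.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Loss the discriminator minimizes.

    d_real: probability it assigns to a ground-truth image being real.
    d_fake: probability it assigns to a generated image being real.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Loss the generator minimizes: it is rewarded when the
    discriminator believes the generated image is real."""
    return -math.log(d_fake)


# A discriminator that cannot tell real from generated outputs 0.5 for both.
undecided = discriminator_loss(0.5, 0.5)
# A confident, correct discriminator has a lower loss...
confident = discriminator_loss(0.9, 0.1)
# ...which in turn means a higher loss for the generator.
fooled_rarely = generator_loss(0.1)
fooled_often = generator_loss(0.9)
```

Each network's loss falls as the other's rises, which is what pushes the generator toward outputs that are hard to tell apart from ground-truth images. At inference time only the generator's forward pass is needed.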

Plugin Documentation

Our GStreamer plugin is accompanied by comprehensive documentation and detailed code examples to facilitate seamless integration and usage. The documentation provides clear guidance on installation, configuration, and customization, while the code examples demonstrate various use cases and best practices for effectively leveraging the plugin’s capabilities. This ensures that users can quickly understand and implement the plugin, optimizing their workflow and maximizing the benefits of advanced image upscaling technology.

Pre-requisites:

This plugin requires a complete installation of GStreamer (MSVC 64-bit), version 1.22.3 or above.

Installation:

Inside the bin directory of the provided package, you'll find all the necessary binaries. For GStreamer to recognize them, these files must be copied to the GStreamer plugins directory. The location of this directory depends on your specific GStreamer installation.

As an example, when installing GStreamer v.1.24.4 on Windows 10, the default directory is:

C:\gstreamer\1.0\msvc_x86_64\lib\gstreamer-1.0

Alternatively, you can set the GST_PLUGIN_PATH environment variable to include the complete path to the bin directory within the package. Also, ensure that the PATH environment variable contains the complete path to the plugin's bin directory so Windows can find the required .dll files.
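When launching pipelines from a script, the environment-variable approach can be sketched as follows. This is a minimal example, not part of the plugin package: the helper name and the package location are hypothetical, and only the GST_PLUGIN_PATH and PATH variables come from the instructions above.

```python
import os


def with_plugin_paths(env, plugin_bin_dir, sep=os.pathsep):
    """Return a copy of env with plugin_bin_dir prepended to both
    GST_PLUGIN_PATH (plugin discovery) and PATH (.dll lookup on Windows)."""
    updated = dict(env)
    for name in ("GST_PLUGIN_PATH", "PATH"):
        parts = [p for p in updated.get(name, "").split(sep) if p]
        if plugin_bin_dir not in parts:
            parts.insert(0, plugin_bin_dir)
        updated[name] = sep.join(parts)
    return updated


# Hypothetical package location; replace with the real bin directory.
example_env = with_plugin_paths(
    {"PATH": r"C:\Windows"}, r"C:\fluendo-ai\bin", sep=";"
)
```

The returned dictionary can be passed as the env argument of subprocess.run when starting gst-launch-1.0.exe, which leaves the system-wide environment untouched.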

AI models location:

Keeping the AI models (*.flu files) in the same directory as the plugin binaries is not mandatory. If you prefer to relocate them elsewhere, set the element property model-path to specify their location.

Simple Webcam Example:

gst-launch-1.0.exe mfvideosrc ! video/x-raw,width=640,height=480 ! videoconvert ! d3d11upload ! fluupscale4x ! autovideosink

Figure 5: Webcam x4 upscaling example
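The webcam pipeline above can also be assembled programmatically, for example to vary the capture size or to pass the model-path property. The sketch below only builds the pipeline description string; the helper names and the model directory are hypothetical, and it assumes that model-path accepts a directory, which the documentation above does not spell out. Forward slashes are used in the example path to avoid backslash escaping in the pipeline description.

```python
def build_upscale_pipeline(width, height, model_path=None):
    """Build a gst-launch-1.0 pipeline description for x4 webcam upscaling."""
    upscaler = "fluupscale4x"
    if model_path is not None:
        # Assumption: model-path points at the folder holding the *.flu files.
        upscaler += f' model-path="{model_path}"'
    elements = [
        "mfvideosrc",
        f"video/x-raw,width={width},height={height}",
        "videoconvert",
        "d3d11upload",
        upscaler,
        "autovideosink",
    ]
    return " ! ".join(elements)


def upscaled_size(width, height, factor=4):
    """Frame size after upscaling by the given integer factor."""
    return width * factor, height * factor


default_pipeline = build_upscale_pipeline(640, 480)
custom_pipeline = build_upscale_pipeline(640, 480, model_path="C:/models")
output_size = upscaled_size(640, 480)
```

With the 640x480 capture used in the example, x4 upscaling yields 2560x1920 frames, so the display sink and GPU need to handle sixteen times the pixels of the input.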

Conclusions

In conclusion, our AI-driven upscaling model combines flexibility and precision, setting a new benchmark in image enhancement to meet the rising quality standards demanded by the multimedia industry. As the first of its kind, this unique GStreamer AI upscaling plugin significantly enhances visual quality and offers a competitive advantage. It positions itself as an essential tool for anyone seeking to elevate their video and image processing capabilities, making it a standout solution in the market.

Ready to enhance your images? Contact us to learn how Fluendo’s AI plugins for GStreamer can seamlessly integrate into your workflows.