Real-time AI Background Removal with GStreamer

Written by

Aleix Figueres

October 29, 2024

Cross-platform Multimedia Edge AI with GStreamer

The rise of Artificial Intelligence (AI) in multimedia has transformed tasks such as facial recognition and object detection, which are now essential features of modern applications. As these capabilities move to the edge, smooth, real-time execution becomes critical. The real challenge is delivering custom, accurate AI solutions that perform consistently across diverse edge devices and hardware platforms.

Fluendo is a multimedia company with deep expertise in GStreamer, an open-source framework for efficient multimedia processing. GStreamer handles real-time streaming, synchronization, and the integration of diverse media formats across platforms. Its pipeline-based architecture and support for hardware acceleration make it well suited to vision and audio AI tasks, because media processing and deep learning model inference can run within the same pipeline.

To facilitate the development of custom AI solutions on GStreamer, Fluendo has created Fluendo AI Plugins, a set of GStreamer elements that support common vision and audio tasks such as face detection, tracking, and segmentation. The plugins are built on our in-house AI engine, Raven AI Engine, which lets us create custom, optimized, and user-friendly AI solutions.

Background Removal with AI

Background removal is a technique that isolates the main subjects of an image or video frame, separating them from their surroundings so that the background can be made transparent or replaced with a plain one. This keeps the subjects as the primary focus.

It is especially useful wherever the background of an image or video needs to be modified. In video calls, for instance, the background can be blurred to reduce distractions, replaced with a solid color, or swapped for a different image entirely. This helps protect privacy, present a professional appearance, or reflect branding during virtual meetings.

Figure 1: Background removal examples

Historically, background removal relied on chroma keying: the subject was filmed in front of a uniformly colored backdrop, typically green, which could then be easily identified and replaced in post-processing.

Although still widely used in fields such as TV broadcasting (weather reports, for example), this method has limitations. Its accuracy suffers when the subject or nearby objects share a color with the backdrop, and light reflected from the backdrop can tint the subject, the so-called ‘chroma spill effect’. In both cases, parts of the subject are keyed out along with the background, leaving transparent patches or incomplete removal.

Figure 2: Traditional chroma-based method chroma spill effect
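To make this limitation concrete, here is a minimal chroma-key sketch. It is a simplification and an assumption on our part: it measures plain Euclidean distance in RGB, whereas production keyers typically work in a chroma space such as YUV or HSV and produce soft edges. The key color, threshold, and function name are illustrative only.

```python
import math


def chroma_key_mask(pixels, key=(0, 255, 0), threshold=100.0):
    """Return one alpha value per pixel: 0 = keyed out (backdrop), 1 = kept.

    Simplified keyer: a pixel counts as backdrop when its Euclidean RGB
    distance to the key color is below `threshold`.
    """
    mask = []
    for pixel in pixels:
        distance = math.dist(pixel, key)
        mask.append(0 if distance < threshold else 1)
    return mask


# Green backdrop, a skin tone, and a green shirt worn by the subject.
backdrop = (0, 255, 0)
skin = (200, 180, 150)
green_shirt = (30, 220, 40)
print(chroma_key_mask([backdrop, skin, green_shirt]))  # [0, 1, 0]
```

The shirt is close enough to the key color to be removed along with the backdrop, which is exactly the kind of hole in the subject that a learned segmentation model avoids.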

With the rise of AI, background removal has advanced significantly to meet the demands of modern applications such as video meetings, virtual events, and augmented reality experiences.

Deep learning in particular has changed how background removal works. Unlike older methods that depended on simple color differences or manual settings, deep learning models such as convolutional neural networks (CNNs) learn to recognize fine image details and can separate the subject from complex backgrounds without a dedicated backdrop.

AI-based background removal delivers cleaner and more precise results, even in challenging scenarios involving fine hair, transparent objects, or varying lighting conditions. This matters in live video calls and virtual events, where the output must look right in real time.

Figure 3: AI-based background removal application

AI Plugin description

We have developed an AI-powered GStreamer plugin that uses a CNN-based model to extract a mask of a person’s upper body. The plugin packages upper-body segmentation as a ready-to-use, configurable element, making it suitable for applications such as virtual backgrounds and augmented reality experiences.

The CNN is trained to identify and isolate the upper body in video streams. It generates precise upper-body masks that cleanly separate the person from the surrounding background, producing clean, professional visuals.

Figure 4: Plugin architecture

The diagram above shows how the plugin changes the background of a video. It analyzes each input frame to detect the person’s upper body and creates a mask that isolates that individual. The mask is then applied to the original frame, removing everything except the person’s silhouette so that the background can be replaced or blurred.
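Applying the mask to the original frame is, in general terms, alpha compositing: each output pixel is a weighted mix of the camera pixel and the new background pixel, with the mask value as the weight. The plugin’s internal blending is not documented here, so this is only a generic sketch; it assumes a soft mask with values in [0, 1] (1 = person) and frames flattened into lists of RGB tuples.

```python
def composite(frame, mask, background):
    """Blend a foreground frame over a background using a per-pixel mask.

    out = m * foreground + (1 - m) * background, with m in [0, 1].
    All three sequences hold one entry per pixel and must be the same length.
    """
    if not (len(frame) == len(mask) == len(background)):
        raise ValueError("frame, mask and background must have the same size")
    out = []
    for fg, m, bg in zip(frame, mask, background):
        if not 0.0 <= m <= 1.0:
            raise ValueError(f"mask value out of range: {m}")
        # Blend each color channel and round back to an integer value.
        out.append(tuple(round(m * f + (1.0 - m) * b) for f, b in zip(fg, bg)))
    return out


frame = [(200, 100, 50), (200, 100, 50), (200, 100, 50)]
mask = [1.0, 0.5, 0.0]          # person, soft edge, background
blue_wall = [(0, 0, 255)] * 3   # replacement background
print(composite(frame, mask, blue_wall))
```

Using a soft mask rather than a strictly binary one lets edge pixels (hair, shoulders) fade into the new background instead of showing a hard cut-out outline.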

Our goal was to package this functionality as a ready-to-use GStreamer plugin accessible to a wide range of applications. The result is our Background Removal GStreamer plugin, which lets users change or blur their background during video meetings, improving privacy without the need for a physical green screen.

The plugin provides a distraction-free environment by blending a new background behind the extracted person. Content creators can also use it to swap backgrounds in live streams or recorded videos, achieving a polished, professional result without extensive post-production editing.
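Background blurring follows the same pattern: blur the whole frame, then keep the original pixels wherever the mask marks the person. For brevity, the sketch below uses a simple box blur on a grayscale image stored as a list of rows; real implementations usually apply a Gaussian blur, often on the GPU, and nothing here describes how the plugin itself does it.

```python
def box_blur(image, radius):
    """Average each pixel over a (2r+1) x (2r+1) window.

    Window positions outside the image are skipped, so edge pixels average
    only over the part of the window that lies inside the image.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def blur_background(image, mask, radius=2):
    """Keep the subject sharp (mask = 1) and blur everything else (mask = 0)."""
    blurred = box_blur(image, radius)
    return [
        [m * px + (1.0 - m) * bl for px, m, bl in zip(row, mrow, brow)]
        for row, mrow, brow in zip(image, mask, blurred)
    ]


image = [[0, 0, 0, 0],
         [0, 255, 255, 0],
         [0, 0, 0, 0]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],   # the "person" occupies the two bright pixels
        [0, 0, 0, 0]]
for row in blur_background(image, mask, radius=1):
    print([round(v) for v in row])
```

The person pixels come through unchanged, while the surrounding pixels pick up a smeared version of their neighborhood.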

Plugin Documentation

The plugin comes with clear instructions and code examples to simplify integration. The documentation covers installation, setup, and customization, while the examples show how to use the plugin effectively, helping users get started quickly and take full advantage of its background removal features.

Prerequisites:

This plugin requires a complete installation of the GStreamer MSVC 64-bit build, version 1.22.3 or later.
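If you work with the Python bindings (PyGObject), the minimum version can be checked from a script. `Gst.version()` returns a `(major, minor, micro, nano)` tuple; the comparison helper below is a hypothetical convenience, and it does not verify that the installation is the MSVC 64-bit build.

```python
MINIMUM = (1, 22, 3)


def meets_minimum(version, minimum=MINIMUM):
    """Compare (major, minor, micro[, nano]) against the required minimum."""
    return tuple(version[:3]) >= tuple(minimum)


def installed_version():
    """Query the loaded GStreamer library through PyGObject (requires gi)."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    return Gst.version()
```

Usage: `meets_minimum(installed_version())` returns `True` for GStreamer 1.24.4 and `False` for 1.22.2.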

Installation:

The bin directory of the provided package contains all the necessary binaries. For GStreamer to recognize them, copy these files to the GStreamer plugins directory. The location of this directory depends on your GStreamer installation.

For example, when GStreamer 1.24.4 is installed on Windows 10, the default directory is:

C:\gstreamer\1.0\msvc_x86_64\lib\gstreamer-1.0

Alternatively, you can set the GST_PLUGIN_PATH environment variable to include the complete path to the bin directory within the package. Also, ensure that the PATH environment variable contains the complete path to the plugin’s bin directory so Windows can find the required .dll files.
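For a single session, both variables can also be extended from a launcher script instead of being changed system-wide. This is a minimal sketch: the package location you pass in is a placeholder, the helper names are hypothetical, and `os.pathsep` (`;` on Windows) separates the entries.

```python
import os
import subprocess


def plugin_env(bin_dir, base_env=None):
    """Return a copy of the environment with bin_dir prepended to
    GST_PLUGIN_PATH (so GStreamer scans it for plugins) and PATH
    (so Windows can resolve the plugin's dependent .dll files)."""
    env = dict(os.environ if base_env is None else base_env)
    for var in ("GST_PLUGIN_PATH", "PATH"):
        current = env.get(var)
        env[var] = bin_dir if not current else bin_dir + os.pathsep + current
    return env


def inspect_element(bin_dir, element="flubkgndremoval"):
    """Run gst-inspect-1.0 on the element with the extended environment.

    Returns True when gst-inspect-1.0 finds the element.
    """
    result = subprocess.run(
        ["gst-inspect-1.0", element],
        env=plugin_env(bin_dir),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

For example, `inspect_element(r"C:\path\to\package\bin")` (a placeholder path) reports whether GStreamer can load the element from that directory.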

AI models location:

Keeping the AI models (*.flu files) in the same directory as the plugin binaries is not mandatory. If you prefer to store them elsewhere, set the element property model-path to specify their location.
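With the Python bindings (PyGObject), the property can be set directly on the element, which avoids quoting Windows paths inside a pipeline string. This is a minimal sketch: it assumes model-path accepts the directory that holds the *.flu files (the text above only says it specifies their location), and the helper names are hypothetical.

```python
from pathlib import Path


def find_models(model_dir):
    """List the AI model files (*.flu) found directly inside model_dir."""
    return sorted(p.name for p in Path(model_dir).glob("*.flu"))


def make_removal_element(model_dir):
    """Create flubkgndremoval and point its model-path property at model_dir.

    Assumption: model-path takes the directory containing the *.flu files.
    """
    if not find_models(model_dir):
        raise FileNotFoundError(f"no *.flu model files in {model_dir}")

    # Imported here so find_models() works without PyGObject installed.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    element = Gst.ElementFactory.make("flubkgndremoval", None)
    if element is None:
        raise RuntimeError("flubkgndremoval not found; check GST_PLUGIN_PATH")
    element.set_property("model-path", str(model_dir))
    return element
```

Checking for model files before creating the element turns a misplaced models folder into an immediate, readable error rather than a failure later in the pipeline.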

Simple Webcam Example:

gst-launch-1.0.exe mfvideosrc ! videoconvert ! d3d11upload ! flubkgndremoval use-chroma=true ! autovideosink

Figure 5: Simple Webcam to chroma example
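The same pipeline can also be started from an application instead of gst-launch-1.0. The sketch below assumes PyGObject is installed. The description string matches the command above, and the helper names are hypothetical. It blocks until the pipeline reports an error or end-of-stream.

```python
def webcam_pipeline(use_chroma=True, sink="autovideosink"):
    """Build the pipeline description used in the gst-launch example."""
    chroma = "true" if use_chroma else "false"
    return (
        "mfvideosrc ! videoconvert ! d3d11upload ! "
        f"flubkgndremoval use-chroma={chroma} ! {sink}"
    )


def run(description):
    """Parse and play a pipeline until it reports an error or end-of-stream."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    # Wait indefinitely for an ERROR or EOS message on the pipeline bus.
    msg = bus.timed_pop_filtered(
        Gst.CLOCK_TIME_NONE, Gst.MessageType.ERROR | Gst.MessageType.EOS
    )
    if msg is not None and msg.type == Gst.MessageType.ERROR:
        err, debug = msg.parse_error()
        print(f"Pipeline error: {err.message} ({debug})")
    pipeline.set_state(Gst.State.NULL)
```

Usage: `run(webcam_pipeline())`. A live webcam source rarely sends end-of-stream, so in practice the call returns on an error or when the process is interrupted.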

Conclusions

In conclusion, our background removal plugin is a powerful, versatile tool for real-time video editing. By extracting and isolating a person’s upper body, it improves the visual quality and flexibility of video content. As a GStreamer-native solution, it helps users create polished, professional-looking videos and elevate their multimedia projects.

Are you ready to use this plugin in your business? Whether you’re looking to reduce operational costs, improve user experience, or simply curious about the potential impact on your specific use case, we encourage you to contact us. It’s the best way to see how Fluendo could benefit your organization.