GstWASM: GStreamer for the web

Written by

Jorge Zapata

October 5, 2023

During Innovation Days at Fluendo, we had the opportunity to step aside from our daily work and duties and focus on the future. Focus on how new technologies can improve our day-to-day to be more efficient. Focus on how we interact with the vast amount of uncategorized information through new paradigms like AI-based chatbots. Focus on developing new ideas that can be the seed of future new products. But above all, to have an excellent and productive time with all our coworkers, local and remote.

One of the technological ideas that we chose to work on during Innovation Days was an exploratory integration between WASM (Emscripten) and GStreamer. In the following paragraphs, you’ll find a description of some of the technical challenges we faced during four days of marathon, joyful hacking.

Before jumping into the development, let’s describe how this idea began and provide some context on what is already there.

As you probably already know, video codecs are part of browser engines nowadays. Different ways of providing video on a website have existed for several years already and continue to evolve: from the old days of Flash content to WebCodecs, by way of browser plugins and HTML5.

Besides the multiple ways of providing video content on a website, the available codecs are still intrinsic to the browser. They are not extendable by third-party software and are backed up by the specifications in [1] and [2]. It means that either the browser provides the codecs you want or you build it yourself. But with the new WebCodecs API and WASM, a possibility exists to overcome the situation.

WASM stands for WebAssembly, a portable binary code format whose standard is maintained by the W3C and which is supported by the major browsers. WASM code does not access the standard Web interfaces (DOM, Canvas, Web Storage, WebRTC, etc.) directly, but through a JS bridge. LLVM supports it as a target backend, and there are several alternatives for a full toolchain that supports accessing the browser’s JS APIs from C code. All of them are built on top of LLVM, and the best known and most full-featured is Emscripten.

What if we could replace the WebCodecs API with a GStreamer WASM compilation, put Fluendo codecs there, and avoid touching the browser? Possibly, yes, but why? This might be a more personal goal, but it might apply to others as well. What happens with the old family videos you have out there that you can no longer watch because, well, MPEG-4 Visual is no longer supported in browsers, and the same goes for MPEG-2, WMV, etc.? Transcoding? Yes, it’s doable, but it would be less fun!

So, let’s see what is already there first.

In order to have a minimal working port of GStreamer, you need GLib and libffi ported to the target platform: WASM. There have been several discussions about WASM support for both in [1], [2], [3], [4]. In the end, we found excellent work done by @kleisauke for the libvips project, which can be found at wasm-vips. In that project, we found working libffi and GLib ports to WASM, fully functional for their use case. As we would discover, though, several things still need to be ported to have a fully functional GStreamer in the browser.

Hands-on

Let’s start building GStreamer and its dependencies.

Build system, dynamic linking, and dlopen()

Having dynamic modules and dynamic linking is possible with Emscripten. Still, due to the lack of time, we preferred simple static linking to avoid setting up a file storage mechanism to load the plugins’ .so files from the .js/wasm server. Building GStreamer statically is currently not possible with the GStreamer-full logic, because when building statically it assumes that a new dynamic library is produced with everything (dependent libraries and plugins) linked inside. We wanted something else.
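As a rough sketch of what static linking means for plugin loading: instead of discovering plugins with dlopen() at run time, the plugin init functions are referenced from a table that the linker resolves. GStreamer has its own macros for this (GST_PLUGIN_STATIC_DECLARE and GST_PLUGIN_STATIC_REGISTER); the StaticPlugin type and the init functions below are hypothetical stand-ins so that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plugin descriptor: a name plus an init function. */
typedef struct {
    const char *name;
    int (*init)(void); /* returns 1 on success, 0 on failure */
} StaticPlugin;

/* Hypothetical init functions; in a real build each plugin provides its own. */
int coreelements_init(void) { return 1; }
int videotestsrc_init(void) { return 1; }

/* The table is resolved at link time, so no file system, dlopen() or
 * plugin scanning is involved. */
const StaticPlugin static_plugins[] = {
    { "coreelements", coreelements_init },
    { "videotestsrc", videotestsrc_init },
};

/* Calls every init function and returns how many plugins registered. */
size_t register_static_plugins(void)
{
    size_t ok = 0;
    size_t n = sizeof(static_plugins) / sizeof(static_plugins[0]);

    for (size_t i = 0; i < n; i++) {
        assert(static_plugins[i].init != NULL);
        if (static_plugins[i].init())
            ok++;
    }
    return ok;
}
```

The trade-off is that the set of plugins is fixed when the application is linked, which is acceptable for a demo but less so for a general-purpose build.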

Function pointer issues

Check [1] for a more in-depth description of the problem. In short, function pointers must be called with the correct type, or an abort(10) will happen. We modified only the minimum amount of code needed to get GStreamer initialized, but if you compile with -Wbad-function-cast -Wcast-function-type, you’ll be surprised by the amount of code that implicitly casts to a wrong function type.
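A minimal sketch of the pattern and one way to fix it. All names here are hypothetical; ItemFunc is only modeled on two-argument callbacks such as GLib’s GFunc. On most native ABIs, casting a one-argument function to a two-argument callback type happens to work, but WebAssembly checks the signature of every indirect call, so the same code traps at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type taking (data, user_data). */
typedef void (*ItemFunc)(void *data, void *user_data);

/* A one-argument helper, shaped like a typical free or unref function. */
void mark_released(void *data)
{
    *(int *)data = 1;
}

/* Problematic pattern (shown only as a comment, not executed):
 *     for_each(items, n, (ItemFunc)mark_released, NULL);
 * The cast hides the signature mismatch from the compiler, and the
 * indirect call fails the WebAssembly signature check. */

/* Safe pattern: a thin wrapper whose signature matches ItemFunc exactly. */
void mark_released_wrapper(void *data, void *user_data)
{
    (void)user_data; /* unused */
    mark_released(data);
}

/* Calls func on every item through the function pointer. */
void for_each(int *items, size_t n, ItemFunc func, void *user_data)
{
    assert(func != NULL);
    for (size_t i = 0; i < n; i++)
        func(&items[i], user_data);
}
```

Passing mark_released_wrapper to for_each keeps the indirect call well-typed, at the cost of one extra function per mismatched callback.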

GstBus, GstPollFD

The first problem we found once everything was compiled and running was that a pipeline could not be run. The GstBus, used for sending messages asynchronously, requires a GstPollFD, which in turn requires a pipe() implementation on the underlying system to block and unblock the receiver of the messages. Given the multi-threaded nature of GStreamer, we didn’t understand why a pipe(), usually used for IPC, was used instead of a thread-based synchronization mechanism. Maybe this is a legacy from when GStreamer was not wholly multi-threaded.

So yes, we took the axe to it: we bypassed the behavior of not creating the GstBus when no GstPollFD is available, and lived with that. Ultimately, we still needed a main loop to pop the messages.
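To make the contrast concrete, here is a sketch of what a thread-based wakeup could look like, built from a mutex and a condition variable. This is not how GstBus or GstPoll work, and the Wakeup type and its functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical wakeup primitive: a flag guarded by a mutex and a condition
 * variable, standing in for the two ends of a pipe. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool pending;
} Wakeup;

void wakeup_init(Wakeup *w)
{
    assert(w != NULL);
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->pending = false;
}

/* Poster side: plays the role of writing one byte into the pipe. */
void wakeup_signal(Wakeup *w)
{
    pthread_mutex_lock(&w->lock);
    w->pending = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Receiver side: blocks until signalled, then clears the flag, which
 * plays the role of a blocking read on the pipe. */
void wakeup_wait(Wakeup *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->pending)
        pthread_cond_wait(&w->cond, &w->lock);
    w->pending = false;
    pthread_mutex_unlock(&w->lock);
}

void wakeup_clear(Wakeup *w)
{
    pthread_cond_destroy(&w->cond);
    pthread_mutex_destroy(&w->lock);
}
```

One plausible reason for the pipe-based design is that a file descriptor can be multiplexed with other descriptors in a single poll() call, which a condition variable cannot do. Note also that blocking waits like wakeup_wait are best kept off the browser’s main thread.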

Threading

Emscripten does support pthreads. You can find more information at [1]. Due to the large number of threads GStreamer can create in a pipeline, Emscripten needs to know beforehand how big the pool of “web workers” needs to be. Otherwise, it will fail if more threads are required. We used the linker flag -sPTHREAD_POOL_SIZE=8 to have something running. Still, this might be a problem for a completely independent GStreamer-WASM build, as the actual number of threads needed is application-dependent.

Missing video sink

So, at this moment, we have a static build of GStreamer, plus the core plugins. The browser can load our stack and run a dummy pipeline … and … we have the GStreamer logs in a browser console!

And in a Node.js shell, too.

Let’s try to show something on the screen. Emscripten has great support for the SDL library; sadly, gstsdlvideosink was discontinued and not moved into the GStreamer mono repo. We tried to create a new plugin and, in parallel, investigated other alternatives. Unfortunately, we couldn’t get a new gstsdlvideosink working in time. Let’s see how it went with the other options.

OpenGL

Having an entire zero-copy OpenGL pipeline would be great. Emscripten already supports EGL [1] and OpenGL [2]. In the OpenGL case, there is support for WebGL [3] and GLES (2.0 and 3.0) [4].

Our first attempt was to reuse the current EGL and GLES GStreamer implementation, assuming some minor adaptations would be needed. Unaware of where we were heading, we opened a can of worms.

First, there is the question of which API to use. Emscripten offers plenty of options for porting OpenGL-related applications, and it makes several assumptions about how the application being ported should work. Compiling (which was easy) and running wasn’t enough.

Using the EGL/GLES combo in GStreamer with Emscripten, we found that:

The actual WebGL context creation behind the scenes of the EGL context cannot be customized. None of the options listed in [1] can be set.
EGL-related calls always work with a single static window, which is already provided to the application and cannot be chosen dynamically. It can only be set statically during compilation or by overriding a variable in the JS initialization code provided by Emscripten (OFFSCREENCANVASES_TO_PTHREAD).
The redrawing is done automatically because explicitSwapControl is always set to false.
The actual threading model is, by default, EMSCRIPTEN_WEBGL_CONTEXT_PROXY_DISALLOW, meaning that all calls to OpenGL will happen on the calling thread and won’t be proxied.

For a simple demo application, we thought the above would be enough to display something in a browser window, but we had no luck. Some threads were just blocked, and nothing happened. At the same time, we were aware that, given the limitations above, supporting GstVideoOverlay would be a no-go.

GLib main loop

The Emscripten main loop [1] has some particularities that are worth mentioning. Each page in the browser runs in a cooperative multi-tasking model, which means that each page gets its own turn to process any event that happens on it.

From the point of view of a classic GLib application, it is the application itself that controls the events by iterating over an infinite loop. In each iteration of the loop, the application performs event handling or any other required processing. It then goes back into an idle state, waiting for something to happen that will wake up the next iteration.

This infinite loop is a problem in the browser environment because the application never hands control back to the browser, and once control is at the browser level, the application cannot idle or resume on its own. The page is treated as stuck, and the browser offers the possibility to halt or close it.

Emscripten has alternatives to overcome this. We implemented their recommended approach: making Emscripten’s event handling call g_main_context_iterate once per browser repaint, using the window’s requestAnimationFrame method. This way, we can emulate what the application does when entering a main loop with g_main_loop_run(). The downside is that any code meant to run once the main loop is over will not run unless it is triggered from an event handler. Web workers do not have this multi-tasking problem, and the whole application could be run in another thread using the linker flag -sPROXY_TO_PTHREAD. Still, we haven’t yet found an iteration mechanism that can emulate a main loop in that scenario. More on this can be found at [2].
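The idea can be sketched without GLib. In the example below, MiniLoop is a hypothetical stand-in for a GLib main context: one non-blocking iteration dispatches at most one event and returns. Under Emscripten, emscripten_set_main_loop_arg() with fps set to 0 schedules that iteration through requestAnimationFrame; the native branch exists only so the sketch can be exercised outside a browser, and it simply stops when the queue is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

#define MAX_EVENTS 16

typedef void (*EventFunc)(void *user_data);

/* Hypothetical stand-in for a main context: a fixed-size ring of callbacks. */
typedef struct {
    EventFunc func[MAX_EVENTS];
    void *data[MAX_EVENTS];
    size_t head, count;
    bool quit;
} MiniLoop;

/* Example event handler: increments the int passed as user data. */
void count_event(void *user_data)
{
    ++*(int *)user_data;
}

/* Queues an event; returns false if the ring is full. */
bool mini_loop_push(MiniLoop *loop, EventFunc func, void *user_data)
{
    assert(func != NULL);
    if (loop->count == MAX_EVENTS)
        return false;
    size_t tail = (loop->head + loop->count) % MAX_EVENTS;
    loop->func[tail] = func;
    loop->data[tail] = user_data;
    loop->count++;
    return true;
}

/* One non-blocking iteration: dispatches at most one event and returns.
 * Returns true if an event was dispatched. */
bool mini_loop_iterate(MiniLoop *loop)
{
    if (loop->count == 0)
        return false;
    EventFunc func = loop->func[loop->head];
    void *data = loop->data[loop->head];
    loop->head = (loop->head + 1) % MAX_EVENTS;
    loop->count--;
    func(data);
    return true;
}

#ifdef __EMSCRIPTEN__
/* Called by the browser once per repaint. */
static void tick(void *arg)
{
    MiniLoop *loop = arg;
    mini_loop_iterate(loop);
    if (loop->quit)
        emscripten_cancel_main_loop();
}
#endif

void mini_loop_run(MiniLoop *loop)
{
#ifdef __EMSCRIPTEN__
    /* fps = 0 selects requestAnimationFrame; the last argument (0) makes
     * the call return immediately instead of simulating an infinite loop. */
    emscripten_set_main_loop_arg(tick, loop, 0, 0);
#else
    /* Native demo only: iterate until the queue is empty, then quit. */
    while (!loop->quit) {
        if (!mini_loop_iterate(loop))
            loop->quit = true;
    }
#endif
}
```

In the Emscripten branch, mini_loop_run() returns right away, which is exactly why code placed after the "run" call cannot assume the loop has finished.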

Digging into the threading model of Emscripten’s OpenGL alternatives, we found several problems with GLib. First, GWakeup was not available. As before, there is no pipe() and there are no fds, so neither select() nor poll() works. We simply turned every g_wakeup call into a silent no-op. Invoking calls into another thread wasn’t working correctly, either.

OpenGL second try

So, now we have a better view of the threading model found in Emscripten and the limitations found with the initial EGL/GLES APIs chosen.

GStreamer already has a great model of asynchronous operations in OpenGL-related elements, which makes all calls to the actual API happen in the same thread. Along the same lines, Emscripten supports different compile-time and run-time features to decide how to handle the application threading model and how the rendering will happen.

We created a new “backend” for the OpenGL platform that will use WebGL context creation directly. Some of the options that sounded useful for our case are the following:

explicitSwapControl
renderViaOffscreenBackBuffer
proxyContextToMainThread

For our particular case, we decided not to proxy any OpenGL call to any other Emscripten thread and to let the call happen in the thread that actually invokes it; after all, GStreamer already takes care of calling OpenGL from a single thread.

We also want to control when to do the actual repaint by swapping buffers ourselves. The problem with this approach is that it requires that the thread controlling the WebGL context has access to the canvas to draw.
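A sketch of how those choices map onto Emscripten’s HTML5 WebGL API. The function names create_webgl_context and present_frame, and the idea of passing a canvas selector such as "#canvas", are assumptions made for illustration; this is not the code of the actual GStreamer backend. The non-Emscripten branch only exists so that the file compiles on a native toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __EMSCRIPTEN__
#include <emscripten/html5.h>
#endif

/* Creates a WebGL context on the canvas matched by `target` and makes it
 * current on the calling thread. Returns true on success. */
bool create_webgl_context(const char *target)
{
    assert(target != NULL);
#ifdef __EMSCRIPTEN__
    EmscriptenWebGLContextAttributes attrs;
    emscripten_webgl_init_context_attributes(&attrs);

    attrs.majorVersion = 2;                       /* ask for WebGL 2 */
    attrs.explicitSwapControl = EM_TRUE;          /* we decide when to present */
    attrs.renderViaOffscreenBackBuffer = EM_FALSE;
    /* Keep every GL call on the calling thread, no proxying. */
    attrs.proxyContextToMainThread = EMSCRIPTEN_WEBGL_CONTEXT_PROXY_DISALLOW;

    EMSCRIPTEN_WEBGL_CONTEXT_HANDLE ctx =
        emscripten_webgl_create_context(target, &attrs);
    if (ctx <= 0)
        return false;
    return emscripten_webgl_make_context_current(ctx) == EMSCRIPTEN_RESULT_SUCCESS;
#else
    return false; /* no WebGL outside Emscripten */
#endif
}

/* Presents the frame rendered so far; only meaningful with
 * explicitSwapControl enabled. Returns true on success. */
bool present_frame(void)
{
#ifdef __EMSCRIPTEN__
    return emscripten_webgl_commit_frame() == EMSCRIPTEN_RESULT_SUCCESS;
#else
    return false;
#endif
}
```

As far as we understand, explicitSwapControl also needs OffscreenCanvas or offscreen framebuffer support enabled at link time, so the exact set of linker flags should be checked against the Emscripten documentation.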

Some considerations first: in the browser, the only thread with access to elements and events is the “main thread,” the thread the page runs on. A Web worker thread cannot access any DOM element or event; this is by design. At the same time, the main thread and a Web worker communicate through message passing rather than shared memory. Apart from buffers that are explicitly shared (SharedArrayBuffer, which Emscripten’s pthreads support relies on), a Web worker never has access to the main thread’s memory.

With WebGL, to draw something, you need to access a Canvas DOM element, use it to set up the WebGL context, and finally draw. Using the main thread for rendering is not a good idea, so an OffscreenCanvas [1] can be transferred [2] to a Web worker, allowing another thread to safely do all the OpenGL operations it wants.

This is a new problem to face. Emscripten owns the Web worker thread message handling, and there is no message to transfer a canvas after the thread is created. The actual canvas transfer happens internally in Emscripten if a pthread attribute (pthread_attr_t) is set at pthread creation. But wait: GLib owns the thread creation, which happens automatically for us, so GStreamer’s context thread creation has to set a pthread attribute to transfer the corresponding canvas. So we added new GLib APIs to create pthreads that own a specific offscreen canvas.

Regarding window events, Emscripten uses HTML5 events that can emulate a classic windowing system. Again, all events happen in the main thread and need to be proxied to the corresponding event-handling thread of the OpenGL backend.

Sadly, despite all the changes at different levels of the GStreamer stack, we could not draw anything on the screen in the time available. This remains a pending topic.

GstCheck and testing infrastructure

One of the requirements we set for this implementation was to obtain trustworthy and reproducible results, using the same compliance mechanism GStreamer uses to validate itself.

While trying to run the GstCheck tests, we found that the fork-less implementation of libcheck did not compile as is, and we had to adapt it to make it work (Emscripten does not support fork()). At the same time, tests that depend on actual elements rely on a dynamic plugin being loadable, regardless of the compilation mode you choose. In our case, a static build was therefore not a proper configuration to run those tests.

In the end, we succeeded in having some of the tests actually run with the following results:

Ok: 44
Expected Fail: 0
Fail: 150
Unexpected Pass: 0
Skipped: 0
Timeout: 16

That is a good start!

Results

We have accomplished the objective of understanding the WASM / Emscripten ecosystem and foreseeing what would be needed to port GStreamer and our codecs to it. We have created the following repositories and branches where you can follow the development and the effort put into this project:

Last but certainly not least, we had the privilege of presenting this project to the GStreamer community at the GStreamer Conference 2023. During this presentation, we provided an overview of the challenges of bringing GStreamer to the web using Emscripten and WebAssembly (WASM). We invite you to explore the details of our talk.

Contribution plan and future

We are evaluating the amount of work that remains and the interest this might hold for the community and the industry. It is an opportunity to bring GStreamer and Fluendo to other domains, and web-based multimedia processing is a niche worth exploring. Some possible use cases are making non-standard or patented codecs available in a browser and embedding the whole set of tools that GStreamer brings in a browser. We need more feedback from the different actors to see what this can become, and we are eager to receive it: please fill out this form!