Introducing our bridge between DirectShow and GStreamer

Written by

Katia Marti

September 25, 2018

Posted on 26/09/2018 by Stéphane Cerveau

If your application, like Citrix XenDesktop, relies on the Windows-only DirectShow framework for media processing, you may need an advanced and legal solution for using patented codecs.

This motivated us to create a new product: Fluendo DirectShow Enabler.

The Fluendo DirectShow Enabler is a solution built on the powerful GStreamer framework. It enables the use of the Fluendo Codec Pack in a DirectShow environment.

Using DirectShow but decoding with GStreamer? Build a bridge between frameworks

Just as the unofficial LAV Filters project, based on FFmpeg, provides DirectShow filters to decode, encode, mux or demux media streams, Fluendo decided to provide DirectShow filters that let you use the legal and optimized codecs powered by the Fluendo Codec Pack and GStreamer.

A brief description of the frameworks …

The two frameworks take a similar approach to bitstream processing, so they look like distant cousins. Both connect black-box components to each other, and each component receives a flow of data to process. The semantics are the same, but the naming and the framework environment differ slightly.

The following pairs illustrate the naming differences (GStreamer -> DirectShow):

  • Plugin -> COM Object
  • Pipeline -> Graph
  • Element -> Filter
  • Pad -> Pin
  • Caps -> Media Types/Media Subtypes

Mediatype compatibility layer

To bring the power of Fluendo’s GStreamer codecs to DirectShow, we designed a generic DirectShow filter that supports the different bitstreams. This filter can decode both video and audio bitstreams in their various formats (H.264, AAC, etc.).

Aside from building the internal GStreamer pipeline, the main difficulty was to provide a compatibility layer between the input bitstream coming from the DirectShow graph and the GStreamer pipeline: the two frameworks differ in how they connect components and discover the capabilities of a filter or element.

DirectShow media type as GUID

Providing a DirectShow filter means defining the type of filter (here, a Transform Filter) and declaring the media types and subtypes supported by the filter’s pins.

These media types and subtypes are registered in the filter’s DirectShow registry entry. The framework uses this information to instantiate the filter and to allow it to connect to its upstream or downstream filter. Media types and subtypes are identified by GUIDs that are available system-wide on Microsoft Windows.

Examples of DirectShow Media Type:

  • MEDIATYPE_Video: 73646976-0000-0010-8000-00AA00389B71
  • MEDIATYPE_Audio: 73647561-0000-0010-8000-00AA00389B71
  • MEDIATYPE_Stream: E436EB83-524F-11CE-9F53-0020AF0BA770

Examples of DirectShow Video Media Subtypes:

  • MEDIASUBTYPE_H264: 34363248-0000-0010-8000-00AA00389B71
  • MEDIASUBTYPE_NV12: 3231564E-0000-0010-8000-00AA00389B71
  • MEDIASUBTYPE_MPEG1Payload: E436EB81-524F-11CE-9F53-0020AF0BA770
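
Several of the GUIDs listed above share the tail 0000-0010-8000-00AA00389B71. This is the usual pattern for identifiers derived from a four-character code (FOURCC): the first 32 bits hold the FOURCC read as a little-endian integer. The sketch below rebuilds such a GUID string in portable C++. The helper name FourccToGuidString is made up for this illustration and is not a DirectShow API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the GUID string that corresponds to a
// four-character code, following the pattern
// XXXXXXXX-0000-0010-8000-00AA00389B71.
std::string FourccToGuidString(const std::string &fourcc) {
  assert(fourcc.size() == 4);
  // The FOURCC is a little-endian 32-bit value, so the first character
  // lands in the least significant byte.
  std::uint32_t data1 = 0;
  for (int i = 3; i >= 0; --i) {
    data1 = (data1 << 8) | static_cast<unsigned char>(fourcc[i]);
  }
  char buf[37];  // 36 characters plus the terminating NUL
  std::snprintf(buf, sizeof buf, "%08X-0000-0010-8000-00AA00389B71",
                static_cast<unsigned int>(data1));
  return std::string(buf);
}
```

With this helper, FourccToGuidString("H264") gives 34363248-0000-0010-8000-00AA00389B71, which matches MEDIASUBTYPE_H264 above. GUIDs such as MEDIATYPE_Stream do not follow this pattern and have to be taken from the SDK headers.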

GStreamer Media types as GstCaps

GStreamer uses a different approach, named caps (short for capabilities), to describe media types. An extensible structure is attached to a pad to declare the input or output mimetype supported by the element.

The first string in the caps describes the mimetype supported by the element’s pad. This mimetype is the equivalent of the DirectShow media type/subtype combination.

The caps structure can also carry other information, which is described later in this page.

Examples of GStreamer caps:

  • “audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003”
  • “video/mpeg, mpegversion=(int)2”

Compatibility layer

To provide a generic decoder that supports a large number of media types, we implemented a compatibility layer that translates each unique DirectShow media type into GStreamer caps.

For example, MEDIATYPE_Audio with MEDIASUBTYPE_MPEG2_AUDIO becomes “audio/mpeg, mpegversion=(int)4”.

Each framework allows generic media types, but to provide the most reliable filter, we preferred to support specific media types. For each Fluendo codec, a specific GUID is registered as a media type actually supported by the filter.
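
One way to picture such a layer is a lookup table from a (media type, subtype) pair to a caps string. The sketch below is only an illustration, not Fluendo’s implementation. It keys the table on symbolic names rather than real GUIDs so that it compiles on any platform. The first entry is the example quoted above; the video/x-h264 entry and the names CapsTable and MediaTypeToCaps are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Key: (major type name, subtype name). Real code would compare GUIDs.
using MediaKey = std::pair<std::string, std::string>;

// Hypothetical table of the media types the filter claims to support.
const std::map<MediaKey, std::string> &CapsTable() {
  static const std::map<MediaKey, std::string> table = {
      {MediaKey{"MEDIATYPE_Audio", "MEDIASUBTYPE_MPEG2_AUDIO"},
       "audio/mpeg, mpegversion=(int)4"},
      {MediaKey{"MEDIATYPE_Video", "MEDIASUBTYPE_H264"}, "video/x-h264"},
  };
  return table;
}

// Returns the caps string for a media type, or an empty string when the
// pair is not in the table (the pin connection would then be refused).
std::string MediaTypeToCaps(const std::string &major,
                            const std::string &subtype) {
  const auto it = CapsTable().find(MediaKey(major, subtype));
  if (it == CapsTable().end()) {
    return std::string();
  }
  // An empty entry would be indistinguishable from "unsupported".
  assert(!it->second.empty());
  return it->second;
}
```

Listing every supported pair explicitly mirrors the choice described above: an unknown subtype is rejected up front instead of being passed to GStreamer as a generic type.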

Bitstream specific data

In addition to the initial media type and subtype described by the GUID, DirectShow provides specific data structures that describe the data packets to come. For many formats, the filter needs this information in advance in order to set up the downstream filter. For example, the audio renderer needs to know characteristics such as the sampling frequency or the number of channels.

DirectShow data structure

Various data structures describe the media format and expose its properties. During the pin connection, the upstream filter provides a CMediaType structure which contains, as described above, a media type and a subtype (e.g. MEDIATYPE_Audio, MEDIASUBTYPE_AAC_ADTS) but also a format type. The format type is a GUID that identifies the format structure, and therefore tells how to cast the data pointer in pbFormat.

Example:

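int rate = 0, channels = 0, depth = 0;
// FORMAT_WaveFormatEx means that pbFormat points to a WAVEFORMATEX.
if (mt->formattype == FORMAT_WaveFormatEx &&
    mt->cbFormat >= sizeof (WAVEFORMATEX)) {
  WAVEFORMATEX *wfex = reinterpret_cast < WAVEFORMATEX * >(mt->pbFormat);
  rate = wfex->nSamplesPerSec;
  channels = wfex->nChannels;
  depth = wfex->wBitsPerSample;
...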

In this example, we can determine audio attributes such as the number of channels and the sampling rate. For AAC content, for example, the same structure also gives access to codec-specific data.
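
To show where these attributes end up, here is a minimal sketch that turns them into a raw audio caps string of the kind shown earlier. It assumes uncompressed 16-bit PCM and leaves out the channel-mask field. It also uses a small stand-in struct instead of the Windows WAVEFORMATEX so that it builds on any platform. The names WaveFormatFields and PcmCapsFromWaveFormat are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for the WAVEFORMATEX members used in this sketch.
struct WaveFormatFields {
  std::uint32_t samples_per_sec;  // nSamplesPerSec
  std::uint16_t channels;         // nChannels
  std::uint16_t bits_per_sample;  // wBitsPerSample
};

// Builds a raw audio caps string, or returns an empty string for sample
// depths this sketch does not handle.
std::string PcmCapsFromWaveFormat(const WaveFormatFields &wf) {
  // The caller is expected to have validated the format block already.
  assert(wf.samples_per_sec > 0 && wf.channels > 0);
  if (wf.bits_per_sample != 16) {
    return std::string();
  }
  // 16-bit PCM on Windows is signed little-endian, hence S16LE.
  return "audio/x-raw, format=(string)S16LE, layout=(string)interleaved"
         ", rate=(int)" + std::to_string(wf.samples_per_sec) +
         ", channels=(int)" + std::to_string(wf.channels);
}
```

For a 48 kHz stereo stream this produces “audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2”.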

In the case of video content, various format types can be used to describe the media:

  • FORMAT_VideoInfo
  • FORMAT_VIDEOINFO2
  • FORMAT_MPEGVideo
  • FORMAT_MPEG2_VIDEO

Here is an example to fetch the bitmap dimensions of the video:

int width = 0, height = 0;
// Check the size of the format block before casting pbFormat.
if (mt->formattype == FORMAT_VideoInfo &&
    mt->cbFormat >= sizeof (VIDEOINFOHEADER)) {
  VIDEOINFOHEADER *pVih =
      reinterpret_cast < VIDEOINFOHEADER * >(mt->pbFormat);
  width = pVih->bmiHeader.biWidth;
  height = pVih->bmiHeader.biHeight;
...
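
The same header can feed the fields that GStreamer caps expect. The sketch below is an illustration under stated assumptions rather than the product’s code. It uses a stand-in struct for the few VIDEOINFOHEADER members involved, relies on AvgTimePerFrame being expressed in 100-nanosecond units, and takes the absolute value of the height because uncompressed top-down bitmaps store it as a negative number. The names VideoInfoFields and VideoCapsFields are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for the VIDEOINFOHEADER members used in this sketch.
struct VideoInfoFields {
  std::int32_t width;               // bmiHeader.biWidth
  std::int32_t height;              // bmiHeader.biHeight (negative: top-down)
  std::int64_t avg_time_per_frame;  // AvgTimePerFrame, in 100 ns units
};

// Greatest common divisor, used to reduce the frame rate fraction.
static std::int64_t Gcd(std::int64_t a, std::int64_t b) {
  while (b != 0) {
    const std::int64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Formats width, height and (when known) frame rate as caps fields.
std::string VideoCapsFields(const VideoInfoFields &vi) {
  // The caller is expected to have validated the format block already.
  assert(vi.width > 0 && vi.height != 0);
  const std::int32_t height = vi.height < 0 ? -vi.height : vi.height;
  std::string fields = "width=(int)" + std::to_string(vi.width) +
                       ", height=(int)" + std::to_string(height);
  if (vi.avg_time_per_frame > 0) {
    const std::int64_t units_per_second = 10000000;  // 1 s in 100 ns units
    const std::int64_t g = Gcd(units_per_second, vi.avg_time_per_frame);
    fields += ", framerate=(fraction)" +
              std::to_string(units_per_second / g) + "/" +
              std::to_string(vi.avg_time_per_frame / g);
  }
  return fields;
}
```

A 640x352 stream with an AvgTimePerFrame of 400000 gives “width=(int)640, height=(int)352, framerate=(fraction)25/1”. When AvgTimePerFrame is zero, the frame rate field is simply left out.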

Here is an example to fetch codec data:

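if (mt->formattype == FORMAT_WaveFormatEx &&
    mt->cbFormat >= sizeof (WAVEFORMATEX)) {
  WAVEFORMATEX *wfex = reinterpret_cast < WAVEFORMATEX * >(mt->pbFormat);
  // Codec-specific bytes, if any, directly follow the WAVEFORMATEX header.
  int codec_data_len = mt->cbFormat - sizeof (WAVEFORMATEX);
  if (wfex->cbSize > 0 && codec_data_len == wfex->cbSize) {
    // Copy the bytes; the copy must be released with g_free() when done.
    guint8 *codec_data =
        (guint8 *) g_memdup (mt->pbFormat + sizeof (WAVEFORMATEX),
        codec_data_len);
  }
}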

GStreamer data structure

On the GStreamer side, all of this specific information is carried by the caps which, in addition to the mimetype, describe the media format so that the elements can set up their data flow.

Here are some examples of caps:

  • “audio/mpeg, mpegversion=(int)1, mpegaudioversion=(int)1, layer=(int)3, rate=(int)44100, channels=(int)2, parsed=(boolean)true”
  • “audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)44100, channels=(int)2, channel-mask=(bitmask)0x0000000000000003”
  • “video/x-h264, stream-format=(string)avc, alignment=(string)au, level=(string)3, profile=(string)high, codec_data=(buffer)0164001effe1000f2764001eac56c0a02da6a0c020c04001000428ee3cb0, width=(int)640, height=(int)352, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)bt601, interlace-mode=(string)progressive, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, parsed=(boolean)true”

As the list above shows, the caps carry everything needed to describe the data flow.
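
The codec_data field in the H.264 caps above is the codec-specific byte blob, printed as lowercase hexadecimal. The sketch below reproduces that textual form from a byte vector so the link with the DirectShow codec data is visible. It is a portable illustration only: real GStreamer code would typically wrap the bytes in a GstBuffer and attach it to the caps instead of building a string, and the name CodecDataCapsField is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Formats codec-specific bytes the way they appear in a serialized caps
// string: "codec_data=(buffer)" followed by two hex digits per byte.
std::string CodecDataCapsField(const std::vector<std::uint8_t> &codec_data) {
  // An empty blob should simply not be added to the caps at all.
  assert(!codec_data.empty());
  static const char kHex[] = "0123456789abcdef";
  std::string out = "codec_data=(buffer)";
  for (const std::uint8_t byte : codec_data) {
    out.push_back(kHex[byte >> 4]);    // high nibble
    out.push_back(kHex[byte & 0x0F]);  // low nibble
  }
  return out;
}
```

Feeding it the first four bytes of the example above, 0x01 0x64 0x00 0x1e, gives “codec_data=(buffer)0164001e”.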

Compatibility layer

The main difficulty in bridging DirectShow and GStreamer lies in converting, in a generic way, between DirectShow data structures and GStreamer caps. Whereas GStreamer already has a generic way to describe all kinds of data, from video to format-specific binary data, DirectShow expects or provides static formats, which required us to handle each media subtype and format type as a separate case.

The DirectShow data model is not as extensible as GStreamer’s, but it provides a closed environment that avoids confusion.

Learn more about Fluendo’s patent compliant GStreamer codecs on DirectShow here.