Next Generation Audio in MPEG-DASH - Personalized experience with Dolby AC-4, DTS-UHD and MPEG-H 3D Audio

Written by

Maciej Sabiniok

November 21, 2024

During Fluendo’s Innovation Days, we had the opportunity to think outside our core duties, innovate, and work on things we are most passionate about. This time, we explored the potential of Next-Generation Audio (NGA) and how it can help provide exceptional audio experiences to consumers, content creators, and service providers. These benefits can be achieved using the most recent audio codecs, including Dolby AC-4, DTS-UHD, and MPEG-H 3D Audio.

Personalization with NGA

NGA is all about delivering audio experiences that are more accessible, personalized, and interactive, no matter how they are consumed. That’s why our work during this special event focused on enabling the personalization and accessibility potential of Next Generation Audio delivered over MPEG-DASH in GStreamer. (Read more about Fluendo’s MPEG-DASH-related engagements here and here.)

The concept of NGA personalization in MPEG-DASH is realized through Preselections included in the MPEG-DASH Media Presentation Description (MPD) file. In this context, a preselection combines one or more audio components with associated metadata to produce a complete audio experience, commonly known as an Audio Presentation.

Preselections can be signaled in the MPD either as descriptors within Adaptation Sets or as Preselection elements inside a top-level Period element. We opted for the latter, which is more commonly used for NGA codecs.
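To make the structure concrete, the comment at the top of the sketch below shows a minimal, illustrative Period with two audio Adaptation Sets and one Preselection (all attribute values are made up). In ISO/IEC 23009-1, @preselectionComponents is a whitespace-separated list of the ids of the Adaptation Sets (or content components) that form the preselection, and the first id designates the main Adaptation Set. The function parse_preselection_components is a hypothetical, stdlib-only helper written for illustration, not dashdemux2 code, and it assumes purely numeric ids.

```c
/*
 * Illustrative MPD fragment (values are made up):
 *
 *   <Period>
 *     <AdaptationSet id="1" contentType="audio"> ... </AdaptationSet>
 *     <AdaptationSet id="2" contentType="audio"> ... </AdaptationSet>
 *     <Preselection id="100" tag="10" lang="en" preselectionComponents="1 2"/>
 *   </Period>
 */
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: split a whitespace-separated @preselectionComponents
 * value into numeric Adaptation Set ids. The first id written to `out` is
 * the main Adaptation Set. Returns the number of ids written, or -1 if a
 * token is not an unsigned integer or `out` is too small. */
int parse_preselection_components(const char *attr, unsigned long *out,
                                  int max_ids)
{
    int n = 0;
    const char *p = attr;

    assert(attr != NULL && out != NULL);
    while (*p != '\0') {
        char *end;
        unsigned long id;

        /* Skip separators between ids. */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        if (!isdigit((unsigned char)*p))
            return -1;              /* not an unsigned integer id */

        errno = 0;
        id = strtoul(p, &end, 10);
        if (errno != 0 || (*end != '\0' && !isspace((unsigned char)*end)))
            return -1;              /* overflow or trailing garbage */
        if (n == max_ids)
            return -1;              /* caller's buffer is too small */

        out[n++] = id;
        p = end;
    }
    return n;
}
```

For the fragment in the comment, the helper returns 2 ids, with Adaptation Set 1 as the main one.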

Our work on adding preselections to the existing dashdemux2 GStreamer plugin was divided into three parts:

Extending the MPD parser: the MPD parser now reads preselections alongside the other MPD elements, so that personalization options are available.
Adapting the DASH demuxer: the DASH demuxer was modified to receive preselection information and create GstAdaptiveDemuxTrack and GstDashDemux2Streamlist objects accordingly.
Downstream communication: this information is sent downstream in the pipeline to decoders through sticky events, which simplifies decoder integration.

Manifest Parser Extension

Parsing the Preselection element required us to extend the parser's existing XML node structures and introduce new ones to store all the relevant information. The current DASH demuxer only partially covers the elements described in ISO/IEC 23009-1:2022. To address this, we made the following additions:

Extended the GstMPDRepresentationBaseNode object with the following members:
- GList *OutputProtection
- GList *EssentialProperty
- GList *SupplementalProperty
- GList *InbandEventStream
- GList *GroupLabel
- GList *Label
Implemented GstMPDTextualDescriptorNode to handle textual descriptions within the GList *Label member of GstMPDRepresentationBaseNode.
Implemented GstMPDPreselectionNode to store attributes and elements of the Preselection element of Media Presentation Description.
Extended GstMPDPeriodNode by integrating GstMPDPreselectionNode.

The following class diagram presents the existing and added objects and their relations.

We’ve modified the parsing sequence to incorporate proper Preselection parsing, including iteration over contained nodes. The following diagram shows the parsing sequence around the Preselection Node.
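The child-iteration step can be pictured as a single pass over the Preselection element's children, dispatching on each element name. The sketch below uses a hypothetical minimal node struct instead of libxml2 and plain counters instead of the GList members, so it only illustrates the control flow. The element names come from the members listed above, plus the Accessibility and Role descriptors that, as far as we know, can also appear inside a Preselection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an XML element node with child/sibling links. */
struct node {
    const char *name;
    struct node *children;   /* first child element */
    struct node *next;       /* next sibling element */
};

/* Hypothetical counters standing in for the GList members that a real
 * parser would append parsed descriptors to. */
struct preselection_children {
    int labels;
    int group_labels;
    int essential_props;
    int supplemental_props;
    int accessibility;
    int roles;
    int unknown;
};

/* Walk the direct children of a <Preselection> element once and dispatch
 * each child by element name. */
void parse_preselection_children(const struct node *presel,
                                 struct preselection_children *out)
{
    const struct node *cur;

    assert(presel != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    for (cur = presel->children; cur != NULL; cur = cur->next) {
        if (strcmp(cur->name, "Label") == 0)
            out->labels++;
        else if (strcmp(cur->name, "GroupLabel") == 0)
            out->group_labels++;
        else if (strcmp(cur->name, "EssentialProperty") == 0)
            out->essential_props++;
        else if (strcmp(cur->name, "SupplementalProperty") == 0)
            out->supplemental_props++;
        else if (strcmp(cur->name, "Accessibility") == 0)
            out->accessibility++;
        else if (strcmp(cur->name, "Role") == 0)
            out->roles++;
        else
            out->unknown++;   /* a real parser would log or skip it */
    }
}
```

Keeping the dispatch in one loop means a new child type only needs one more branch and one more storage member.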

This work is already available in Fluendo’s GStreamer and will be proposed to the community as a PR.

Modifications to the DASH Demuxer

The DASH demuxer has been modified to take Preselection elements into account. When Preselection elements are present, the demuxer creates GstAdaptiveDemuxTrack objects based on both Preselections and Adaptation Sets, and these tracks translate into a GstStreamCollection. Two cases need to be considered:

Adaptation Sets Referenced by Preselections:

When an Adaptation Set is referenced by one or more preselections, one GstAdaptiveDemuxTrack is created for each of these Preselections.
No separate GstAdaptiveDemuxTrack is created for the Adaptation Set itself.
All created tracks reference the same Adaptation Set but differ in language or preselection tag attributes.

Adaptation Sets Not Referenced by Preselections:

If an Adaptation Set is not referenced by any preselection, a single GstAdaptiveDemuxTrack is created for it, mapping one-to-one to that Adaptation Set.
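The two track-creation rules can be modeled in a few lines of C. The structs and build_tracks below are hypothetical simplifications (no GstAdaptiveDemuxTrack, no GLib) that only show which tracks result from a given set of Adaptation Sets and Preselections.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMPONENTS 8

/* Hypothetical, simplified model of a parsed Preselection. */
struct preselection {
    unsigned id;
    unsigned components[MAX_COMPONENTS];  /* referenced Adaptation Set ids */
    size_t n_components;
};

enum track_kind { TRACK_FROM_PRESELECTION, TRACK_FROM_ADAPTATION_SET };

struct track {
    enum track_kind kind;
    unsigned source_id;   /* preselection id or Adaptation Set id */
};

/* Returns 1 if any preselection lists `as_id` among its components. */
static int is_referenced(unsigned as_id, const struct preselection *ps,
                         size_t n_ps)
{
    for (size_t i = 0; i < n_ps; i++)
        for (size_t j = 0; j < ps[i].n_components; j++)
            if (ps[i].components[j] == as_id)
                return 1;
    return 0;
}

/* Apply the two rules:
 *  - one track per Preselection;
 *  - one track per Adaptation Set that no Preselection references.
 * Returns the number of tracks written (at most max_tracks). */
size_t build_tracks(const unsigned *as_ids, size_t n_as,
                    const struct preselection *ps, size_t n_ps,
                    struct track *out, size_t max_tracks)
{
    size_t n = 0;

    assert(out != NULL || max_tracks == 0);
    for (size_t i = 0; i < n_ps && n < max_tracks; i++)
        out[n++] = (struct track){ TRACK_FROM_PRESELECTION, ps[i].id };

    for (size_t i = 0; i < n_as && n < max_tracks; i++)
        if (!is_referenced(as_ids[i], ps, n_ps))
            out[n++] = (struct track){ TRACK_FROM_ADAPTATION_SET, as_ids[i] };

    return n;
}
```

With Adaptation Sets 1, 2 and 3 and two preselections that reference Adaptation Sets 1 and 2, this yields two preselection tracks plus one track for Adaptation Set 3.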

Understanding Downstream Events for Decoders in GStreamer

The GStreamer adaptivedemux2 documentation details the stream selection process and associated events. In the current implementation, the user’s choice triggers the GST_EVENT_SELECT_STREAMS upstream event, which is sent back to the DASH demuxer, enabling selection, download, and playback of the chosen stream.

A decoder can intercept this information and decode a specific presentation within the NGA content. However, if the currently active track does not support the configuration specified in the chosen preselection, the decoder falls back to the default configuration.

This happens because a preselection may refer to a different Adaptation Set, which can carry a different codec or elementary stream. For NGA codecs, a preselection usually selects a particular language, or a presentation with alternative or accessibility content; after a switch to another elementary stream, those elements might not be present.
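The fallback described above amounts to a membership test on the decoder side. In this sketch, both the requested identifier (assumed to come from the preselection data) and the list of presentations signaled in the active elementary stream are hypothetical inputs; how a real AC-4, DTS-UHD or MPEG-H decoder exposes them is codec- and implementation-specific.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "decode the stream's default presentation". */
#define PRESENTATION_DEFAULT (-1)

/* Pick the presentation to decode. `requested` is the identifier carried
 * in the preselection data; `available` lists the presentations found in
 * the currently active elementary stream. If the requested presentation
 * is absent (e.g. the active track belongs to another Adaptation Set),
 * fall back to the default presentation. */
int choose_presentation(int requested, const int *available,
                        size_t n_available)
{
    assert(available != NULL || n_available == 0);
    for (size_t i = 0; i < n_available; i++)
        if (available[i] == requested)
            return requested;
    return PRESENTATION_DEFAULT;
}
```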

The event and message flow for switching the stream or preselection is as follows:

dashdemux2 posts STREAM_COLLECTION message
The user handles the message and chooses a stream from GstStreamCollection
The user sends the SELECT_STREAMS event to the pipeline
dashdemux2 handles the event, selects the stream and prepares the CUSTOM_DOWNSTREAM_STICKY event with preselection data
CUSTOM_DOWNSTREAM_STICKY event with preselection data is handled downstream by a decoder supporting it

The last two actions were added to avoid switching to a non-existing configuration. The custom sticky event is an extension of the DASH demuxer, so handling it must be implemented by the decoder plugin writer.
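A sticky event suits this data because GStreamer stores sticky events on pads and, roughly speaking, delivers the stored copy ahead of subsequent data, so a decoder that is linked or reconfigured later still receives the current preselection. GST_EVENT_CUSTOM_DOWNSTREAM_STICKY is also flagged as sticky-multi: events with different structure names coexist, while a new event with the same name replaces the old one. The toy model below mimics only that replacement rule; it is not GStreamer code, and the structure name "dash-preselection" used in the example is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STICKY 8

/* Toy model of a pad's sticky-event storage (not GStreamer code).
 * Each stored custom event is identified by its structure name. */
struct sticky_event {
    char name[64];       /* structure name */
    unsigned payload;    /* stand-in for the preselection data */
};

struct sticky_store {
    struct sticky_event events[MAX_STICKY];
    size_t count;
};

/* Store an event: an event with the same name replaces the previous copy,
 * so downstream only ever sees the latest preselection.
 * Returns 0 on success, -1 if the store is full. */
int sticky_store_event(struct sticky_store *s, const char *name,
                       unsigned payload)
{
    assert(s != NULL && name != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->events[i].name, name) == 0) {
            s->events[i].payload = payload;
            return 0;
        }
    }
    if (s->count == MAX_STICKY)
        return -1;
    strncpy(s->events[s->count].name, name, sizeof(s->events[0].name) - 1);
    s->events[s->count].name[sizeof(s->events[0].name) - 1] = '\0';
    s->events[s->count].payload = payload;
    s->count++;
    return 0;
}

/* Look up the latest payload stored under `name`; returns 1 if found. */
int sticky_lookup(const struct sticky_store *s, const char *name,
                  unsigned *payload)
{
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->events[i].name, name) == 0) {
            *payload = s->events[i].payload;
            return 1;
        }
    }
    return 0;
}
```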

Summary

The above solution enables the application to present the preselection information to the user as a selectable track list. As a result, the user can choose the specific experience described by a preselection, and the DASH demuxer selects it. The decoder can then receive and parse the information carried in the CUSTOM_DOWNSTREAM_STICKY event to decode the desired presentation.

Built on top of the existing dashdemux2 and manifest parser code, these extensions fill a gap in GStreamer's implementation of the MPEG-DASH specification and allow the creation of a more personalized and accessible experience for the end user.

You can follow our work in this PR, which is available on GitHub. We are still working on it, and more information will come soon, so do not miss it!

At Fluendo, we’re dedicated to creating exceptional multimedia experiences. Contact us here if you need support with any audio-related codec challenges—we’re here to help!