Why Doesn’t Anyone Label The Audio?

The great thing about language is its ability to allow us to exchange ideas and concepts, and hopefully create a business by doing so. With the increasing number of multi-platform delivery opportunities and the increasing bandwidths and channel densities, we are also seeing an increasing opportunity for content owners to create revenue with their content. Successfully exploiting that opportunity means tailoring each version of the content to its intended audience, reducing friction and increasing the enjoyment of the viewer or listener. The blockbuster movie community has known for a long time that efficiently making versions of a movie and its collection of trailers on a territory-by-territory basis can make a significant difference to the number of people who watch that movie. I believe that we are entering an era where turbo-charging the versioning efficiency of media companies is going to be a dominant differentiator.

To reduce the costs of versioning and to make life simple for the creative human processes, it is necessary to automate the processes that can be done by machines (or in our case, software). To a company that deals with video, every issue looks like a video issue. The processes for segmenting video content and replacing elements are pretty well understood. Organizations like the UK’s DPP have created standards for interchanging that segmentation information.

In today’s blog, I’m going to assume that the video issues are largely understood and look at a “simple” issue that two customers approached me about here at the SMPTE Australia show.

Right now, on the planet, there are many more languages spoken than there are scripts for writing those languages down. There are also many more scripts than there are countries in the world. This makes the labeling of languages and scripts an interesting challenge for any media company, as the variables are virtually endless. There are many schemes used in the world for labeling audio and any naïve person entering the industry would assume that there must be some sort of global tag that everyone uses for identification … right?

Wrong.

Traditionally, TV stations, broadcasters, content creators and others have created content for a specific market. Broadcasters, distributors, aggregators and others have sent their content to territories with only a handful of languages to cope with. Usually, proprietary solutions for “track tagging” were developed and deployed.

The compelling business need to streamline and standardize the labeling of audio channels hasn’t really existed until now. The internationalization of distribution compels us to find an agreed way in which labeling can be done.

Thankfully, someone got there before the media folks. The internet community has been here before, and quite recently. The internet standard RFC 5646 is very thorough and copes with the identification of primary languages as well as dialects, extinct languages and constructed languages such as Klingon. With such a comprehensive and interoperable specification that is widely used for the delivery of web content to billions of devices every day, you’d think that any media system designer worth his or her salt would have this electronic document in their favorites list for regular look-up.

You’d think …

The MXF community knows a good thing when it sees it, so you’ll find that when it comes to a standardized way to tag tracks in MXF, the SMPTE standard ST 377-4 uses RFC 5646 as its vocabulary for labeling. ST 377-4 additionally recognizes that each channel of an audio mix might contain a different language. Each channel might also belong to a group: a stereo group, a surround sound group, or a mono group of one channel. This hard grouping defines the relationship of channels that should not be split. Going further, ST 377-4 defines groups of groups that are used as metadata to enable easy versioning so that, for example, a French group might consist of a French stereo group, a clean M&E surround mix and a French mono audio description channel.
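
To make the grouping idea concrete, here is a minimal Python sketch of the French example above. It is an illustration only: the class names (Channel, SoundfieldGroup, VersionGroup), the channel numbering and the 5.1 channel order are all hypothetical, and this is not how ST 377-4 actually encodes its labels inside an MXF file. The M&E mix is tagged "zxx", the registered subtag for content with no linguistic content.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Channel:
    index: int     # position of the channel in the file (hypothetical numbering)
    role: str      # e.g. "L", "R", "C", "LFE"
    language: str  # RFC 5646 language tag for this channel


@dataclass
class SoundfieldGroup:
    """Channels that belong together and should not be split (the 'hard' group)."""
    name: str
    layout: str    # e.g. "stereo", "5.1", "mono"
    channels: List[Channel]


@dataclass
class VersionGroup:
    """A 'group of groups': everything needed to deliver one language version."""
    name: str
    language: str
    groups: List[SoundfieldGroup]

    def channel_indices(self) -> List[int]:
        # Flatten the groups into the list of channels a versioning tool would pick.
        return [c.index for g in self.groups for c in g.channels]


def make_group(name: str, layout: str, roles: List[str],
               language: str, first_index: int) -> SoundfieldGroup:
    """Build a group whose channels are numbered consecutively from first_index."""
    channels = [Channel(first_index + i, role, language)
                for i, role in enumerate(roles)]
    return SoundfieldGroup(name, layout, channels)


french_stereo = make_group("French stereo", "stereo", ["L", "R"], "fr", 1)
clean_me = make_group("Clean M&E", "5.1",
                      ["L", "R", "C", "LFE", "Ls", "Rs"], "zxx", 3)
french_ad = make_group("French audio description", "mono", ["M"], "fr", 9)

french_version = VersionGroup("French version", "fr",
                              [french_stereo, clean_me, french_ad])

print(french_version.channel_indices())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The point of the sketch is that a versioning tool can ask for "the French version" and get back every channel it needs, without anyone having to remember which tracks hold which mix.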

Reality

ST 377-4 with RFC 5646 solves a difficult problem in a simple and elegant way. Up until now, it’s been easier for media companies to do their own thing and invent their own metadata vocabularies with proprietary labeling methods rather than use a standard. Today, to get cost-effective interoperability, we’re starting to rely on standards more and more so that we don’t have to bear the cost of an infinite number of proprietary connectors to make things work.

As you see more versions of more programs being created, spare a thought for the future costs and revenues of media that needs to be exchanged. A little up-front standardized metadata builds the launch ramp for a future searchable and accessible library of internationalized content. Standardized audio metadata and subtitle metadata may be a tiny addition to your assets, but over time it helps you find, use and monetize versioned content with far less effort. Take action now and learn the difference between en-US and en-GB. It’s more than just spelling.
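
As a starting point, here is a small Python sketch that splits a tag such as en-US or en-GB into its language, script and region subtags. It is deliberately simplified and is not a full RFC 5646 parser: it ignores extended language subtags, variants, extensions and private-use parts, and it rejects grandfathered tags. The names LanguageTag, parse_tag and same_language are made up for this illustration.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple tag like 'en-GB' or 'zh-Hant-TW' into subtags.

    Simplified sketch: only 2-3 letter primary language subtags are accepted,
    and anything other than script and region is ignored.
    """
    parts = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", parts[0]):
        raise ValueError(f"unsupported primary language subtag in {tag!r}")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if len(part) == 1:
            break  # a single character starts an extension or private-use section
        if script is None and region is None and re.fullmatch(r"[A-Za-z]{4}", part):
            script = part.title()   # scripts are conventionally written like 'Hant'
        elif region is None and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", part):
            region = part.upper()   # regions are conventionally written like 'GB'
    return LanguageTag(language, script, region)


def same_language(a: str, b: str) -> bool:
    """True when two tags share a primary language, whatever the region."""
    return parse_tag(a).language == parse_tag(b).language


us = parse_tag("en-US")
gb = parse_tag("en-GB")
print(us, gb)
print(same_language("en-US", "en-GB"))  # True: both are English
print(us == gb)                         # False: the regions differ
```

A library search for "any English track" would use something like same_language, while a delivery spec that demands British English would compare the full tag.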

By admin_lc
