Audio Is Content Marketing's Next Frontier

Podcasts, Digital Assistants, Smart Speakers & The Audio Algorithm

Personal computing technology has followed one constant trend throughout the years: From the keyboard and mouse of the early days to the touch surfaces, gestures, and voice of the present day, the trend has been toward effortless input.

I've been podcasting, properly speaking, since 2011, when my friend Pat Lilja and I did The Daily Numbers podcast at Tunheim. As of this writing, I'm going on the 264th episode of the Beyond Social Media Show podcast, which I co-host with BL Ochman.

You could say I've been bullish on audio for years.

I've been so bullish that listeners to the Beyond Social Media Show are likely a little tired of my sounding the drumbeat of audio marketing so frequently. Let me take some time to explain why I think it is the next great frontier.

10 Reasons Why Audio Is The Next Frontier For Content Marketing

1) The Ease Of Audio Consumption

Audio was the easiest form of content to consume during the mass media era. Families huddled around the radio to hear Franklin Delano Roosevelt deliver his fireside chats during the 1930s. My father puttered around the garage listening to Twins games during the 1970s. And I listened to sports radio while driving to Vikings games last season.

The common denominator across all types of audio is that it frees us to be productive while consuming it.

The dishes were washed while listening to FDR. The Twins triumphed while fixing an engine. Boredom is alleviated while driving from one point to another.

During this era of media fragmentation, it remains true that audio is the easiest long-form content to consume.

But now we can listen privately to our chosen topic and we are seldom bound by time to make an appointment to hear that content.

Audio content is hands-free and independent of physical location.

2) The Rising Popularity Of Podcasts

Chart: Google Trends: 'Podcast' Searches at Google, 2004-2018

There was a great deal of interest in podcasts when the form was introduced during the latter part of 2004 and that interest remained relatively stable for roughly three years.

Then interest began to wane until 2014, when, as you can see from the chart above, the Serial podcast captured the imagination of the country and propelled podcasting back into favor.

Edison Research, which has been tracking audio consumption for years with its Share Of Ear reports, revealed in April 2019 that:

  • Today, 51% of Americans 12+ have ever listened to a podcast, with 32% having listened in the past month, and 22% in the past week.
  • Podcasting's Share of Ear has more than doubled in five years, increasing 122% since 2014.
  • Although all key demographics grew, much of the increase in podcasting has come from Americans age 12-24.
  • 41% of monthly podcast listeners say they are listening to more podcasts today compared to one year ago.
  • 43% of monthly podcast listeners say they have listened to a podcast on Spotify, and 35% on Pandora.
  • 54% of podcast consumers say that they are more likely to consider the brands they hear advertised on podcasts, compared to 7% who say they are less likely.

Types Of Podcasts

The early days of podcasting (and Charley Lock at Wired has posted a nice history) were dominated by technologists, who could figure out how to actually publish podcasts, and politicos who had a burning desire to voice their opinions.

Not surprisingly, early podcasts tended to be about those two topics.

While they are fascinating topics, technology and politics don't appeal to everyone. Gradually, as podcasting became easier, sports and religion and finance podcasts entered the fray.

It took professional radio people, though, skilled at audio storytelling, to produce a blockbuster podcast such as Serial.

The repopularization of the format and the increasing ease of publishing flooded the market with podcasts devoted to any possible topic.

Podcast Formats

Most podcasts adhere to one of the following formats:

  • Scripted Stories - These podcasts are the most compelling because they actually tell stories, to which we humans are hardwired to pay attention. They can be either fiction or non-fiction. Scripted podcasts are dominated by audio professionals and take their cue from Ira Glass' This American Life. Serial is a scripted story.
  • Interview - This is an extremely common format where the host interviews a guest. It is a pretty easy format for anyone to accomplish, except, perhaps, for the booking and scheduling.
  • Solo expert - This format consists of a single host discussing the topic of their expertise. This is the easiest format because all you need is a computer, a microphone and an opinion.
  • Panel discussion - This format features two or more people discussing a given topic or theme.
  • Repurposed content - This is simply taking existing content, such as a conference presentation or a live radio program, and distributing it via podcast channels.

The success of Amazon's audiobook offerings, and the rise of audiobooks generally, is yet another indicator of the popularity of the audio format.

3) Voice Activation Everywhere

Grand View Research valued the global voice and speech recognition market size at $9.12 billion in 2017 and expects it to expand at a compound annual growth rate of 17.2% during the forecast period.
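To get a feel for what a 17.2% compound annual growth rate means, here is a minimal sketch that compounds the 2017 figure forward. The function name and the one-, three- and five-year horizons are my own assumptions for illustration; the forecast period Grand View Research actually used is not stated here.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound base_value forward by `years` at the annual rate `cagr`."""
    return base_value * (1 + cagr) ** years


BASE_2017_BILLIONS = 9.12  # 2017 valuation cited above
CAGR = 0.172               # 17.2% compound annual growth rate cited above

# Assumed horizons, for illustration only
for years in (1, 3, 5):
    size = project_market_size(BASE_2017_BILLIONS, CAGR, years)
    print(f"{2017 + years}: ${size:.2f} billion")
```

The point of the sketch is simply that compound growth accelerates: each year's increase is calculated on the previous year's larger base.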

Chart: US Voice-Recognition Market Size

In 2011, Apple introduced voice activation technology to the iPhone 4S and called it Siri. It soon created a buzz, as this episode of The Daily Numbers demonstrates:

While Apple got all the buzz, Google added voice activation to desktop search in 2011 and added voice search to its iOS app the following year.

Google's voice technology quickly surpassed Apple's Siri.

Amazon released the first Echo smart speaker in 2014, starting an arms race for domination of the category, with Google releasing its Google Home, Apple offering the HomePod, and Facebook planning its own voice-activated home device.

Now, Amazon claims 28,000 products work with Alexa, up from 4,000 at the start of 2018, according to Bloomberg.

Voice activation is being integrated into all manner of products, even dolls. In 2015, Mattel released a voice-enabled doll called Hello Barbie that kids could converse with as they play.

At the Consumer Electronics Show (CES), Google announced that Google Assistant would be active on roughly one billion devices by the end of January, Gavin O'Malley reports for MediaPost.

And with every passing year, the CES features an increasing number of voice activated devices; this year has been no different.

According to Statista, this year 55% of new cars will come with voice recognition installed. Expect voice activation to be a standard feature of cars in the near future.

I ask for channels and on-demand movies through my Xfinity remote control or tell my Xbox Kinect to launch my Netflix app.

One personal anecdote illustrates how voice activation technology is changing consumer behavior.

I was walking to the bus after work one day, thinking about playing football with the boys over the weekend when I found myself nearly blurting out to the air in general, "Alexa, what's the forecast for Saturday?"

4) Text-To-Voice Technology Grows Up

One of the ways I get the most value out of my Amazon Echo is having Alexa read my Kindle books to me. While it is pretty amazing technology, the experience is not perfect.

Amazon offers a text-to-speech cloud service called Polly and last year, the company released a WordPress plugin for Polly that turns blog posts into audio.

The plugin is likely as much about getting a larger data set with which to train its text-to-voice algorithm as it is about adoption of Polly as a service.

Because if you run some side-by-side tests with both Amazon Echo and Google Home, there's no comparison.

Google Home wins.

While I've done such testing on an ad hoc basis, Loup Ventures has done comparison tests with Google Home, Amazon Echo, Apple HomePod (which uses Siri technology), and Microsoft's Cortana.

The company asked 800 questions of each company's technology and Google won by a wide margin:

  • Google answered 88% of the questions correctly,
  • Siri answered 75% correctly,
  • Alexa answered 73% correctly,
  • And Cortana had a 63% success rate.

Chart: Smart Speaker Queries Answered Correctly By Category
Image courtesy Loup Ventures

Google's success at answering queries is due to the fact that it has been refining its natural language processing since 1998 and that it has the entire world wide web at its disposal for answering questions.

Ask both Google Assistant and Amazon Alexa complex questions, and Alexa is far more likely to come up short, due to the lack of information at its disposal.

Neither Apple nor Amazon has been indexing the web, and though Microsoft does own the Bing search engine, it is not nearly as accurate as Google as a search engine.

5) Audio Indexing

Google Wants To Deliver Audio In Search Results

Google already does such an excellent job of translating text-to-voice that you can ask Google Home or Google Assistant complex questions and it is most likely to provide an accurate answer.

But now the company is hard at work trying to unlock the content that is currently trapped in audio files.

Steve Pratt of Pacific Content reports that Google wants to enable audio in search results so that:

In the future...podcast metadata could allow individual episodes to appear in Google search results. Not only could your podcast show up when people search specifically for your podcast (already available on Android), but your podcast could also show up when people search for topics or people that your podcast covers, as well as sports, movies, tv shows, or virtually anything else.

Steve Pratt, Pacific Content

He quotes Zack Reneau-Wedeen of Google's podcasting team:

In the longer term, integrating with Search means figuring out what each podcast is about and understanding the content of that podcast. This is something Google has done extremely well for text articles, as well as for images and even more structured data such as maps. We can help with audio, too. It has the potential to help people find the best content for them in that moment, better than they can today.

Zack Reneau-Wedeen, Pacific Content

This has the potential to unlock a massive trove of content for a far wider audience, not only via traditional search results but increasingly via voice-activated devices like Google Home.

While Google has spent years understanding the spoken word by processing voice commands via voice search and services such as Google Voice, there are other ways to extract meaning from audio files.

Spotify's Waveform Analysis

One fascinating method of understanding and extracting the contents of audio files is the waveform analysis that Spotify employs for its beloved Discover Weekly feature.

Sound is data and as such, it can be processed, analyzed, and ultimately visualized. Not the way that a song is visualized on a sheet of music that is interpreted by musicians but precisely in unique visual shapes.

This video makes the point by visualizing the timbre of the same note played on different instruments.

If Spotify can use this kind of visualization to identify all aspects of a given song file--from genre to tempo and melody to individual instruments and vocal ranges and styles--and assign that information to your own personal musical tastes profile based on the music you've listened to in the past, the mystery of its recommendation acumen should fade away.

Now, take that same methodology and apply it to spoken word sound and it should be able to make stellar podcast recommendations as well.

Waveform analysis can likely identify not just the topics and whether or not a given podcast uses a talk show, interview or scripted format but also recommend a podcast because you tend to favor hosts with a baritone voice.

Pandora's Podcast Genome Project

Pandora is taking a similar approach to Spotify's, applying the principles of its Music Genome Project to podcasts:

Similar to how its namesake the Music Genome Project helped Pandora become the best and easiest way to discover music online since 2005, the Podcast Genome Project will recommend the right podcasts to you at the right time. It evaluates content based on a variety of attributes spanning content categories, as well as your signals including thumbs, skips, collects, and plays. Our system learns your preferences using natural language processing, collaborative filtering, and other machine learning approaches. And, similar to the Music Genome Project, the Podcast Genome Project combines these techniques with the expertise of our in-house Curation team to offer personalized recommendations down to the episode level that reflect who you are today and evolve with you tomorrow.

Introducing Podcasts On Pandora

6) Real-Time Language Translation

Google has been perfecting its language translation technology since it launched Google Translate in 2006. In 2014, it added speech translation with the acquisition of Word Lens.

To date, Google Translate is available in more than 100 languages and serves more than 200 million people daily.

The technology is currently available via the Google Assistant app as well as through Google Home speakers.

Enabling real-time audio translation opens up the reach of content exponentially. It will also play a profound role in conversational commerce.

7) Conversational Commerce

Consumers have become acclimated to using text-based chatbots. They can be found on Facebook pages, within BBC articles, as virtual assistants on web sites, and as the user interface for mobile apps like Quartz' news app:

Amazon users can order products from the Amazon store using their Echo speakers, re-order products they've previously purchased and add items to their shopping cart to buy later.

Third-party shopping apps include a Domino's skill for ordering pizza, a GrubHub app for ordering from every other restaurant, and a Starbucks skill so you can skip the line.

While Google does not have the eCommerce ecosystem Amazon has, that doesn't mean Google's voice technology is sidelined when it comes to commerce.

While not nearly as far along as Amazon when it comes to voice commerce, Google has begun to implement shopping ability into its voice technology, backed by its eCommerce infrastructure, Google Express.

But Google is not banking solely on shopping cart features. Last year, Google made headlines with the announcement that you would be able to use Google Assistant to make restaurant reservations on your behalf by using artificial intelligence to make a call and book a time for you.

As of this month, Google Duplex has rolled out to 43 states.

Both Amazon and Google are encouraging third-party app development on their platforms with tools to create them.

8) Freeing Up Of Consumers' Time

Open Offices & Headphones

While there has been a bit of backlash against open layout plans, they are likely to remain in place for a while when you consider the costs of remodelling office space.

With no walls to mute the sound of open air speakers, these office layouts have forced the use of headphones while at work. And since no one can hear what you are listening to, there are no inhibitions against listening to podcasts.

This has likely helped fuel the resurgence of podcasts as well.

Netscape founder and venture capitalist Marc Andreessen thinks audio is the new hotness as well. He points out (via Connie Loizos) that Apple AirPods have freed up the entire workday for people to listen to audio.

The really big one right now is audio. Audio is on the rise just generally and particularly with Apple and the AirPods, which has been an absolute home run [for Apple]. It's one of the most deceptive things because it's just like this little product, and how important could it be? And I think it's tremendously important, because it's basically a voice in your ear any time you want.

For example, there are these new YouTube type celebrities, and everybody's kind of wondering where people are finding the spare time to watch these YouTube videos and listen to these YouTube people in the tens and tens of millions. And the answer is: they're at work. They have this Bluetooth thing in their ear, and they've got a hat, and that's 10 hours on the forklift and that's 10 hours of Joe Rogan. That's a big deal.

Of course, speech as a [user interface] is rapidly on the rise. So I think audio is going to be titanically important.


Hands-Free Transportation

According to Bluetooth SIG, 86% of new cars, trucks and SUVs shipped worldwide in 2018 came equipped with Bluetooth technology. WiFi and Bluetooth are quickly becoming standard features of cars.

Additionally, state transportation infrastructure budgets are likely to increasingly emphasize public transportation as we attempt to combat climate change.

With more and better public transit options, commuters are likely to forgo the hassle of parking in favor of more productive and less stressful routes to and from work.

According to WNYC, the average commute time in the United States is 25.3 minutes. At two trips a day, five days a week, that adds up to roughly 4.2 hours a week of potential listening time.
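The weekly total depends on a couple of assumptions that are worth spelling out. This is a back-of-the-envelope sketch, assuming the 25.3-minute figure is a one-way trip, two trips a day, and a five-day work week; the function and constant names are my own.

```python
AVERAGE_ONE_WAY_COMMUTE_MINUTES = 25.3  # figure cited from WNYC above
TRIPS_PER_DAY = 2                       # assumption: to work and back
WORKDAYS_PER_WEEK = 5                   # assumption: a five-day work week


def weekly_commute_hours(one_way_minutes: float,
                         trips_per_day: int = TRIPS_PER_DAY,
                         workdays: int = WORKDAYS_PER_WEEK) -> float:
    """Convert a one-way commute in minutes to total commute hours per week."""
    return one_way_minutes * trips_per_day * workdays / 60


hours = weekly_commute_hours(AVERAGE_ONE_WAY_COMMUTE_MINUTES)
print(f"{hours:.2f} hours of potential listening time per week")
```

Under those assumptions the total comes to a little over four hours a week. Change the number of workdays or trips and the listening window moves accordingly.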

Finally, while self-driving vehicles may seem a distant vision on the horizon right now, it does appear they will be a certainty at some point.

Smart Glasses

While Google Glass and Spectacles have been underwhelming augmented reality smart glasses, the focus of those products has been on augmenting reality visually.

Bose has introduced Frames, audio-enabled sunglasses that couple with a smartphone app to send audio from the frame to your ears. Applications include:

  • Listening to music (of course)
  • Receiving phone calls
  • Audio tours and navigation
  • Audio games
  • An audio caddy for golfers mapped to more than 45,000 golf courses
  • A motivating workout coach

The one social factor that could impede the adoption of such smart glasses is the stigma you earn when it looks like you are talking to yourself rather than to your glasses.

AlterEgo appears to have developed technology that solves that problem by allowing people to converse in natural language with machines, artificial intelligence assistants, services, and other people without any voice (without opening their mouth and without externally observable movements), simply by vocalizing internally.

9) Audio Analytics

To date, analytics for podcasts have been pretty rudimentary, capturing the number of times an individual file has been downloaded, the device used to access it and where the listener was from geographically.

Not very helpful when compared to the sophisticated analytics we can get from Google and others based on website behavior.

Apple recently added a few more bits of useful data such as percentage of listeners who have subscribed, average listening time per episode, and percentage of episode listened.

For my podcast, I supplement the data I get from Apple, Google Play, Stitcher, and Libsyn with the Google Analytics data from the Beyond Social Media Show site and YouTube analytics from the video we upload.

As demand for audio content grows, I expect the native audio analytics for each platform will become more sophisticated as well. The proliferation of audio advertising will force the issue.

National Public Radio has pulled together a coalition of heavy hitters in audio and technology to develop, promote and implement a measurement protocol called Remote Audio Data.

Though there appears to be hope on the horizon, at this point it is probably more helpful to think of what is possible rather than what currently exists.


If I were to bet on a company providing rich and full-featured analytics, it would be Google, given the company's track record on measurement products.

The kind of insight I imagine we'd be able to get from Google would include whether a listener found your audio via desktop or mobile text search or a voice search using Google Assistant or Google Home.

We would also likely get attention data such as duration of listen, listen starting points and exit points, pauses, stops and replays and all of that, ideally, coupled to topic keywords.

Google could also not just map listener sessions to geography but define a route, so publishers could understand where listeners were traveling.


Amazon could conceivably provide much of what I detailed above for Google.

But the most exciting thing about Amazon is the ability to potentially create user profiles based on their purchase behaviors, brand affinities, or television and movie preferences.

Spotify & Pandora

Obviously, both of these platforms understand their users' musical tastes, so coupling musical taste profiles to listener data would be interesting.

I am particularly fascinated with the potential of waveform analysis in identifying how different aspects of sound perform best in a podcasting or marketing context.


SoundCloud, interestingly, ties any comments made on the audio you've uploaded to the precise time within that audio file during which the listener posted the comment.

That gives you some interesting insight into what prompts people to comment.
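As a rough illustration of what you could do with time-stamped comments, the sketch below groups comment timestamps into one-minute windows and lists the busiest ones. It does not use any SoundCloud API; the function name and the sample timestamps are hypothetical, and it assumes you already have each comment's position in seconds.

```python
from collections import Counter


def busiest_windows(comment_seconds, window_seconds=60, top_n=3):
    """Count comments per fixed-size time window and return the busiest ones.

    Returns a list of (window_start_in_seconds, comment_count) tuples.
    """
    counts = Counter(int(t // window_seconds) for t in comment_seconds)
    return [(index * window_seconds, count)
            for index, count in counts.most_common(top_n)]


# Hypothetical comment positions, in seconds from the start of the episode
sample_comment_seconds = [12, 75, 80, 95, 110, 610, 615, 1300]

for start, count in busiest_windows(sample_comment_seconds):
    print(f"{start // 60:02d}:{start % 60:02d} -> {count} comment(s)")
```

Matching the busiest windows against a transcript or show notes would then suggest which topics or moments prompt people to respond.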

Conversation Analytics

Amazon offers some basic analytics for its Echo Skills out of the box and Google offers Chatbase for chatbot analytics.

As demand for audio analytics grows, one would expect that feature would be built into such tools.

10) Monetizing Voice

Beyond individual companies like Domino's and Starbucks monetizing voice activation via custom-made skills, the most obvious monetization tactic will be advertising.

Pandora and Spotify sell audio ads directly and advertisers can buy digital audio ad inventory programmatically on Spotify, TuneIn, SoundCloud and Google Play Music using Google DoubleClick's Bid Manager.

Audio advertising will be tricky, however.

Many advertisers will no doubt apply the same approach to this medium as they have to traditional radio.

Doing so will likely elicit the same response from listeners: They will either tune you out or get angry at you.

Rather than treating this format as an interruption medium, we can use richer data and insight into whom we are targeting to earn listeners' attention with advertising content tailored to what we know about them as an audience.

One example of this is the Tonight Show sponsoring the joke-telling feature of the Amazon Echo by replacing Alexa with Jimmy Fallon:

A really exciting opportunity audio advertising presents is the ability to run sequential or serial audio campaigns that only play themselves out based on listener behavior. Each subsequent audio ad would only be played if a listener has listened to the preceding ad in the sequential campaign.
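The sequencing rule is simple enough to sketch in a few lines. This is a minimal illustration of the logic only, not any ad platform's API; the function name, the ad identifiers, and the idea of tracking a set of completed ads per listener are all assumptions.

```python
from typing import Optional, Sequence, Set


def next_ad(campaign: Sequence[str], completed: Set[str]) -> Optional[str]:
    """Return the next ad this listener should hear, or None if finished.

    An ad is only eligible once every ad before it in the campaign
    has been listened to, so we stop at the first unheard ad.
    """
    for ad_id in campaign:
        if ad_id not in completed:
            return ad_id
    return None


# Hypothetical three-part campaign and listener histories
campaign = ["teaser", "story", "offer"]

print(next_ad(campaign, set()))                 # listener has heard nothing yet
print(next_ad(campaign, {"teaser"}))            # listener finished the first ad
print(next_ad(campaign, {"teaser", "story", "offer"}))  # campaign complete
```

Because the function always returns the earliest unheard ad, a listener can never skip ahead to the offer without first hearing the ads that set it up.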

But listening behavior can be tied to monetized goals outside the confines of audio advertising.

Just as we can tie written content to conversion metrics, so can we tie audio. Audio conversation bots can be tied to business outcomes.

Finally, the platforms themselves obviously see a way to monetize audio. Google through search and audio advertising. Amazon through customer retention. And with Spotify's purchase of Gimlet Media, it appears they are setting themselves up to be the Netflix of podcasts.

Prepare, Content Marketers. Prepare!

So how do we content marketers take advantage of the new frontier?

First and foremost, take a look at existing content with an eye for how it can be either used in an audio context or optimized for text-to-audio.

Second, look at existing processes and how they might contribute to creating audio content.

A standard practice among content marketers, for example, is the subject matter expert interview from which we write a blog post or a bylined article. We usually record these for our own convenience.

That audio is no longer delete-worthy after it's been transcribed. Now it can be sliced and diced to be offered as clips to accompany a written piece or cobbled together as a podcast episode.

Third, start podcasting. Seriously.

Finally, take a deep dive into conversation design via chatbots. They are the leading edge of the conversational commerce revolution.
