Audio’s Opportunity and Who Will Capture It — MatthewBall.vc

As most of the major media categories — music, video and video games — have existed for decades, we tend to forget that media is technology. Instead, we think of technology as being used to express media, rather than media itself. Spotify, for example, is an internet streaming music service, while iTunes is a download music service, SiriusXM is a satellite broadcast music service, and radio is a terrestrial broadcast technology. This focus on delivery ignores the classic definition of media: “outlets or tools used to store and deliver information or data.”

While the above might seem preoccupied with theory and philosophy, all analysis of the past and future of a given media category must start from the fact that media is technology. This is because technology not only enables content categories, it defines their business models and shapes the content, too. And as we know, technology is in a constant process of change.

Chapter 1: How Technology Created Recorded Media, Then Continually Redefined It

Music offers a great view into the interplay between technology, business model and content. Consider the following triptych, which covers seven decades, two decades and one year, respectively.

When the flat record first emerged in the 1850s, it standardized around the 78. The 78 (as in 78 rotations per minute) came in a 10-inch version that held three minutes of music and a 12-inch version that held four. This meant that after centuries of variability, music suddenly had a defined run-time.

This length was reaffirmed by the first mass market standard for consumer media: the 45 RPM vinyl single, which launched in 1948 and held roughly three minutes. The music industry coalesced around this format (and its runtime) for a variety of tech-based reasons. The 45 was far cheaper for consumers than a 78 album, which was important given the high cost of record players and the ubiquity of free (singles-focused) radio. The 45’s cost advantage also meant it was the primary way labels delivered singles to thousands of radio stations across the country for local airplay. In addition, RCA quickly figured out how to make a stackable version of 45s, which was important to jukebox manufacturers. The rise of the 45 naturally led the length of the average song to decline; a four-minute song simply couldn’t fit on the most important audio format in the world.

As the physical and financial limitations of the 78 and 45 were relieved, and the far more flexible cassette and CD emerged, the length of the average single grew rapidly, adding nearly two minutes (or 78%) from 1959 to 1992. Still, almost all tracks conformed to the three-to-four minute standard. After decades, the West had become used to the idea that a song was roughly between three minutes and 20 seconds and four minutes and 10 seconds long.

On its surface, the shift to digital audio should have led to further increases in song length. After all, there was no longer any limitation to run-time. However, the reverse occurred. Technology might have relaxed its grip on music’s length, but it had strengthened its hold on business models.

As is well known, iTunes unbundled the physical album into individually downloadable (and bought) tracks. But in doing so, it penalized artists for bundling a multi-part song into a single track. Pink Floyd’s decision to split the 26-minute and nine-part Shine on You Crazy Diamond into two discrete tracks didn’t matter in 1975; all nine parts fit on a single record and no one wanted to buy just a single half, let alone a single part. But in 2005, such a move could mean missing out on roughly 78% of revenues (seven of nine possible sales) — why sell two things when you could sell nine? And why would a consumer buy an entire $10 album if all they wanted was two $1 portions of Shine on You Crazy Diamond? These incentives naturally led artists publishing new music to split their longer/multi-section songs into separate — and shorter — preludes, interludes and segments.

This behaviour has been greatly exacerbated by the advent of a new and even more disruptive digital music technology: on-demand streaming. While iTunes was technically innovative, its business model was not. Consumers, after all, primarily owned copies of individual tracks in the 1950s and 1960s. Spotify and Apple Music, meanwhile, meant consumers adopted not just a new music technology, but also bought an entirely different product: ongoing access to all music ever created.

But as technology has shifted consumers away from discrete and attributable transactions (buying record A on date B) to ongoing and general ones (subscribing to service C in perpetuity), musical talent needed a new compensation model. Spotify, therefore, decided to pay talent as and to the degree consumers listened to their works. Matching revenue with usage is intuitive, but it was never before possible in music. There was no way to track at-home record spins or CD plays, let alone charge for them. Nor was it practical for iTunes to ask users to download an individual song to their devices and pay several pennies per play when they later synched their iPod to iTunes. (This would have been rife with abuse, too.)

Engagement-based monetization is arguably more fair. Consider, for example, that the Beatles’ Yesterday and Psy’s Gangnam Style would each generate $1 when sold on iTunes, even if the former was played 2,000 times over ten years and the latter 30 times in the month it was bought and then never again. But the more that business models change, the more that incentives and content change, too.

To support engagement-based monetization, Spotify and its label suppliers had to define engagement. And they chose to do this on a per stream basis with a minimum stream time of 30 seconds (to avoid accidental plays, track skipping, etc.). However, this meant that a 10-minute track, five-minute track and 31-second track generated the same royalties.
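The counting rule described above can be sketched in a few lines. This is only an illustration: the function name and the sample play durations are hypothetical, and treating a play of exactly 30 seconds as eligible is an assumption about how the threshold is applied.

```python
MIN_STREAM_SECONDS = 30  # minimum stream time described in the text


def count_paid_streams(play_durations_seconds):
    """Count the plays that reach the minimum stream time.

    Each qualifying play counts once, no matter how long the track is.
    """
    return sum(1 for d in play_durations_seconds if d >= MIN_STREAM_SECONDS)


# Hypothetical session: full plays of a 10-minute, five-minute and
# 31-second track, plus a 12-second skip that does not qualify.
plays = [600, 300, 31, 12]
print(count_paid_streams(plays))  # -> 3
```

Under this rule the 10-minute play and the 31-second play are worth exactly the same: one stream each.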

So as the music industry has transitioned the majority of its revenues from CDs and downloads to streaming, major artists have relentlessly shortened and split their tracks. Why release a five-minute song if you can make it a two and a half-minute song that’s played twice? Or two different two and a half-minute songs? This meant artists had yet another reason to reduce track lengths.

All of this helps to explain the extraordinary success of 2019’s top track, Old Town Road by Lil Nas X, which is also Billboard’s longest running #1 ever, at 19 consecutive weeks. While the song is awesome, it’s also only one minute and 53 seconds — roughly half of 2019’s average song length. This means that four minutes of listening generated two times the average revenue and charting lift of every other hit song that year.
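The arithmetic behind that comparison is easy to check. The sketch below assumes the average 2019 song is exactly twice the length of Old Town Road (the text only says "roughly half"), and it keeps fractional streams so the result reads as an average over many listeners; the function name is hypothetical.

```python
OLD_TOWN_ROAD_SECONDS = 113        # 1:53, from the text
ASSUMED_AVERAGE_SECONDS = 2 * 113  # assumption: "roughly half" read as exactly half


def streams_per_session(session_seconds, track_seconds):
    """Streams credited when one track is looped for the whole session."""
    return session_seconds / track_seconds


session = 4 * 60  # four minutes of listening
short = streams_per_session(session, OLD_TOWN_ROAD_SECONDS)
average = streams_per_session(session, ASSUMED_AVERAGE_SECONDS)
print(f"{short:.2f} vs {average:.2f} streams, ratio {short / average:.1f}x")
# -> 2.12 vs 1.06 streams, ratio 2.0x
```

Because every stream pays and charts the same, halving the track length doubles the streams earned from the same listening time.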

Old Town Road isn’t an exception, either. Up until 2017, Billboard’s Hot 100 Chart had never had a year with more than 2% of its charting tracks shorter than two minutes and 30 seconds (most years had none). In the past three years, this share has skyrocketed to over 12%, or roughly one in every eight tracks.

Notably, labels are also encouraging artists to simplify the name of their songs and albums in order to ensure they’re optimized for voice-controlled speakers and touchscreen-based searches. A track with five words is more likely to be misunderstood or suffer from autocorrect than one with two. Similarly, voice assistants are known to struggle with accents, such as Irish or even Texan. Being hard to say means you might not get played.

Old Town Road isn’t the first time technology made a hit. In fact, the modern day dominance of rap and R&B comes from how changes in technology – not for delivery, but sales recognition – afforded Lil Nas X the opportunity to top the charts in the first place.

Prior to the 1990s, Black artists and music fans had spent decades arguing the record industry conspired against “urban contemporary” music by refusing it radio play and ignoring its sales. It took only five weeks after Billboard adopted SoundScan, a computerized sales database, to prove this theory right.

Until 1991, Billboard charts weren’t based on actual unit sales or radio play. Instead, they were assembled using (white) retail clerk estimates of what was selling best and what (white) DJs considered to be “hottest” each week. According to The Atlantic, both groups had reasons to lie. For example, labels would pressure radio stations to favour “hand-picked hits” if they wanted to keep receiving the newest single on time (stations sometimes received bribes to play specific tracks, too). Meanwhile, labels would force inventory on their retailers, who would then overreport sales to convince music fans to buy excess inventory.

Naturally, those who ran the music industry saw little need to overhaul how it worked. And thus while the book and film industries had shifted to computerized sales databases in the 1980s, not one of the top six record distributors signed onto SoundScan before its release in June 1991. But this resistance didn’t stop N.W.A.’s N***az4life from debuting #2 on the Billboard Top 100 the very next month under SoundScan. This was the highest charting performance in rap history – and happened without any radio airplay, music video airings on MTV, or a concert tour. The failings of the old honour system were further demonstrated by the fact that N.W.A. debuted at only #21 on Billboard’s R&B chart, which wasn’t yet on SoundScan. Somehow it was possible that N***az4life was the second biggest album in the country by units purchased, but 21st in its own genre when it came to what was “selling” and “hottest.” One week after its release, the album hit #1 on the Billboard chart (displacing R.E.M.) as hundreds of thousands flocked to the record store in search of the “surprise” hit.

In the following years, the R&B/hip hop genre achieved three other industry “firsts.” It saw the fastest rise from a non-top ten genre to Billboard’s most popular one, has been the most dominant #1 by share, and holds the longest run as #1 (note the chart below ends in 2010, but this reign persists through to date).

Even the Beatles, though doubtlessly destined for success, were elevated by changing technology. Between 1954 and 1962, 5.5 million transistor radios were sold in the United States. In 1963 this install base nearly doubled to 10 million, many of which were received as Christmas gifts. The top use case (or “killer app”) for this newly ubiquitous device? Listening to the Beatles’ I Want to Hold Your Hand, which was coincidentally released for radio play on December 26th. Within a month, the song had become the Beatles’ first Billboard #1, thereby landing the group its February appearance on the Ed Sullivan Show and jump-starting Beatlemania.

Sixty years later, the significance of the transistor radio is easy to forget, or to otherwise lump in with less disruptive 20th century audio inventions such as the 8-track or cassette deck. However, the device “was the technological spark that lit the fuse of teen culture in the 60s,” CBS wrote in 2014. “It enabled both public and private listening behaviours in a combination equaled by neither prior nor subsequent technologies. Public, because you could take it anywhere and share music with your friends in the schoolyard, on the beach, wherever, in an unprecedented fashion. Private, because you could listen through an earplug as you walked down the street, or sat in the back of the class, or lay in your bed at night, under the covers, so your parents wouldn’t know.” It is difficult to think of an artist or group that better deserved the promotional platform that the transistor radio offered the Beatles. However, no amount of marketing spend could have bought the band such an enormous opportunity — only technology.

Chapter 2: Sisyphus’ Soundtrack

Although music offers many examples of how changes in media technology lead to changed business models and content, the category is nevertheless considered a tragic outlier in the media industry because of its economic non-responsiveness to technology.

Consider the chart below, which shows how recorded music technologies have spanned six different mediums (vinyl, 8-track, cassette, disc, offline download, streaming) across a seven-decade period, but only to see industry revenues replaced at best and reduced at worst. This is despite the fact that new genres were created, new sounds and instruments emerged, and monetization was transformed twice over. Note, too, that the core listening experience was also continuously improved: cassettes brought portability to recorded music, CDs brought quality and defined track starts/stops, downloads meant your entire library traveled with you and streaming meant the entire library of music was available at all times. In fact, these advances were so great that audiences spent billions re-buying music they already owned on a new format. But re-buying old music isn’t a growth market.

TV Outgrew the Radio Star

Compare the audio chart to video, which grew with the addition of broadcast TV atop film, cable television atop broadcast, with satellite and fibre-optic video driving further gains, and now digital video. Netflix might be disrupting Hollywood and cord cutting might be at all-time highs, but video revenues have never been greater.

This is because over-the-top video, as with prior video innovations, did more than just discover new genres and talent. It fundamentally transformed video delivery, creation and monetization, as well as the video content itself.

It was obvious that the addition of at-home viewership would expand the video market. However, the first technology used to deliver it, broadcast, had many great advantages. Chief among them was ubiquitous coverage and no marginal costs. Every household in a given area – and every room in the household – received a broadcast transmission from a single tower. In addition, there was no cost to setting up a household nor sending content. This meant that broadcast TV was easy to get and could also be free. These two attributes enabled TV to penetrate the American market at an unprecedented rate. Only two in 10 Americans had seen a television in operation in 1945. By 1950, one in 10 owned one, and by 1962, nine in ten did.

However, broadcast had significant constraints. For example, there was only enough broadcast spectrum for a few channels. There was no ability to discriminate between customers – every customer received the same service, or none at all – or to charge them, either. These limitations meant that rather than sell consumers entertainment, the video industry effectively bought consumer attention with entertainment and then sold this attention to third parties through ads. Thus the focus of competition was not the best content, but reaching the most viewers at any point. There was no space – literally – for niche or specialty programming, nor content that might offend advertisers, the sole source of revenues. Broadcast distribution also meant all content aired live. As a result, all shows ran exactly 30 or 60 minutes, including ads, which informed how long an act or scene might be.

Cable removed a number of broadcast restraints. For the first time in TV history, it was now possible to discriminate between households and charge, too. Neighbour A might get no cable service, Neighbour B a basic tier, and Neighbour C a premium one. This meant that television could add a new business model: consumer fees. Coaxial cable also massively expanded the number of concurrent video feeds a household could receive. These two changes allowed new TV networks that focused on serving niche populations rather than the broadest possible audience to emerge, including ESPN, CNN, MTV, BET and HBO. These channels, in turn, helped convince audiences to abandon free TV for pay TV. They also helped increase TV watch time. While the share of U.S. homes with TV didn’t grow from 1961 to 2010, usage jumped from five hours to eight and a half hours per day. Nobody liked their cable bill, but they sure liked what it got them. The more they watched, the more money they paid, and the more content the industry could produce and the better it could be.

Specialty channels also made it easier to target specific audiences, which helped increase TV’s ad revenues. Meanwhile, HBO used cable distribution to skip advertising altogether, which allowed it to distribute feature films without censors or edits, and, later, produce original series with levels of nudity and violence that no advertiser would have allowed. By 2010, HBO had become the single most profitable network in the U.S.

Of course, cable technology also introduced new constraints. The high cost of installing and operating cable meant it was now necessary for consumers to pay for television. Since television networks didn’t have the skills or capital to lay down cable themselves, they had to be intermediated by dedicated cable companies that bought the right to distribute these networks and charge the consumer for access.

The difficulty and cost of laying cable had other impacts. Most households didn’t want their lawns dug up for two different cable companies, nor two different cable boxes set to different TV inputs. Accordingly, every home received video from only one provider. In addition, infrastructure costs were so high that many markets had only one provider in the first place – it rarely made sense for a second entrant to duplicate a first entrant’s footprint. This meant TV service was uncompetitive and prices were high — though it also meant consumers didn’t face exclusives. If we could have picked between Comcast or Time Warner Cable or Charter or BrightHouse or Verizon, you can bet the channels would have been different. “Showtime, now a Verizon exclusive network.”

Early cable technology also meant that households couldn’t access select channels or technically receive them on a selective basis (hence packages being “first 50”, then “all”). This eventually led to an overstuffed bundle that forced channels on households. This also meant that all the major media companies were sold together and shared customers – they competed for time, but not access nor individual customers.

The advent of digital video opened up the technology/business model/content loop yet again. Today, a network can reach consumers without dedicated infrastructure (i.e. the multi-purpose internet versus TV-specific coaxial cable), and so networks sell directly and individually. This means exclusives are frequent, content fragmentation is high, and some networks (e.g. Netflix) have many times the customers of others (e.g. Epix).

OTT video has also transformed content. As content has shifted online, serialized long-form storytelling has become dominant. Rich, plot-driven series like Breaking Bad or Game of Thrones aren’t viable if the audience can’t catch most (if not all) original airings exactly when they start. Without the ability to catch up on a show mid-run, these series would never be able to attract new viewers and would decline in reach with every episode forever. The removal of commercials, meanwhile, meant that series no longer needed to be structured around ad breaks that occur every four to six minutes, or plotted so that each scene would be exciting and/or funny enough to get a viewer to come back after a commercial. Similarly, a show could be any length.

Not all changes have been good. Some have argued the shift to serialized storytelling has also led to overlong, bloated series. The fact that every comedy had only 22 minutes meant good jokes were left on the cutting room floor, but so too were bad jokes and narrative fat. And just as it was hard to keep a viewer watching when every episode was filled with ads and aired 167 hours apart, it has arguably become too easy to keep a viewer watching when all they need to do is sit on their couch and wait for ad-free autoplays. Craig Mazin has effectively admitted that streaming allows less compelling shows to survive, saying he was a big fan of Succession on HBO, but “If I liked the show a little bit less I’d probably burn out on it. Because I get aggravated every week waiting for the next episode.” (Mazin is the writer-director-showrunner of HBO’s Chernobyl and the forthcoming HBO series The Last of Us.)

Similarly, Netflix has said its primary driver of retention and pricing power is total watch time. This means there are network-based incentives to squeeze a little more out of every series. Of course, all networks have hoped to keep their shows running as long as possible. However, having a string of dull-to-fine episodes meant viewers wouldn’t come back a week later. What’s more, the studios behind streaming TV series are paid a markup on their total production costs – so they, too, have incentives to elongate a season.

Finally, the most important innovation in streaming video wasn’t how it affected pay TV models or content, but how it created altogether new formats. The most popular video service in the United States is now YouTube, which is almost entirely user-generated content. Twitch delivers more hours of entertainment daily than all but 25 traditional pay TV networks. US TikTok revenue remains unknown, but obviously large. Collectively, UGC video services generated over $10 billion in 2019, representing roughly 25% of growth since 2010. The economic models here are incredibly unique versus broadcast, cable or digital television.

Video Game On

Video gaming is another incredible contrast with audio. Here, we see consistently additive growth. New technologies operate like geological strata, building upon one another, never needing to cannibalize or replace.

When arcades first emerged, they cost at least $2,000 each (roughly $6,000 in 2019 dollars) and played a single game. As a result, the only buyers of an arcade were businesses. This isolated the video gamer audiences to only those who so strongly wanted to game that they’d leave their house, travel to a shop, and wait in line to play. The need to share devices meant games had to be short and simple, and didn’t store progression data. It also meant monetization had to focus on pay-per-use (again, technology informs content and business models).

The introduction of consumer-grade gaming hardware (i.e. consoles) in the 1980s represented a ground-breaking change: suddenly you could game at home, play multiple titles, and, most importantly, save your progress. Saving meant games could have richer, longer, story-based narratives, and users could play endlessly without an additional fee. This expanded how many people could regularly play games, how often they could play, the affordability of gaming and the diversity of gaming content.

Online gaming added even more. Now gamers could socialize in a game remotely, rather than crammed together on a couch and split-screen TV. The stories told by a game could be both endless and persistent - EVE Online and World of Warcraft never stop, even when you log off, and are nearing their third decade of operation. Mobile gaming brought portability and touch, not to mention AR-based experiences like Pokémon Go, which turns the world into a game.

Game monetization has evolved and diversified enormously over the past 20 years – from package sales to downloadable content, monthly subscriptions, season passes, microtransactions to buy extra lives, outfits or dances. More shockingly, the majority of the most valuable games in the world don’t require players to spend a dollar. In fact, they’re not even games per se.

Roblox, for example, had some 164 million players in July and crossed more than three billion hours in playtime. However, the Roblox Corporation doesn’t make or publish any games directly. Instead, Roblox is a “no code gaming platform” that enables its players, most of whom are children, to easily create, share and monetize games themselves. Roblox achieves this by focusing on design through icons, rather than a programming language. In this regard, it is similar to the shift from Microsoft’s MS DOS to Microsoft Windows in the 1990s, or BlackBerrys to iPhones in the 2000s, both of which helped turn the personal computer into a device anyone could use.

The results of Roblox’s innovative approach to game creation have been profound: More than 50 million games have been made on Roblox Studio, of which 5,000 have had more than one million plays, and more than 20 have had more than one billion plays. Roblox’s top game, Adopt Me, had more than 1.6 million concurrent players in April. In total, Roblox counts more than two million developers, of which 345,000 generate income. Between March and August of 2020, 20-year-old Anne Shoemaker made more than $500,000. She now employs 14 people. In 2020, Roblox expects developers to net more than $250 million — not all of which even comes from their games directly. Roblox Marketplace allows developers to re-sell any of the assets they make for their games, such as a tree, item, or 3D model. Suffice it to say this is new in gaming.

Chapter 3: Zooming Out on Audio

The charts and histories above paint a discouraging picture of audio. In it, audio seems to be capped in ways that other mediums aren’t; trapped in a Sisyphean economic existence where all that changes is the boulder.

This dynamic stems from audio’s technological simplicity compared to other media types. We can see this in how much earlier recorded audio emerged than recorded video or video game arcades, and live audio (radio) versus live TV or online gaming. Or how much easier it was to make and record music than shoot and press a film, or design a video game.

One could even argue audio is simpler than text. Although the printing press emerged centuries before radio, printed text was difficult and costly to distribute, and could not be delivered live. Not only is audio comparatively easier to make (you just speak!), radio broadcast technology means that whether live or pre-recorded, this audio can reach every single American household simultaneously and at no marginal cost.

While audio’s simplicity provided it with a head start on other categories, it has also held back its growth. As a general rule, media categories that are strongly affected by technological changes are advantaged over those that are not. We see this through the ways technology changes have increased the diversity of content, delivery and monetization.

Over the past several decades, music has evolved stylistically and in genre, but music content itself has not been dramatically overhauled, expanded or reimagined as video games or TV series have been. More audio is produced and distributed today than ever before, but this growth lags that of other media categories (including text). And while audio is easier to access today, it does not reach a greater share of Americans today than it did 60 years ago. Recorded audio has added a third monetization model in the 21st century — subscriptions — but it did so in 2001 with XM Radio. And three is still well short of other categories. Since the mid-2000s, it has probably been easier to make, distribute and build an audience around a vlog or blog than a song. Thanks to iOS and Roblox, mini-games are likely easier now, too.

All of which is to say that audio does grow from technological change. But it does so on longer time horizons and a more selective basis than other categories. To this end, it makes little sense that historical summaries of the music, video and video gaming industries all tend to start in the 1970s or 1980s. After all, music preceded video games by more than a century and took longer to evolve, too. In addition, audio analysis should focus less on individual changes in physical media and more on methods of access — a distinction other categories don’t usually need.

While the past 40 years tell a frustrating story in audio, the 100-year history is very different. Throughout the 20th and 21st centuries, audio has continually discovered new delivery mediums, formats and monetization models. This began with the launch of the radio broadcast in 1927, which blanketed the country in audio, and extended with the transistor radio of the 1950s, which made audio truly portable and private, through to satellite, digital stores, and Spotify streams. Today, the audio category is 40 times bigger in real terms than it was exactly a century ago, two times as big as it was 50 years ago, and up 30% since 1994.

And just as audio needs a broader 20th-century framing, it also needs greater 21st-century context. While it was the first major media category to be disrupted by the internet, it remains the least connected to it when considering both time and revenue. Terrestrial broadcast radio still has more than 40% of non-concert audio-related revenues and listening time – a feat maintained since 1930.

This is good news. The reallocation of revenue and time will fund an enormous set of new content creators, production companies, and distributors. And as always, monetization will be affected too. For example, terrestrial broadcast radio pays fixed per-play rates, regardless of the number of listeners, and only a song’s writers are compensated, not the performers. On-demand streaming pays per listen and all talent is compensated. In addition, these services pay on a fixed share of revenue basis, which means talent’s revenues grow linearly with that of distributors.
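The difference between the two payout models can be sketched as follows. Everything here is illustrative: the function names, the per-play rate, the revenue figures and the 70% payout share are hypothetical, and real royalty calculations involve more parties and rules than this.

```python
def fixed_per_play_payout(plays, rate_per_play):
    """Broadcast-style payout: a fixed rate per play, independent of audience size."""
    return plays * rate_per_play


def revenue_share_payout(service_revenue, payout_share, artist_streams, total_streams):
    """Streaming-style payout: a fixed share of revenue, split by share of streams."""
    royalty_pool = service_revenue * payout_share
    return royalty_pool * artist_streams / total_streams


# Hypothetical figures: the artist holds 1% of all streams in both periods.
before = revenue_share_payout(1_000_000, 0.7, 10_000, 1_000_000)
after = revenue_share_payout(2_000_000, 0.7, 10_000, 1_000_000)
print(before, after)  # the payout doubles when the service's revenue doubles

# The fixed-rate payout depends only on the number of plays.
print(fixed_per_play_payout(plays=500, rate_per_play=0.10))
```

With a constant share of streams, the revenue-share payout scales one-for-one with the distributor's revenue, which is the linear relationship the paragraph above describes.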

More importantly, technology is now affecting the audio category faster than ever before. The diversity of its revenue models, content, and delivery has never been greater. This is inspiring and healthy. And there is a lot more to come.

Audiobooks, Podcasts, And Audio-Only Stories

Audiobooks and podcasts are a great place to start. Some of the former is cannibalized from books — which is still good for audio — but some is also net new. U.S. audiobook revenues are estimated to hit $1.5 billion in 2020 (roughly 15% of the money spent on recorded music) and continue growing 15 to 35% per year.

Podcasting is more directly competitive with radio, which remains roughly 30% talk and 70% music. But the considerable investments being made by market leaders such as Spotify and Audible/Amazon Music (which recently greenlit several series, including shows from Will Smith and DJ Khaled) should grow the market, too.

TV achieved full penetration in the United States (90%) by 1961, at which point the average family watched five hours per day. Over the next 40 years, television went from being free to costing a minimum of $60 per month. And while many households expressed annoyance at the volume of unwanted channels they were forced to buy, the diversity and quality of the content in the cable bundle led to a nearly 75% increase in view time. Investments in audio should have a similar impact, while also allowing Spotify to hike its price (thus lifting industry revenues).

Similarly, we need to consider how the scale of today’s global on-demand music streaming platforms returns (and expands) old opportunities in audio. In 1940, the average family listened to more than five hours of radio per day. By 1960, that was down to two – much of which was in the car (home listening was primarily focused on sports). This drop, exacerbated by the need to listen live, made it impractical for any mass media company to tell audio-first stories. Of course, an audio story could be pressed to vinyl, thus removing the limitation of live air times. But releasing weekly or monthly audio series on vinyl was prohibitively expensive, especially compared to print-based ones (i.e. comics, weekly magazines or newspaper fiction), and meant the loss of ad revenues.

Today, it is easier than ever to reach national audiences with audio stories. There are two primary points of upload (Spotify and Apple rather than hundreds of radio stations) and two points of access for the listener (Spotify and Apple rather than myriad audio channels), both of which offer on-demand playback. And in the time since Serial proved the potential of the new audio model five years ago, some 40 million more Americans have adopted on-demand audio streaming services.

To this end, we can point to Spotify’s recent podcasting deal with comic book publisher DC Entertainment. DC’s last audio-native serial ended in 1951, two years after The Lone Ranger premiered on TV and three years after the unofficial start of the TV era (1948’s The Ed Sullivan Show). Of course, one can debate the size of this opportunity at a time when blockbuster filmmaking and video games are bigger than ever. But what matters is that this is a new and net incremental opportunity for audio. Notably, less than 5% of podcast listenership today is narrative fiction.

More important than how new technology expands the economics for old content categories, however, is how it unlocks new ones.

Low-cost, ubiquitous RSS-based distribution of podcasts is undoubtedly responsible for the medium’s growth to date. Without it, it would have been too hard to find Pod Save America or Serial, to share episodes and to build podcasting habits. But RSS is also a limiter. The RSS standard allows for only a single version of a file to be distributed (which cannot be updated) and almost no audience-side data is returned. This means there’s no detailed listener or listening data (where the audience skipped, whether they completed a file, etc.), no potential for dynamic ad insertion or programmatic advertising and no interactivity. This might not seem like a big deal if you’re a podcast fan today, but the evolution of mobile-phone messaging is an interesting case study.
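To make the limitation concrete, here is a minimal sketch of what a podcast app actually receives from an RSS feed. The feed below is hypothetical (the show name and example.com URLs are made up for illustration), and the helper name list_episodes is my own. Each episode is an item with one enclosure that points to one audio file, and nothing in the document gives the listener's app a way to report anything back.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, for illustration only. Real feeds carry more
# metadata, but the episode itself is still a single <enclosure> URL.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <guid>ep-1</guid>
      <enclosure url="https://example.com/ep1.mp3" length="12345678" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <guid>ep-2</guid>
      <enclosure url="https://example.com/ep2.mp3" length="23456789" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def list_episodes(feed_xml):
    """Return one record per <item>: its title and the single audio file it points to."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        episodes.append(
            {
                "title": item.findtext("title"),
                "audio_url": enclosure.get("url"),
                "mime_type": enclosure.get("type"),
            }
        )
    return episodes


episodes = list_episodes(FEED)
for episode in episodes:
    # All the app learns is where the file lives. The feed is a one-way
    # document: there is no field for skips, completions or ad impressions.
    print(episode["title"], "->", episode["audio_url"])
```

Everything a closed platform adds (listening data, swapped-in ads, interactivity) has to live outside this document, which is why those features tend to arrive with proprietary apps rather than with the open standard.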

SMS took off during the late 1990s and early 2000s thanks to the standard maintained by the GSMA, which ensured all mobile devices on all networks and in all countries could send, receive, and read SMS. However, it was the later shift to private and partly closed messaging apps that led to incredible, and largely unexpected, innovation in messaging: from read receipts to photo-based communications, filters, stories, auto-deleting messages, avatars, GIFs and more. If the introduction of these features had depended on a global consortium of wireless carriers, they would probably not exist today. To this end, the consolidation of podcast listening onto a selection of predominantly closed platforms is likely to bring considerable enhancements and revenue growth to the industry.

Just this week, Spotify demonstrated the burgeoning opportunity here. The company’s self-service podcast creation platform, Anchor, now allows podcasters to instantly integrate any of Spotify’s 40 million licensed songs into their shows without needing to manage licensing, royalties, etc. This means anyone can be a fully-fledged DJ and produce their own radio shows. In addition, audio journalists can feature as much music (and as much of an individual track) as they’d like in their reports or discussions, rather than needing to just talk about a song or limit playback to a few seconds. Spotify’s interactive platform also allows the user to instantly add songs they hear snippets of to their library or pause the podcast to listen to the entire track.

This new feature breaks RSS and requires Spotify-based distribution, both of which have downsides. But it also makes podcasts a far more powerful medium for content creation, listening, and discovery. This required not just integration into a closed platform, but one that already delivered music digitally and on-demand, via subscription, and with a baked-in rights/royalty management system. All of this required technology that didn’t exist 15 years ago.

Despite the technical and administrative complexity of the above, Spotify’s innovation is relatively modest. Music has been in professional radio shows for a century, and consumers have been able to play individual tracks on the internet for a decade. Bringing music to UGC radio shows and integrating web-playback into these shows is incremental. Imagine what happens when we move to something truly new, different and unexpected, the way Snapchat stories or ephemeral photos were. Today’s podcasts and radio shows are still conceptually rooted in technological limitations that are decades old and being rapidly unlocked.

The New Concerts

I see a lot of fundamental potential in remote and virtual concerts. In general, live experiences tend to be the most valuable aspects of the media and entertainment industry. Today, almost all of the value in the pay TV bundle comes from (and depends on) live sports and live news. Nearly all of the top video games are based in live, online play, as is the majority of industry growth. And despite enormous increases in TV broadcast rights to live sports, ticket revenue remained the largest source of revenue for the major U.S. leagues throughout the past decade.

Live generates such a premium because of how much it adds to the standard media experience — from FOMO to greater immersion, an elevated sense of stakes and a feeling of community. There is a reason we laugh more in a theatre than at home, and more at home with a partner beside us than alone.

Despite this, live represents only a modest portion of audio revenues. Concert revenues hit $9.8 billion in 2019, which isn’t small, but falls well short of both paid and ad-supported listening, and represents only a quarter of total consumer spend on music and less than a fifth of total audio revenues. There is no material revenue for live book readings or podcast recordings.

This relative underperformance exists not because audiences are uninterested in live audio, but because of how hard it is to scale live audio experiences and revenue.

Consider the differences between live sports and live concerts. In general, the quality of a live event declines as an attendee moves farther back from the main stage. But for concerts, the experience declines exponentially with distance. Concerts don’t have mosh pits 300 feet and a section up, for example, nor are the acoustics well maintained (the majority of concert revenues are generated at venues made for sporting events after all). Most fans would rather watch an NFL game from beside the field than on the upper rim, but the spectator experience declines more gradually than a concert one. In addition, sitting farther back doesn’t mean losing out on the fan community (“waves” and cheers span all sections). In fact, many of the best, die-hard fans are in the nosebleeds because they can’t afford to go to three courtside games per week. Many die-hard Taylor Swift fans, however, can save enough for one VIP ticket every year or two. In addition, sporting events typically support a full 360 degrees of stadium seating, while concerts afford only 150 (no one wants to stare at the back of a screen).

Concerts also fit a narrower range of attendee personalities/behaviors than live sports. The only way a Kanye fan can sit and enjoy the live concert is if they’re rich enough to afford a box. This means many would-be concert goers can support their favourite artists only by downloading their album or streaming them more on Spotify. Sports fans also benefit from dozens of opportunities to see their favourite team in person each year. Music fans, meanwhile, have only a few, if any, chances to see their favourite artists.

Then there’s the literal friction to concert operations. Not only does running a concert require considerable effort, most of it is repetitive. Every single city involves a new venue booking; discrete marketing and ticket sales activities; unpacking, setting up and testing, disassembling and repacking equipment, and more. Operating a sporting event two-to-three times a week in the same stadium is far simpler operationally, as is collecting the revenue. And not only do these events require less effort, this effort is focused on producing a unique performance. When the Miami Heat plays basketball in three cities in a week, each match is different and consequential. The goal of almost all concert tours is to offer the same show and setlist night to night.

These scaling problems help explain not just the modest value of live in the audio industry, but also the manner in which concert revenues have grown over the past decade. While U.S. concert revenues are up 80% in real terms since 2009 (or $4.35 billion), more than three quarters of this growth comes from non-top 100 tours. After all, Taylor Swift can’t really be in more cities each week, nor travel more efficiently than she already does. This means millions of Taylor’s fans in tier-three American cities, and even tier-two countries around the world, will never get the chance to see her.
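As a rough back-of-the-envelope check on that split, the sketch below uses only the two figures quoted above: the $4.35 billion of growth and the "more than three quarters" share. The variable and function names are my own, and because 75% is a floor rather than an exact share, the outputs are bounds, not reported numbers.

```python
# Figures quoted in the text, in billions of U.S. dollars (real terms).
GROWTH_SINCE_2009 = 4.35
NON_TOP_100_SHARE_FLOOR = 0.75  # "more than three quarters" treated as a lower bound


def split_growth(total_growth, non_top_100_share):
    """Split total growth into the non-top 100 portion and the top 100 remainder."""
    non_top_100 = total_growth * non_top_100_share
    top_100 = total_growth - non_top_100
    return non_top_100, top_100


non_top_100_min, top_100_max = split_growth(GROWTH_SINCE_2009, NON_TOP_100_SHARE_FLOOR)
print(f"Non-top 100 tours: at least ${non_top_100_min:.1f}B of the growth")
print(f"Top 100 tours: at most ${top_100_max:.1f}B of the growth")
```

On these assumptions, non-top 100 tours added roughly $3.3 billion or more over the decade, while the hundred biggest tours combined added about $1.1 billion at most.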

The non-top 100 artists are capped, too, despite the segment’s growth. Many artists have thousands of fans (or more) who would love to see a live performance, but they’re too spread out for such a tour to be economical.

For a time, label producers and premium television networks like HBO hoped to build audiences (and potentially pay-per-view ones) around at-home viewing of top 100 tours. These would solve for many of the key constraints faced by in-person concerts. However, even the biggest fans passed on such broadcasts, which offered worse audio and no experiential improvements over the real thing. This differs from live sports broadcasts: not only are they produced for at-home viewing, rather than in-stadium attendees, but the at-home experience is augmented by announcer narration, on-screen graphics, expert commentary during breaks, etc. And while live sports and news have stakes that drive you to watch from home, concerts do not.

Digital concerts will forever lack some of what makes in-person concerts great. However, it is important to understand how they address each of the bottlenecks outlined above. Most obviously, these events are highly scalable. A single production or set-up, however elaborate, can run once and reach every fan globally, or be re-used to target individual markets (e.g. Belgium), segments (new fans or superfans) or audiences (German-language fans). Despite these many permutations, Taylor Swift could support dozens of shows per year without enormous effort, time or travel. For similar reasons, artists with only 5,000 fans globally can finally monetize their live performances even if not one fan lives within 100 miles of another.

But most important is how the rise of digital/remote concerts will change the concerts themselves. Historically, at-home concert viewership was an afterthought; broadcasts were created using footage repurposed from the in-person event and, in many cases, delivered weeks to months later. Today, however, such streams are almost always live, and, especially during the COVID-19 pandemic, made specifically for at-home viewers. This, plus the personalization and interactivity afforded by streaming versus broadcast, means that at-home concert experiences can, for the first time, offer real advantages over the “real thing.”

For example, at-home viewers might be able to vote live on the next song, collectively operate stage lights or digital instruments, and even join the performer live via picture-in-picture. The introduction of virtual goods also means that at-home concerts will include goods that, like a tour t-shirt, can only be collected by attendees.

In February 2019, 11 million people attended Marshmello’s concert in Fortnite, and millions more watched via YouTube, Twitch, and other social platforms. Epic didn’t charge for the event, but if it had, most would have bucketed this as video game revenue. And certainly, the in-game items Epic sold as part of the event are considered video game revenue.

Yet as I write this nearly two years later, nearly all concerts have become virtual – distributed over Twitter’s video player, Zoom, or Facebook Live. To call those performances a concert but Epic’s a video game event is wrong. Whether an artist is reproduced to look “real” or fantastic is purely an aesthetic choice; pixels are pixels. What’s more, Fortnite’s concerts have evolved from experiences designed to “replicate the real world” (e.g. a stage, a dance floor, a projection screen) to those that show little regard for it. Travis Scott’s April 2020 concert reached 28 million unique in-game attendees, each of whom was transported through time and space. And despite the fantastical nature of Astronomical, a 3D immersive experience is more concert-like than a Zoom broadcast. Soon, Fortnite’s concerts are likely to involve live motion capture, too. This is a concert.

And just as remote concerts allow more artists to economically operate live performances, virtual ones will give more artists the creative tools to do so. The Boss is an incredible star performer, but not all artists are Bruce nor is all music as conducive to a bare stage, guitar and mic stand. Deadmau5 needs a high tech light show to run a compelling concert. In that sense, we have to recognize that the multi-decade growth in concerts has as much to do with the rise of 360° label deals and piracy as it does improvements in concert technology. The same will be true as these concerts shift from Madison Square Garden to Sweaty Sands.

The rise of remote/digital/virtual concerts will change everything about the concert industry. Not just how a concert is made, delivered, and monetized, or even which artists perform in the first place and for whom. It will change who produces and operates a concert and who delivers it, too.

Live Nation is an operations business and one that thrives because of the complexity and non-scalability of concerts. This includes managing booking, ticketing, admissions, clean-up, and more across countless venues, as well as the collection of revenues that span myriad locations, times, and sales channels. These skills aren’t particularly relevant for a globally distributed, online-only YouTube, Twitch or Moment House (I’m an investor) concert with a single point of purchase. And certainly, Live Nation’s historical expertise doesn’t easily translate into the creation of immersive virtual concerts based in Unreal or Roblox. Notably, Fortnite-maker Epic Games now operates a live events space for Fortnite’s concerts series — meaning Epic powers, produces, distributes, and collects revenue for these events.

Meanwhile, Spotify and Apple Music have not just the majority of an artist’s fans on their platforms, but also the greatest insight into these fans. No one can do a better job of reaching Beyoncé fans than Spotify, including Beyoncé. And it costs the company nothing to reach them.

Going Virtual

This gets to the broader opportunity for audio going forward: Music is the soundtrack to our lives, and our lives are becoming increasingly virtual. As a result, this soundtrack and how it’s delivered need to change.

Consider the enormity of Travis Scott’s concert. Nearly 30 million people spent nine minutes fully immersed in his music. This included die-hard and casual fans, non-fans and people who didn’t even know he existed. There is no other experience on earth, including the Super Bowl half-time show, that can deliver this degree of reach and attention, COVID-19 or not. The track Scott premiered during the concert (The Scotts, a collaboration with Kid Cudi) debuted at #1 on Billboard a week later. This was Cudi’s first Billboard #1 and the biggest debut of 2020. In addition, several of the tracks Scott performed from his two-year-old Astroworld album returned to the Billboard charts.

Truly a global event. We saw a massive reaction ex-US as well. New fanbases popping up in Latin America, Europe and Asia where we hadn’t seen such reaction in the Spotify & Apple Music charts before.
— Wouter Jansen (@WouterRTJ) October 14, 2020

Note, too, that virtual celebrities like Lil Miquela are rapidly growing their audio footprints. Miquela has over 50MM song streams, has repeatedly hit Spotify’s top charts, and appeared at the VMAs and Coachella. Riot Games’ virtual K-pop girl group has twice hit #1 on Billboard global streaming charts, and the squad’s first music video hit 100MM views on YouTube in its first month and is approaching 400MM today.

In Fortnite, you can now listen to Top 40 hits while driving around in a car or helicopter with your squad. This might seem quaint, but it’s an incredibly powerful discovery opportunity. When we’re having fun with others, we listen to music we might not otherwise, and we fall in love with it for the same reasons. And just as we have Spotify subscriptions in the real world, the rise of the virtual one will lead to altogether new listening subscriptions or add-on fees.

Similarly, Fortnite now allows players to invite their friends into in-game audio chats even if they’re not playing the game. For example, a friend who’s driving or on the bus can “tune into” and talk to a group of friends who are at home playing the game on their consoles and PCs. This isn’t gaming per se, but it is game-based – and a new (and newly possible) audio use case, too.

TikTok is obviously an incredible innovation in music discovery and music-based content creation. Most recently, a video made by an independent and largely unknown TikToker led Fleetwood Mac’s Dreams to hit the Billboard charts for the first time since 1977. Earlier this year, millions discovered Phil Collins’ In The Air Tonight through a TikTok drumming challenge.

Younger generations have always discovered the music of a prior generation. However, this was usually done in small groups via parents and close friends, or to mass audiences via professionally-produced (and incredibly high cost) movie soundtracks. And while labels have theoretical business cases for promoting decades-old music for new generations, this is functionally impractical. Not only are efforts focused on new artists, but just imagine a Sony Music executive trying to figure out why a 15-year-old today should care for Billy Joel, how to reach them, and how to overcome the stigma of Joel being their dad’s favourite artist. TikTok, which isn’t governed by the music labels but is enabled by their rights, has solved this problem. And it does so using entirely 21st-century technology (e.g. smartphones, social networking and algorithms), and thousands of users creating videos that reach millions of viewers.

The dynamic detailed above isn’t unique to TikTok, either. All of the major social platforms, such as Twitch, Facebook, and Roblox, are now offering creators the ability to use “professional music” as part of their “user generated content” and at no incremental cost. This will mean brand new, highly scaled revenue streams and discovery models for artists.

And from a macro perspective, it’s notable that while no one has quite “cracked” UGC in audio, the biggest video platform globally (YouTube) is based on it. As is the biggest video game (Roblox). The biggest publisher of news (Facebook) is UGC (we can throw in cabs, ecommerce, hotels, etc., too). Someone will eventually find the right model. And we’re probably not that far off. Almost all new music today, with the exception of indie rock, is “all digital” and thus fully separable by instrument, beat, vocals, etc. In many cases, a hit track is made up of numerous samples, beats, and sounds that come from a patchwork of creators. To return to Lil Nas X, his Old Town Road was based on a $30 (and year-old) beat that he bought from an anonymous musician (who had himself re-worked a decade-old Nine Inch Nails song). Lil Nas X then created Old Town Road using stolen software and then self-released the title, too. Today, music-making software and workflow tools remain relatively underdeveloped versus gaming and video (e.g. YouTube), but many, such as SoundCloud, Anchor and Splice, are tackling it.

None of the above is intended to be exhaustive. In recent years, we’ve seen the rise of spatial audio, Apple AirPods, digital walkie talkies, audio-based meditation and mindfulness services like Headspace (I’m an investor), and social audio experiences like Discord and Clubhouse. And this is what makes audio such a great category in 2020 – it’s not just growing faster than it has in decades, it’s diversifying and changing faster too. The cause: technology.

Matthew Ball (@ballmatthew)

https://www.matthewball.vc/all/audiotech