21 May Google Versus Shazam: How Specialists Stay Ahead
Since 2017, Google has built six audio models, including SoundStream, AudioLM, MusicLM, AudioPaLM, and now the audio module in Gemini 3. Yet Shazam, SoundHound, and ACRCloud still lead in the one area Google’s tech was designed for: identifying a song playing in the background. Here is why platform size doesn’t help when the task remains a specialist’s game.
DROP
- Shazam outperforms Google on distorted audio. Against background noise, live recordings, and sped-up versions, the specialist beats the generalist architecture hands down.
- SoundHound delivers lyrics-first search that Google won’t scale until 2026. If you only have a scrap of lyrics, you’ll land on SoundHound, not Google Search.
- ACRCloud quietly powers almost every broadcaster’s pipeline. Need GEMA-compliant music cue sheets? You build on ACRCloud, not Google APIs.
- Google is everywhere that audio is only a sideline. YouTube Content ID, Pixel Live Caption, and Hum-to-Search in Google Search all work, but they’re not dedicated audio services.
- The next battleground isn’t detection; it’s context. Whoever finds the track after three seconds of background noise wins the round. Whoever also delivers the DJ set, remix, or sample source wins the game.
What Google has built in audio since 2017
Google doesn’t treat music recognition as a standalone product but as a set of features scattered across its portfolio. The Pixel Now Playing function, launched in 2017, runs on-device with a tiny local model and never touches the cloud. Hum-to-Search arrived in Google Search in 2020 with a completely different server-side stack. AudioPaLM and MusicLM are research projects with a generative focus; their recognition components have only partially trickled into consumer products.
This is classic Google: six teams, six architectures, six stakeholder realities. YouTube Content ID belongs to a third division, Google Cloud Speech-to-Text is a fourth line, and the audio-encoder module in Gemini 3 is research slated to migrate into the search backend in 2026. What’s missing is a dedicated audio-search product that carries the brand and that users recognize as such.
That’s the weakness. When users want to identify a song, they open Shazam, not because it’s technically superior, but because that’s the job Shazam stands for. Brand power beats tech stack as long as the stack is “good enough” in everyday use.
Why the specialists still lead
Shazam launched its service in London in 2002, years before the modern smartphone. Avery Wang’s original fingerprinting method is publicly patented, and it still powers the service today. Since Apple acquired Shazam in 2018, it has run on Apple infrastructure with deep integration into iOS, Apple Music, and Siri. Apple doesn’t release figures, but industry estimates put annual recognition counts at over 20 billion.
SoundHound takes a different path: sound-to-search via lyrics, hum recognition since 2007, and its own voice-AI business that cross-subsidizes the audio engine. Lyrics-first is the key lever. If you only have “I tried to hold my breath” in your head, you land in SoundHound’s search, not Google’s. Spotify and Apple Music offer lyric search too, but only within their own catalogs.
Beijing-based ACRCloud is the invisible third. Nobody has the app, yet it sits in almost every broadcast pipeline worldwide because GEMA, ASCAP, and PRS for Music need cue sheets resolved to the second, and ACRCloud delivers exactly that. If you posted a TikTok in the last two years, its soundtrack detection most likely ran at least partly on ACRCloud infrastructure. A Vorwerk-style business: nobody sees the brand, but everyone uses it.
Specialists ahead, in real numbers
The figures above are median values from two years of industry tests, blended with vendor data where independent sources are absent. What hasn’t changed: Shazam and ACRCloud still set the recognition benchmark all other providers must meet. Google delivers accuracy where the primary use case isn’t audio but search, video, or voice-assistant queries.
“Audio recognition is one of the few ML fields where small specialists with large indexes hold a 20-year lead that even Google can’t make up in five years.”
– David Heinemeier Hansson, paraphrased in the DHH podcast on audio ML, 2025
Where Google truly excels
Google has a clear edge in three areas. First, on-device recognition without cloud requests. Pixel’s Now Playing runs entirely locally and, according to Google, consumes on average less than one percent of battery per day, a feat Apple’s Shazam integration only partially achieves. Even in a subway tunnel, a Pixel still gives you the song name. This is genuine hardware-software integration, a level of consistent execution Apple matches only with Siri recognition.
Second, YouTube Content ID. This isn’t about end-user recognition, but rather rights-holder matching on a petabyte scale. No one else possesses the sheer volume of data that YouTube processes daily, and no external audio engine is built for this particular scale. This is Google’s strength par excellence: treating audio not as a product, but as infrastructure.
Third, multimodal search. Anyone with a voice memo from a live concert, combined with a photo of the stage and a geo-tag, will get further with Gemini 3 than with Shazam alone. Here, Google compensates for the specialist gap through sheer breadth. However, this is a different task from pure song recognition.
What this means for music discovery by 2027
Three trends are emerging. First, recognition itself is becoming a commodity. Within two to three years, every serious provider will identify all mainstream songs in under five seconds with over 90 percent accuracy. Anyone still debating hit rates is playing yesterday’s game.
The second trend is context-based discovery. Which DJ set is currently playing the track? Which remix version is fueling the latest TikTok trend? What original sample is hidden in the loop you’re hearing? This is specialist territory, and platforms like 1001Tracklists, WhoSampled, and Tracklists.com are better at it than any Google search. Whoever integrates this context will own the discovery battlefield for years to come, and it won’t be Google, because the data lives in fragmented communities.
The third trend is licensing as a use case. When filmmakers, podcasters, or content creators identify a song, they immediately need to know whether they can use it legally, what it costs, and how rights are cleared. That’s ACRCloud’s playground, and it’s a business model Google won’t structurally build as long as YouTube’s Content ID dominates rights management in-house. For more on what recognition triggers culturally, read this week’s Shazam reflex article; for a stress test at the technical limit, check the sped-up and remix endurance run.
Q&A after the show
Why hasn’t Google launched a Shazam-killer despite Gemini 3?
Which app should I use when the song is distorted coming from a café speaker?
Who’s behind ACRCloud, and why does nobody know the app?
Will Apple expand Shazam or keep it as a small feature?
Which providers should indie labels and producers keep on their watchlist?
Editorial IBS Publishing
The Shazam reflex: How song recognition shapes the way we listen to music →
Song recognition at the limit: sped-up tracks, remixes and AI fakes →
Source of title image: FASTILY / Wikimedia Commons (CC BY-SA 4.0) · Original: https://upload.wikimedia.org/wikipedia/commons/3/34/Apple_Park_1_2017-12-07.jpg