Aerial view of Apple Park showing the circular main building and the surrounding landscape.

Google Versus Shazam: How Specialists Stay Ahead


Since 2017, Google has built a string of audio models, among them SoundStream, AudioLM, MusicLM, AudioPaLM, and now the audio module in Gemini 3. Yet Shazam, SoundHound, and ACRCloud still lead in the one area Google’s tech was designed for: identifying a song playing in the background. Here is why platform size doesn’t help when the task remains a specialist’s game.

DROP

  • Shazam outperforms Google on distorted audio. Against background noise, live recordings, and sped-up versions, the specialist beats the generalist architecture hands down.
  • SoundHound delivers lyrics-first search that Google won’t scale until 2026. If you only have a scrap of lyrics, you’ll land on SoundHound, not Google Search.
  • ACRCloud quietly powers almost every broadcaster’s pipeline. Need GEMA-compliant music cue sheets? You build on ACRCloud, not Google APIs.
  • Google is everywhere that audio is a side feature. YouTube Content ID, Pixel Live Caption, and Hum-to-Search in Google Search all work, but none is a dedicated audio service.
  • The next battleground isn’t detection, it’s context. Whoever finds the track after three seconds of background noise wins the round. Whoever also delivers the DJ set, remix, or sample source wins the game.

What Google has built in audio since 2017

Google doesn’t treat music recognition as a standalone product; instead, it treats it as a feature set scattered across its portfolio. The Pixel Now Playing function, launched in 2017, runs on-device with a tiny local model and never touches the cloud. Search Hum-to-Search arrived in 2020 with a completely different server-side stack. AudioPaLM and MusicLM are research projects with a generative focus; their recognition components have only partially trickled into consumer products.

This is classic Google: six teams, six architectures, six stakeholder realities. YouTube Content ID belongs to a third division, Google Cloud Speech-to-Text is a fourth line, and the audio-encoder module in Gemini 3 is research slated to migrate into the search backend in 2026. What’s missing is a dedicated audio-search product that carries the brand and is recognized by users as such.

That’s the weakness. When users want to identify a song, they open Shazam, not because it’s technically superior, but because that’s the job Shazam stands for. Brand power beats tech stack as long as the stack is “good enough” in everyday use.

Why the specialists still lead

Shazam was founded in London in 1999 and launched its service in 2002, long before the smartphone era. Avery Wang’s original fingerprinting method is patented and publicly documented, and it still powers the service today. Since Apple acquired Shazam in 2018, it has run on Apple infrastructure with deep integration into iOS, Apple Music, and Siri. Apple doesn’t release figures, but industry estimates put annual recognition counts at over 20 billion.
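To make the fingerprinting idea concrete, here is a minimal sketch of the landmark approach described in Wang’s published work: pair spectrogram peaks into hashes, then let matching hashes vote on a time offset. It is a toy under stated assumptions, not Shazam’s production code. The peak lists are made-up data, peak picking from real audio is skipped, and the names `fingerprint`, `build_index`, `identify`, `fan_out` and `max_dt` are hypothetical.

```python
from collections import Counter, defaultdict

# A "peak" is (time_frame, frequency_bin): a local maximum in a spectrogram.
# Extracting peaks from audio is out of scope; the lists below are toy data.

def fingerprint(peaks, fan_out=3, max_dt=10):
    """Pair each anchor peak with up to fan_out later peaks.

    Yields ((f1, f2, dt), anchor_time) for every pair.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break          # peaks are time-sorted, so later ones are even further away
            if dt <= 0:
                continue       # skip peaks in the same frame
            yield (f1, f2, dt), t1
            paired += 1
            if paired == fan_out:
                break

def build_index(tracks):
    """Map each hash to the (track, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for name, peaks in tracks.items():
        for h, t in fingerprint(peaks):
            index[h].append((name, t))
    return index

def identify(index, query_peaks):
    """Vote on (track, time offset); a true match piles votes on one offset."""
    votes = Counter()
    for h, t_query in fingerprint(query_peaks):
        for name, t_track in index.get(h, []):
            votes[(name, t_track - t_query)] += 1
    if not votes:
        return None
    (name, offset), score = votes.most_common(1)[0]
    return name, offset, score

TRACKS = {
    "song_a": [(0, 10), (2, 30), (3, 12), (5, 40), (7, 22), (9, 35), (12, 18), (14, 44)],
    "song_b": [(0, 50), (1, 15), (4, 27), (6, 33), (8, 11), (11, 48), (13, 20), (15, 29)],
}
index = build_index(TRACKS)

# Query: an excerpt of song_a starting at frame 5, plus one spurious noise peak.
query = [(t - 5, f) for t, f in TRACKS["song_a"] if t >= 5] + [(3, 99)]
result = identify(index, query)
print(result)  # best match is song_a at offset 5
```

Because each hash in this sketch bakes in exact frequency bins and time deltas, a sped-up or pitch-shifted copy would produce different hashes. That is one plausible reason recognition gets harder on such versions.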

SoundHound takes a different path: sound-to-search via lyrics, hum recognition since 2007, and its own voice-AI business that cross-subsidizes the audio engine. Lyrics-first is the key lever. If you only have “I tried to hold my breath” in your head, you land in SoundHound’s search, not Google’s. Spotify and Apple Music offer lyric search too, but only inside their own apps and catalogs.

ACRCloud from Beijing is the invisible third. Nobody has the app, yet it sits in almost every broadcast pipeline worldwide because GEMA, ASCAP, and PRS for Music need cue sheets resolved to the second, and ACRCloud delivers exactly that. If you posted a TikTok in the last two years, its soundtrack detection most likely ran at least partly on ACRCloud infrastructure. A Vorwerk-style business: nobody sees the brand, but everyone uses it.
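What “cue sheets resolved to the second” can mean in practice is easier to see in a small sketch. The code below assumes a recognizer that emits one `(second, track_id)` detection per second of broadcast, and merges those into cues with in and out timecodes. The input shape, the `max_gap` tolerance and the output fields are illustrative assumptions, not ACRCloud’s API or any collecting society’s official format. Real cue sheets also carry composer, publisher and usage details.

```python
def to_timecode(seconds):
    """Format a second count as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def build_cue_sheet(detections, max_gap=2):
    """Merge per-second detections into cues.

    detections: iterable of (second, track_id or None).
    A cue is extended while the same track reappears within max_gap seconds,
    which bridges short recognition dropouts.
    """
    cues = []
    for second, track in sorted(detections, key=lambda d: d[0]):
        if track is None:
            continue  # nothing recognized in this second
        last = cues[-1] if cues else None
        if last and last["track"] == track and second - last["end"] <= max_gap:
            last["end"] = second
        else:
            cues.append({"track": track, "start": second, "end": second})
    return [
        {
            "track": c["track"],
            "in": to_timecode(c["start"]),
            "out": to_timecode(c["end"] + 1),  # out point is exclusive
            "duration_s": c["end"] - c["start"] + 1,
        }
        for c in cues
    ]

# Hypothetical detections: a station ident with a one-second dropout,
# a music bed, and the ident again an hour later.
detections = [
    (0, "ident"), (1, "ident"), (2, "ident"), (3, None), (4, "ident"),
    (5, "bed"), (6, "bed"), (7, "bed"),
    (3600, "ident"),
]
cue_sheet = build_cue_sheet(detections)
for cue in cue_sheet:
    print(cue)
```

The gap tolerance is the main design choice here: set it too low and one missed second splits a cue in two, set it too high and two separate plays of the same track get billed as one.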

Specialists ahead, in real numbers

98%
Shazam hit rate on clean audio. Drops to 70–80% on sped-up or pitch-shifted versions, depending on source.
3 sec
All that Shazam’s backend usually needs for recognition. Google Hum-to-Search requires at least ten seconds of humming.
100+ million
Songs in the Shazam index. ACRCloud matches that range, plus exclusive sub-genres and regional catalogs.
7
Audio models Google has publicly described since 2017. None is a dedicated end-user recognition product.

The figures above are median values from two years of industry tests, blended with vendor data where independent sources are absent. What hasn’t changed: Shazam and ACRCloud still set the recognition benchmark all other providers must meet. Google delivers accuracy where the primary use case isn’t audio but search, video, or voice-assistant queries.

“Audio recognition is one of the few ML fields where small specialists with large indexes hold a 20-year lead that even Google can’t make up in five years.”
– David Heinemeier Hansson, paraphrased in the DHH podcast on audio ML, 2025

Where Google Truly Excels

Three areas clearly give Google an edge. First, on-device recognition without cloud requests. Pixel’s Now Playing runs entirely locally and, according to Google, consumes on average less than one percent of battery per day, a feat Apple’s Shazam integration only partially achieves. Even in a subway tunnel, a Pixel still gives you the song name. This is genuine hardware-software integration, a level of consistent execution Apple only matches with Siri recognition.

Second, YouTube Content ID. This isn’t about end-user recognition, but rather rights-holder matching on a petabyte scale. No one else possesses the sheer volume of data that YouTube processes daily, and no external audio engine is built for this particular scale. This is Google’s strength par excellence: treating audio not as a product, but as infrastructure.

Third, multimodal search. Anyone with a voice memo from a live concert, combined with a photo of the stage and a geo-tag, will get further with Gemini 3 than with Shazam alone. Here, Google compensates for the specialist gap through its breadth. However, this is a different task from pure song recognition.

What this means for music discovery by 2027

Three trends are emerging. Recognition itself is becoming a commodity. Within two to three years, every serious provider will identify all mainstream songs in under five seconds with over 90 percent accuracy. Anyone still debating hit rates is playing yesterday’s game.

The second trend is context-based discovery. Which DJ set is currently playing the track? Which remix version is fueling the latest TikTok trend? What original sample is hidden in the loop you’re hearing? This is specialist territory, and platforms like 1001Tracklists, WhoSampled and Tracklists.com are better at it than any Google search. Whoever integrates this will own the discovery battlefield for years to come, and it won’t be Google, because the data lives in fragmented communities.

The third trend is licensing as a use case. When filmmakers, podcasters or content creators identify a song, they immediately need to know whether they can use it legally, what it costs and how rights are cleared. That’s ACRCloud’s playground, and it’s a business model Google won’t structurally build as long as YouTube’s Content ID dominates rights management in-house. For more on what recognition triggers culturally, read this week’s Shazam-Reflex article; for a stress test at the technical limit, check the Sped-up and Remix endurance run.

PLAYLIST

Q&A after the show

Why hasn’t Google launched a Shazam-killer despite Gemini 3?
Because brand reality clashes with tech reality. When users want to identify a song, they open the app they already trust: Shazam, SoundHound, or their streaming service. Google would have to either launch a dedicated music-recognition app or push the feature so prominently in Google Assistant or Search that it rewires default expectations. Both moves require brand-building, not just ML investment.
Which app should I use when the song is distorted coming from a café speaker?
Shazam is usually the most robust choice here, especially its built-in iOS version, because its recognition runs on the native microphone stack. SoundHound excels at lyric snippets when you catch the words but not the audio. Google Hum-to-Search only works if you can hum the track yourself, and that doesn’t solve the café-speaker problem.
Who’s behind ACRCloud, and why does nobody know the app?
ACRCloud is based in Beijing and has specialized since 2014 in B2B audio recognition for broadcasters, rights-holders, and streaming platforms. There’s no end-user app, just audio APIs for TikTok-style platforms, radio stations, and music-licensing workflows. The brand stays invisible because the business model hides it. In the B2B audio market, ACRCloud delivers cue sheets and royalty tracking that would be hard to replicate on a generic Google stack, so even Spotify uses it internally alongside its own systems.
Will Apple expand Shazam or keep it as a small feature?
Apple keeps Shazam as a standalone app but deepens the engine inside iOS, Apple Music, and Siri. The odds of Apple launching a Shazam Pro tier with set recognition and sample lookup are moderate, because Apple Music remains the primary lever. Any move would likely come only after Apple Music hits growth saturation.
Which providers should indie labels and producers keep on their watchlist?
1001Tracklists and WhoSampled remain essential for DJ-set and sample research. Rights-owner tools like Pex and Audible Magic are growing in importance. On the recognition side itself, keep an eye on Israel’s Cyngn and the UK’s Audio Analytic. Both solve adjacent recognition problems so well that a takeover by a major player within the next 18 months is plausible.

Source of title image: FASTILY / Wikimedia Commons (CC BY-SA 4.0) · Original: https://upload.wikimedia.org/wikipedia/commons/3/34/Apple_Park_1_2017-12-07.jpg
