YouTube May Soon Let You Search For Songs Just By Humming
YouTube is testing a new feature that will let you find a song just by humming the tune into your phone's mic. The experiment also works if you put your phone near a speaker or any other sound source, for times when you don't feel like crooning from the depths of your heart. The feature needs three or more seconds of audio to find the track you are looking for. "Once the song is identified, you'll be sent to relevant official music content," says the official support page.
Apart from the official source of the song — likely the artist's or music label's channel — YouTube will also show a list of other content, such as Shorts and user-generated clips with that song playing in the background. The experiment is currently limited to the Android app, but it has already started reaching a small group of users. You can access it by launching voice search in the YouTube app, where you can simply hum instead of dictating a song's title or artist details.
You can already test it with Google Assistant
The hum-to-identify feature on YouTube is a global test, but there is no word yet on when it will roll out widely. In the meantime, if you like the convenience, you can try the same trick with Google Assistant, which gained this ability all the way back in 2020. The hum-to-search feature works in over 20 languages and can be launched directly from within the Google app. This implementation actually offers more flexibility, as it lists all the likely songs that match your humming and then lets you play them in any music app of your choice, not just YouTube.
Google's new hum to search feature is really good. Works well with Hindi, English, and Urdu classics as well. Even worked flawlessly for two Persian songs I hummed.
Here it is, perfectly detecting "Aaj Jaane Ki Zid Na Karo" pic.twitter.com/0qHcZmWcBN
— Nadeemonics (@nsnadeemsarwar) October 16, 2020
The whole system relies on machine-learning algorithms and finds potential matches using a melody fingerprinting technique. It is somewhat similar to audio-focused generative AI models from Meta and Microsoft, such as the latter's VALL-E, which need only a few seconds of a person's voice. The AI breaks the recording down into details like tone, pitch, and signature pronunciation style, then condenses that data into a model that can read any passage while mimicking the person's original voice. A company named ElevenLabs even offers a model that can translate your voice into 30 languages with the same distinct audio signature.
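To give a rough idea of how melody fingerprinting can match a hum regardless of the key you sing in, here is a minimal, hypothetical sketch (not YouTube's or Google's actual algorithm, and the song catalog is invented): a tune is reduced to the intervals between its notes, and candidates are ranked by edit distance so missed or off-pitch notes are tolerated.

```python
# Hypothetical sketch of melody fingerprinting, not the real YouTube system.

def fingerprint(pitches):
    """Reduce absolute pitches (e.g. MIDI note numbers) to the semitone
    intervals between consecutive notes, so the fingerprint is the same
    even if the user hums the tune in a different key."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

def edit_distance(a, b):
    """Levenshtein distance between two fingerprints; small distances
    tolerate a few wrong, missing, or extra notes in the hum."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def best_match(hum, catalog):
    """Return the (title, pitches) catalog entry closest to the hum."""
    hum_fp = fingerprint(hum)
    return min(catalog, key=lambda item: edit_distance(hum_fp, fingerprint(item[1])))

# Toy catalog of (title, pitch sequence). The hum below is Song A transposed
# up two semitones with a couple of imperfect notes, yet it still matches.
catalog = [
    ("Song A", [60, 62, 64, 60, 60, 62, 64, 60]),
    ("Song B", [67, 65, 64, 62, 60, 62, 64, 67]),
]
hum = [62, 64, 66, 62, 62, 64, 67, 62]
print(best_match(hum, catalog)[0])  # → Song A
```

A production system would of course extract the pitch contour from raw audio with a neural network and search millions of fingerprints, but the key-invariant interval representation shown here is the core idea behind matching a hummed melody.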