In my last article I described the Fourier transform as a way of representing signals in the frequency domain. I also promised to explain how it is applied in Shazam, the wonderful service that identifies a song from a short musical excerpt. The app is available for iPhone, Android and other platforms.
Let's say you are at a concert and hear a lovely song that you don't know but want to remember: turn Shazam on, and the song title and artist, along with additional information such as lyrics, videos, the artist's biography, concert tickets and recommended tracks, will be sent to you. In this article I won't give any complex mathematical formulas; instead I'll try to explain music recognition algorithms in simple language.
How is the Fourier transform connected with Shazam's algorithms?
The discrete Fourier transform, which I described in the previous article, transforms a finite set of signal samples, taken at regular intervals of time, into the coefficients of a finite combination of complex sinusoids, ordered by frequency. It lets you study the spectrum of the signal and determine which frequencies are present in it and which are not. After that, you can filter, amplify or attenuate certain frequencies, recognize a tone of a certain pitch among the available frequencies, or compute a signature of the signal: take its "fingerprint", to put it simply.
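As a small illustration of that idea, here is a sketch (not Shazam's code, just the underlying principle) that samples a pure 440 Hz tone and recovers its frequency from the spectrum:

```python
import numpy as np

# Sample a 440 Hz sine wave for one second...
sample_rate = 8000                      # samples per second
t = np.arange(0, 1.0, 1.0 / sample_rate)
signal = np.sin(2 * np.pi * 440 * t)

# ...and find the dominant frequency via the discrete Fourier transform.
spectrum = np.abs(np.fft.rfft(signal))              # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)
dominant = freqs[np.argmax(spectrum)]
print(dominant)  # ~440.0
```

A real signal is a mix of many frequencies, but the principle is the same: the transform tells you how much of each frequency the signal contains.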
And now let's move on to the technical side of how Shazam works.
The common steps are:
- A card index of music fingerprints is created and saved in Shazam's database.
- The user "tags" a song they hear, and a fingerprint is generated from a ten-second audio sample.
- The application sends the fingerprint to the Shazam service, which looks for matches in the database.
- If matches are found, you are notified and all the information about the track is displayed.
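The steps above can be sketched as a toy end-to-end flow. All names here are my own, for illustration only; a real fingerprint is far more sophisticated, as described below:

```python
# A toy version of the matching flow: build an index, fingerprint an
# excerpt, look it up. fingerprint() here is a deliberately naive stand-in.

def fingerprint(samples):
    """Toy fingerprint: a tuple of rounded sample values."""
    return tuple(round(s, 1) for s in samples)

# 1. Build the card index (fingerprint -> song title).
database = {
    fingerprint([0.1, 0.5, 0.9]): "Song A",
    fingerprint([0.2, 0.4, 0.8]): "Song B",
}

# 2. The user records a short excerpt...
excerpt = [0.2, 0.4, 0.8]

# 3. ...and the service looks its fingerprint up in the index.
match = database.get(fingerprint(excerpt))
print(match)  # Song B
```

The interesting part, of course, is how the real fingerprint is computed, which is what the rest of the article covers.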
Here is how the fingerprinting works:
Shazam sees music as a simple graph: a spectrogram. One axis is time (the x-axis), another is frequency (the y-axis), and the third dimension, intensity, is shown as color or height.
Here is an example of how a song might look:
The Shazam algorithm fingerprints a song by building this three-dimensional graph and detecting the frequencies of "peak intensity".
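A minimal sketch of that idea, assuming my own window size and threshold (the real algorithm picks peaks far more carefully):

```python
import numpy as np

def spectrogram(signal, window_size=256):
    """Magnitude spectrogram from non-overlapping FFT windows.
    Rows are time frames, columns are frequency bins."""
    n_windows = len(signal) // window_size
    frames = signal[: n_windows * window_size].reshape(n_windows, window_size)
    return np.abs(np.fft.rfft(frames, axis=1))

def find_peaks(spec, threshold=0.5):
    """Return (time_index, freq_index) pairs: the loudest frequency
    bin of each time frame, if it exceeds an absolute threshold."""
    peaks = []
    for t, frame in enumerate(spec):
        f = int(np.argmax(frame))
        if frame[f] > threshold:
            peaks.append((t, f))
    return peaks

# Two concatenated tones produce peaks at two different frequency bins.
sr = 8000
t = np.arange(0, 0.25, 1.0 / sr)
signal = np.concatenate([np.sin(2 * np.pi * 500 * t),
                         np.sin(2 * np.pi * 1500 * t)])
peaks = find_peaks(spectrogram(signal))
```

With a window of 256 samples at 8000 Hz, each frequency bin is 31.25 Hz wide, so the 500 Hz tone lands in bin 16 and the 1500 Hz tone in bin 48.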
Shazam builds its catalog of fingerprints as a hash table in which the key is a frequency value. Given a fingerprint, Shazam uses its keys to look up matching songs. Their hash table might look like this:
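A toy version of such a catalog, with invented numbers: a hash table that maps a peak frequency (the key) to the songs, and positions within them, where that peak occurs.

```python
# Toy catalog: peak frequency -> list of (song, time) occurrences.
catalog = {}

def index_song(song_id, peaks):
    """peaks: list of (time, frequency) pairs for one song."""
    for time, freq in peaks:
        catalog.setdefault(freq, []).append((song_id, time))

index_song("Song A", [(0, 823), (1, 1892), (2, 712)])
index_song("Song B", [(0, 1892), (1, 4212)])

# Looking up frequency 1892 returns every song (and time)
# where a peak at that frequency was indexed.
print(catalog[1892])  # [('Song A', 1), ('Song B', 0)]
```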
Some additional details:
They look for pairs of points: a "peak intensity" plus a second "reference point". Therefore, their key is not a single frequency; it combines the frequencies of both points. That leads to fewer collisions (cases where two different inputs produce the same hash key) and speeds up searching the catalog, improving the average lookup time.
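A sketch of that pairing, with a fan-out value and key format of my own choosing: each peak is combined with a few of the peaks that follow it, and the key records both frequencies plus the time gap between them.

```python
FAN_OUT = 3  # how many subsequent peaks each peak is paired with

def pair_hashes(peaks):
    """peaks: list of (time, frequency) pairs, ordered by time.
    Returns (key, anchor_time) pairs, where the key combines the
    two frequencies and the time gap between the paired peaks."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + FAN_OUT]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes

peaks = [(0, 823), (1, 1892), (3, 712)]
print(pair_hashes(peaks))
# [((823, 1892, 1), 0), ((823, 712, 3), 0), ((1892, 712, 2), 1)]
```

A key like `(823, 1892, 1)` is far more specific than the single frequency `823`, which is why collisions become much rarer.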
The top graph: the scatterplot of matching hash locations shows no diagonal, so the songs are not the same.
The bottom graph: matching frequencies line up in time, so the songs are identical.
If more than one match is found between songs, the time alignment is checked. A two-dimensional plot of the matches is built: one axis is the time at which a frequency appears in the catalog track, the other is the corresponding time in the sample. If the points are correlated, they form a diagonal line. If such a line is found, this is the song you searched for, and its name is displayed to you.
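The diagonal check can be sketched with a simple trick: if the sample really comes from the track, the differences between track time and sample time concentrate on a single value (the slope-1 diagonal). The data below is invented for illustration.

```python
from collections import Counter

def best_offset(matches):
    """matches: list of (track_time, sample_time) pairs for one
    candidate song. Returns (most common time offset, how many
    matching pairs share it)."""
    offsets = Counter(t_track - t_sample for t_track, t_sample in matches)
    return offsets.most_common(1)[0]

# A true match: the sample starts 42 time units into the track,
# so every pair has the same offset.
true_match = [(42, 0), (43, 1), (45, 3), (47, 5)]
# A false match: the offsets are scattered, no diagonal.
false_match = [(10, 0), (25, 1), (3, 3), (90, 5)]

print(best_offset(true_match))   # (42, 4) -> strong diagonal
print(best_offset(false_match))  # count of 1 -> no diagonal
```

Counting offsets in a histogram like this is equivalent to looking for the diagonal on the scatterplot, but cheaper to compute.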
So you see that it is not really hard to understand how Shazam works, though the full scheme is rather complicated. And keep in mind that this is only the basic algorithm: in practice, Shazam uses an improved version, and we will never know it for sure, since every developer keeps such details secret.
Follow me if you are a geek like me or want to learn more about technology and scientific/educational topics.
Alex aka @phenom