%
% Supplement to my lectures on Fourier analysis
%
% A first step into audio spectral analysis
% Example for the generation of simple spectral fingerprints of an
% audio file series as data base for a first audio recognition test
%
% R. Brigola, April 2010
%
% The purpose of this example is to find out whether we can identify
% a music track by a very simple "spectral fingerprint" of a short track
% section of about 30 seconds duration, taken from an arbitrary time
% interval of the track.
%
% In this m-file we compute spectral fingerprints for a series of tracks,
% which serve as the data base for the recognition test.
% For an unknown track section, taken from a music piece of that series,
% the analogous fingerprint has to be computed and compared with the
% data base.
%
% We test the following type of spectral fingerprint:
% The frequency range is partitioned into a series of subbands. For each
% subband we compute the so-called "spectral flatness", i.e. the quotient
% of the geometric and the arithmetic mean of all amplitudes a_1,...,a_n
% in the subband:
%     flatness = (a_1*a_2*...*a_n)^(1/n) / ((a_1+a_2+...+a_n)/n)
% The vector of these values, each in the interval [0,1], is considered as
% the spectral fingerprint.
%
% To recognize a piece, we compare its fingerprint with these vectors and
% "identify" the track whose fingerprint has the highest correlation with
% the test example on both stereo channels.
% This is done in the m-file fingerprint_test.m
%
% We expect that this test can only work if the tracks under test are
% sufficiently diverse with regard to their spectrum (different artists,
% different music styles) and have more or less the same spectrum in
% all time segments.
% This assumption is certainly not fulfilled for many music tracks,
% especially for recordings of the same artists, bands, instrumentations
% etc., which usually show relatively small spectral diversity.
% Moreover, the amplitude spectrum of a whole track contains no explicit
% time information: as we know from Fourier analysis, a (cyclic) time
% shift leaves it unchanged, so rearrangements of the same musical
% material can yield very similar amplitude spectra.
% This shows that much more analysis has to be worked out for a reliable
% recognition of a piece in a huge number of music recordings.
% Later on, the tests will confirm this conjecture. Thus, this experiment
% shall only give first impressions of working with spectral methods in
% this example and of the questions arising with it.
%
% A more careful analysis would include short-time Fourier analysis with
% time windows, recognition of dominant tones, instrumentation and rhythm,
% and more. Cepstrum or wavelet analysis would also be options.
%
% A standard reference on Digital Signal Processing is:
% Oppenheim, A.V., and R.W. Schafer: Discrete-Time Signal Processing
% Englewood Cliffs, NJ: Prentice Hall, 1989.
%
% Other application fields of spectral fingerprints:
% The same methods that work for music can similarly be applied in other
% fields, for example in speech or speaker recognition, or in
% surveillance, when for example the complete telephone traffic of whole
% countries is monitored by intelligence services. This works when
% corresponding data bases for comparison are available.
% (For example: international telephone calls from Germany are monitored,
% the USA use the ECHELON espionage system etc.)
%
% Now, even if we have to expect a relatively poor recognition rate,
% let's give it a try, just to experience in a first step what may work
% with very simple methods and what doesn't, and why so much research on
% the subject is done worldwide. To learn more about the subject and the
% research on it, see for example the publications of the Fraunhofer IIS
% institute at Erlangen, Germany, references found with google or
% scholar.google, or read about the Shazam or Gracenote music recognition
% services, which are offered as "apps" for mobile phones, the website
% http://www.speaker-recognition.org/ etc.
%
% GENERATION OF SIMPLE SPECTRAL FINGERPRINTS OF MUSIC TRACKS
%
% The tracks must be stereo, named track1.wav, track2.wav, ... trackN.wav,
% and have the wav-format with 44100 samples per second.
%
% Remark: On my notebook the computation time for the fingerprints of
% 20 tracks of about 3 to 4 minutes track lengths is about 170 seconds.
clear all;
start_time=cputime;
% Number N of tracks, for which a spectral fingerprint in the form of a
% spectral flatness vector for each channel is generated.
% The n-th vector component is the spectral flatness of the n-th frequency
% subband. The spectral flatness used is the quotient of the geometric mean
% and the arithmetic mean of all amplitudes in the subband. We test with
% a subband partition as defined below.
N=20;
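% Illustration only (not part of the original script): a minimal sketch of
% the spectral flatness on two made-up amplitude vectors. All demo_ names
% and values are assumptions chosen for the illustration. The geometric
% mean is written here as exp(mean(log(a))), which for strictly positive
% amplitudes agrees with the product form used further below.
demo_sf=@(a) exp(mean(log(a)))/mean(a);  % geometric mean / arithmetic mean
demo_amp_flat=[1 1 1 1];                 % all amplitudes equal
demo_amp_peak=[4 0.01 0.01 0.01];        % one dominant amplitude
demo_sf_flat=demo_sf(demo_amp_flat);     % equal amplitudes: flatness 1
demo_sf_peak=demo_sf(demo_amp_peak);     % dominant tone: flatness near 0
% A noise-like subband therefore gives values near 1, a subband dominated
% by a few tones gives values near 0.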
% Determine a sequence of frequency subbands in Hz. Row k of subband holds
% the lower and the upper edge of the k-th subband; adjacent subbands
% share an edge.
band_edges=[100 200 300 400 510 630 770 920 1080 1270 1480 1720 ...
            2000 2320 2700 3150 3700 4400 5300 6400 7700 9500 12000];
subband=[band_edges(1:end-1)' band_edges(2:end)'];
subband_no=size(subband,1); % 22 subbands
% Initialization of the spectral flatness matrix
sflat=ones(N,subband_no,2); % N tracks, subband no, 2 channels
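% Illustration only (not part of the original script): the loop below
% addresses the FFT output by frequency in Hz. This works because an FFT
% over T whole seconds (T*Fs samples) has a bin spacing of 1/T Hz, so the
% frequency f belongs to the Matlab index T*f+1. A minimal sketch with a
% synthetic tone; all demo_ names and values are assumptions.
demo_Fs=8000;                               % toy sampling frequency in Hz
demo_T=2;                                   % duration in whole seconds
demo_f=440;                                 % frequency of the test tone in Hz
demo_t=(0:demo_T*demo_Fs-1)'/demo_Fs;       % sample times in seconds
demo_X=abs(fft(sin(2*pi*demo_f*demo_t)));   % amplitude spectrum
[~,demo_idx]=max(demo_X(1:demo_T*demo_Fs/2)); % peak below the Nyquist frequency
% The peak is found at the index demo_T*demo_f+1.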
% Loop over all the available tracks with names track1.wav ... trackN.wav
% Usually the sampling frequency Fs is 44100 samples per second
for no=1:N
    audio_input=strcat('track',int2str(no),'.wav'); % input filename
    % audioread replaces wavread, which is no longer available in current
    % Matlab releases; it returns one column of samples per channel.
    [data,Fs]=audioread(audio_input); % read wav-file with sampling frequency Fs
    M=size(data,1); % number of samples per channel
    T=floor(M/Fs);  % duration of the analyzed audio data in whole seconds
    NFFT=T*Fs;      % number of used samples, frequency resolution 1/T Hz
    % FFT of the 2 track channels. I do not use a time window here, due to
    % the huge number of samples, the long observation time and the silence
    % at the beginning and the end of a track. Thus spectral leakage should
    % be negligible here.
    %
    % Since Matlab supports multithreaded computation, a multicore
    % processor considerably saves computation time.
    data_fft=fft(data,NFFT); % column-wise FFT of both stereo channels
    % Generate the fingerprint per subband by a spectral flatness value
    % sflat for each audio track on both channels
    for k=1:subband_no % loop over the subbands
        % The bin spacing is 1/T Hz, so the frequency f in Hz belongs to the
        % index T*f+1. The k-th subband is taken as the half-open range
        % [subband(k,1), subband(k,2)), which contains exactly n bins.
        n=T*(subband(k,2)-subband(k,1)); % number of frequency bins in the k-th subband
        amp=abs(data_fft(T*subband(k,1)+1:T*subband(k,2),:)); % amplitudes, both channels
        % Geometric means of the amplitudes in the k-th subband, both channels
        nom=prod(amp.^(1/n),1);
        % Arithmetic means of the amplitudes in the k-th subband, both channels
        denom=sum(amp,1)./n;
        for ch=1:2 % loop over the stereo channels
            if nom(ch)==0
                sflat(no,k,ch)=0; % spectral subband flatness=0, if any amplitude is zero
            else
                sflat(no,k,ch)=nom(ch)/denom(ch); % spectral subband flatness
            end
        end
    end
end;
% Save the fingerprint matrix of the analyzed tracks as data base for
% the recognition of a piece taken from one of the above tracks
save fingerprint_db sflat;
computation_time=cputime - start_time;
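% Illustration only: a minimal sketch of the comparison step described in
% the header, i.e. picking the track whose fingerprint has the highest
% correlation with a test fingerprint. This is NOT the content of
% fingerprint_test.m; it uses a made-up data base with 3 tracks, 4 subbands
% and a single channel. All demo_ names and values are assumptions.
demo_db=[0.9 0.5 0.1 0.3; 0.2 0.8 0.6 0.4; 0.5 0.5 0.9 0.1]; % rows = tracks
demo_query=demo_db(2,:)+[0.02 -0.01 0.03 -0.02]; % slightly perturbed copy of track 2
demo_score=zeros(size(demo_db,1),1);
for demo_i=1:size(demo_db,1)
    demo_c=corrcoef(demo_db(demo_i,:),demo_query); % 2x2 correlation matrix
    demo_score(demo_i)=demo_c(1,2);                % correlation coefficient
end
[~,demo_best]=max(demo_score); % index of the best matching track, here 2
% For stereo fingerprints as stored in sflat, one option would be to
% require the same best match on both channels.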
% Exercise: Try to test an analogous generation of more fingerprint
% information using Short time Fourier analysis, to obtain
% information on the time patterns of music tracks, and try
% dominant tones recognition per subband.
% Adapt the recognition test.