# spectral-clustering-music
This project aims at automatically extracting the musical structure of an audio file.
### Prerequisites
This project needs the following Python 3 packages (`argparse` and `warnings` are part of the standard library and do not need to be installed):

```
pip3 install librosa scikit-learn scipy numpy matplotlib
```
### Usage
```
python3 audio/filename.wav
```
### Parameters
Modifiable parameters are located at the top of the script:
```python
begin = 0.0 # start reading the audio file after this time (in seconds)
duration = 60 # read the audio file between (begin) and (begin + duration)
BINS_PER_OCTAVE = 12*3 # number of bins per octave, a multiple of 12 is suitable
N_OCTAVES = 7 # number of considered octaves
NFFT = 2**11 # number of points of the analysis window, a power of 2 is suitable (> 2**6)
STEP = NFFT // 2 # number of points between two consecutive windows (overlap of 50%)
feat = ['cepstral'] # name of the feature set for local similarity, among ['cepstral', 'chroma', 'spectral']
opt_tuning = False # activate automatic extraction of the tuning deviation from A 440 Hz for chroma feature calculation
cluster_method = 'fixed' # method for determining the optimal number of clusters, among 'fixed', 'evals', 'max', 'silhouette', 'calinski-harabaz', 'davies-bouldin'
cluster_nb = [4] # list of integers corresponding to different numbers of clusters (with cluster_method = 'fixed')
cluster_max = 10 # maximum number of clusters (with cluster_method = 'max')
cluster_dist = True # activate the calculation and representation of the cosine distance between each cluster and a reference
cluster_nb_max = 5 # maximum number of clusters per second
onset = 'beat' # synchronisation of features on timestamps obtained from 'beat' or 'onset' extraction, or a fixed window ('no')
onset_percu = False # activate the analysis on the percussive signal only
onset_plot = False # activate the visualization of the spectrogram and the synchronized spectrogram
onset_simi = True # activate the visualization of similarity matrices: local, long-term and weighted matrices
onset_struct = True # activate the visualization of structure components: local similarity matrix, eigenvectors and clusters
timestamps = True # create a text file containing the timestamps of the obtained musical sections (i.e. clusters)
```
### Select number of clusters
The key issue of this approach is to find the correct number of musical sections (or clusters). We propose a compilation of different methods from state-of-the-art studies:
* fixed: the user fixes the number of sections a priori
* evals: the number of clusters is optimized using the Laplacian eigenvalues (needs an additional script)
* max: finds the number of clusters which maximizes the quality of the alignment with the eigenvectors, in the range [1, cluster_max]
* silhouette: maximizes the silhouette index
* calinski-harabaz: maximizes the Calinski-Harabasz index
* davies-bouldin: maximizes the Davies-Bouldin index
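The index-based methods above all follow the same pattern: cluster the embedding for each candidate number of clusters and keep the one that optimizes the index. A minimal sketch with scikit-learn, using the silhouette criterion on synthetic data standing in for the per-beat embedding vectors (the actual clustering in this project operates on the Laplacian eigenvectors):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the embedding vectors (4 underlying sections)
X, _ = make_blobs(n_samples=200, centers=4, random_state=0)

cluster_max = 10
scores = {}
for k in range(2, cluster_max + 1):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the number of clusters with the best silhouette index
best_k = max(scores, key=scores.get)
print(best_k)
```

The Calinski-Harabasz and Davies-Bouldin variants only swap the scoring function (`sklearn.metrics.calinski_harabasz_score`, `sklearn.metrics.davies_bouldin_score`; note that Davies-Bouldin is minimized rather than maximized).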
### Bibliography
* B. McFee and D. P. W. Ellis (2014). Analyzing song structure with spectral clustering. In Proc. of the International Society for Music Information Retrieval Conference (ISMIR).
* L. Zelnik-Manor and P. Perona (2004). Self-Tuning Spectral Clustering. In Advances in Neural Information Processing Systems (NIPS), pp. 1601-1608.
* D. Deutsch (2007). Music perception. In Frontiers in Bioscience, vol. 12, pp. 4473-4482.
* J.-C. Lamirel, N. Dugué and P. Cuxac (2016). New efficient clustering quality indexes. In Proc. of the International Joint Conference on Neural Networks (IJCNN 2016), Vancouver, Canada.
### Authors
* Marie Tahon: adaptation of the initial code and addition of new functionalities
* Brian McFee: initial code (Laplacian segmentation)