Build a Music Genre Classifier from Scratch with KNN and MFCC
This tutorial walks through constructing a complete music‑genre classification project using Python, covering dataset preparation, MFCC feature extraction, K‑Nearest Neighbors implementation, train‑test splitting, model evaluation, and testing on new audio files, all with reproducible code snippets.
Introduction
Audio classification is one of the most challenging tasks in data science. Music genre classification aims to assign each audio file to a predefined genre, automating a process that would otherwise require listening to every track.
Project Overview and Method
The goal is to build a genre classifier from the ground up using classical machine‑learning techniques: a K‑Nearest Neighbors (KNN) algorithm for classification, combined with Mel‑Frequency Cepstral Coefficients (MFCC) for feature extraction.
Dataset
The GTZAN genre dataset, available on Kaggle, contains roughly 1,000 .wav files across ten genres: Blues, Hip‑hop, Classical, Pop, Disco, Country, Metal, Jazz, Reggae, and Rock.
Dataset URL: https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification
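Before extracting features, it can help to confirm that the download has the expected layout. The sketch below assumes one sub-folder per genre containing .wav files, as in the genres_original directory used later in this tutorial; count_wavs_per_genre is a hypothetical helper name, not part of any library.

```python
import os

def count_wavs_per_genre(root):
    """Return {genre_folder: number_of_wav_files} for each sub-folder of root."""
    counts = {}
    for genre in sorted(os.listdir(root)):
        genre_dir = os.path.join(root, genre)
        if not os.path.isdir(genre_dir):
            continue  # skip stray files sitting next to the genre folders
        counts[genre] = sum(
            1 for name in os.listdir(genre_dir) if name.lower().endswith(".wav")
        )
    return counts

# Example call; the path is an assumption copied from the Kaggle layout used in Step 3:
# print(count_wavs_per_genre("../input/gtzan-dataset-music-genre-classification/Data/genres_original"))
```

With roughly 1,000 files across ten genres, each folder should report a count near 100; a much smaller number suggests an incomplete download.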
Library Installation
!pip install python_speech_features
!pip install scipy
These libraries provide MFCC extraction (python_speech_features) and WAV file handling (scipy.io.wavfile).
Implementation Steps
Step 1 – Import Required Libraries
import numpy as np
import pandas as pd
import scipy.io.wavfile as wav
from python_speech_features import mfcc
from tempfile import TemporaryFile
import os, math, pickle, random, operator
Step 2 – Define Helper Functions
Functions for distance calculation, neighbor retrieval, class voting, and accuracy computation are implemented as follows:
def getNeighbors(trainingset, instance, k):
    distances = []
    for x in range(len(trainingset)):
        # Sum both directions so the measure is symmetric
        dist = distance(trainingset[x], instance, k) + distance(instance, trainingset[x], k)
        distances.append((trainingset[x][2], dist))
    # Sort by distance and return the labels of the k closest items
    distances.sort(key=operator.itemgetter(1))
    return [distances[i][0] for i in range(k)]
def nearestclass(neighbors):
    # Count one vote per neighbor label and return the most common label
    classVote = {}
    for x in range(len(neighbors)):
        response = neighbors[x]
        classVote[response] = classVote.get(response, 0) + 1
    sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
    return sorter[0][0]
def getAccuracy(testSet, predictions):
    correct = sum(1 for i in range(len(testSet)) if testSet[i][-1] == predictions[i])
    return correct / float(len(testSet))
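def distance(instance1, instance2, k):
    # Each instance is (mean vector, covariance matrix, label)
    mm1, cm1 = instance1[0], instance1[1]
    mm2, cm2 = instance2[0], instance2[1]
    # Gaussian KL-style divergence built from the two means and covariances
    dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    dist += np.dot(np.dot((mm2-mm1).T, np.linalg.inv(cm2)), mm2-mm1)
    dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    # Constant offset; it is the same for every pair, so it does not change the ranking
    dist -= k
    return dist
Step 3 – Extract MFCC Features and Serialize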
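Step 2's distance helper compares two clips through their mean vectors and covariance matrices, which are exactly the quantities this step extracts from the MFCC frames. The sketch below uses random stand-in frames rather than real MFCC output, and gaussian_kl and symmetric_kl are hypothetical names. It assumes the formula is read as the Kullback-Leibler divergence between two Gaussians, in which case the subtracted constant is the feature dimension and the sum is halved. It also swaps in np.linalg.solve and np.linalg.slogdet for the explicit inverse and determinant, as a numerically safer substitution.

```python
import numpy as np

def gaussian_kl(mean1, cov1, mean2, cov2):
    """KL divergence KL(N1 || N2) between two multivariate Gaussians."""
    d = mean1.shape[0]                                   # feature dimension
    diff = mean2 - mean1
    trace_term = np.trace(np.linalg.solve(cov2, cov1))   # tr(inv(cov2) @ cov1)
    maha_term = diff @ np.linalg.solve(cov2, diff)       # diff^T inv(cov2) diff
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.5 * (trace_term + maha_term + logdet2 - logdet1 - d)

def symmetric_kl(a, b):
    """Sum of both directions, mirroring how getNeighbors calls distance twice."""
    return gaussian_kl(*a, *b) + gaussian_kl(*b, *a)

rng = np.random.default_rng(0)
# Stand-ins for MFCC output: 500 frames x 13 coefficients per clip (synthetic data)
frames_a = rng.normal(0.0, 1.0, size=(500, 13))
frames_b = rng.normal(0.5, 2.0, size=(500, 13))
clip_a = (frames_a.mean(axis=0), np.cov(frames_a.T))
clip_b = (frames_b.mean(axis=0), np.cov(frames_b.T))

print(symmetric_kl(clip_a, clip_a))  # close to 0 for identical clips
print(symmetric_kl(clip_a, clip_b))  # positive for different clips
```

Because KNN only ranks neighbours, the halving and the exact constant do not affect which clips are selected; they matter only if the distance values themselves are interpreted.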
directory = '../input/gtzan-dataset-music-genre-classification/Data/genres_original'
with open('mydataset.dat', 'wb') as f:
    i = 0  # numeric genre label, one per folder
    for folder in os.listdir(directory):
        i += 1
        if i == 11:
            break  # stop after ten genre folders
        for file in os.listdir(os.path.join(directory, folder)):
            try:
                rate, sig = wav.read(os.path.join(directory, folder, file))
                mfcc_feat = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
                # Summarize all frames of the clip as a mean vector and covariance matrix
                covariance = np.cov(mfcc_feat.T)
                mean_matrix = mfcc_feat.mean(0)
                feature = (mean_matrix, covariance, i)
                pickle.dump(feature, f)
            except Exception as e:
                print('Got an exception:', e, 'in folder:', folder, 'file:', file)
Step 4 – Load Dataset and Split into Train/Test
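The loader in this step reads pickled tuples one at a time until the file ends, then assigns each item to train or test with an independent random draw. The sketch below reproduces that pattern on an in-memory buffer with toy records. load_all and random_split are hypothetical names, and the fixed seed is an addition (not in the original code) that makes the split repeatable.

```python
import io
import pickle
import random

def load_all(stream):
    """Read consecutive pickled objects from a binary stream until EOF."""
    items = []
    while True:
        try:
            items.append(pickle.load(stream))
        except EOFError:
            return items

def random_split(items, split, seed=None):
    """Send each item to train with probability `split`, otherwise to test."""
    rng = random.Random(seed)
    train, test = [], []
    for item in items:
        (train if rng.random() < split else test).append(item)
    return train, test

# Write toy (value, label) records back to back, as Step 3 does with pickle.dump
buffer = io.BytesIO()
for record in [(n, n % 3) for n in range(100)]:
    pickle.dump(record, buffer)
buffer.seek(0)

records = load_all(buffer)
train, test = random_split(records, 0.68, seed=42)
print(len(records), len(train), len(test))
```

Because each item is drawn independently, the training share is only approximately 68%; it is not an exact 68/32 partition.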
dataset = []
def loadDataset(filename, split, trset, teset):
    # Read pickled feature tuples until the end of the file
    with open(filename, 'rb') as f:
        while True:
            try:
                dataset.append(pickle.load(f))
            except EOFError:
                break
    # Assign each item to the training set with probability `split`
    for x in range(len(dataset)):
        if random.random() < split:
            trset.append(dataset[x])
        else:
            teset.append(dataset[x])
trainingSet = []
testSet = []
loadDataset('mydataset.dat', 0.68, trainingSet, testSet)
Step 5 – Train and Predict with KNN
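The neighbour search and majority vote used in this step can be seen in isolation on toy data. The sketch below swaps the Gaussian distance for a plain absolute difference on scalar features, purely for illustration; knn_predict is a hypothetical name.

```python
from collections import Counter

def knn_predict(training, query, k):
    """training: list of (value, label). Return the majority label of the k nearest values."""
    ranked = sorted(training, key=lambda pair: abs(pair[0] - query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

training = [(1.0, "jazz"), (1.2, "jazz"), (1.4, "jazz"),
            (5.0, "metal"), (5.3, "metal"), (5.1, "metal")]

print(knn_predict(training, 1.1, k=3))  # jazz
print(knn_predict(training, 4.9, k=3))  # metal
```

With ten genres, an odd k does not rule out tied votes, so the tie-breaking behaviour of the voting function is worth keeping in mind when comparing runs.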
predictions = []
for x in range(len(testSet)):
    neighbors = getNeighbors(trainingSet, testSet[x], 5)
    predictions.append(nearestclass(neighbors))
accuracy = getAccuracy(testSet, predictions)
print('Accuracy:', accuracy)
Step 6 – Classify a New Audio File
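This step needs a feature tuple for the new file, built the same way as in Step 3. A minimal sketch, assuming a hypothetical helper extract_feature and a placeholder file name new_song.wav: the MFCC call copies the parameters used in Step 3, and the label slot is filled with 0 because the genre is unknown.

```python
import numpy as np

def summarize_frames(frames, label=0):
    """Turn a (n_frames, n_coeffs) MFCC matrix into a (mean, covariance, label) tuple."""
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0), np.cov(frames.T), label

def extract_feature(path):
    """Read a WAV file and build the tuple expected by getNeighbors."""
    # Imported here so summarize_frames stays usable without the audio libraries
    import scipy.io.wavfile as wav
    from python_speech_features import mfcc

    rate, sig = wav.read(path)
    frames = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
    return summarize_frames(frames)

# feature = extract_feature("new_song.wav")  # hypothetical file name
```

The placeholder label is never read during prediction, since the distance helper only uses the first two elements of the tuple.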
from collections import defaultdict
results = defaultdict(int)
# Map numeric labels back to folder names, in the same order used in Step 3
for i, folder in enumerate(os.listdir(directory), start=1):
    results[i] = folder
# Assume `feature` is the MFCC tuple for the new file
pred = nearestclass(getNeighbors(dataset, feature, 5))
print('Predicted genre:', results[pred])
Conclusion and Lessons Learned
The project demonstrates that a simple KNN classifier built from scratch can achieve around 70% accuracy on the GTZAN dataset. The exact figure varies between runs because the train/test split is random. Key takeaways include the importance of MFCC as a compact representation of audio, the effect of frame‑level analysis on feature quality, and practical tips such as wrapping file reads in try/except so that one unreadable file does not stop the extraction loop.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
