Build a Music Genre Classifier with KNN and MFCC from Scratch
This tutorial walks through building a music‑genre classification system using the GTZAN dataset, extracting MFCC features, implementing a K‑Nearest Neighbors classifier in Python, and achieving roughly 70% accuracy on test data.
Introduction
Audio classification aims to assign each audio file to a predefined genre, reducing the need for manual listening.
Project Overview and Method
The task is defined as: given a collection of audio files, predict the genre of each (e.g., Disco, Hip‑hop). Four common approaches are considered—multiclass SVM, K‑Nearest Neighbors (KNN), K‑means clustering, and Convolutional Neural Networks. KNN is chosen here because it is simple to implement from scratch, gives a strong baseline on small datasets such as GTZAN, and is widely used in recommendation systems.
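Before building the full pipeline, the core KNN idea — classify a point by the majority label among its k closest training points — can be illustrated on toy 1‑D data (a minimal sketch with illustrative names, not part of the tutorial's pipeline):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Toy 1-D KNN: train is a list of (value, label) pairs."""
    # sort training points by distance to the query and keep the k nearest
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    # majority vote among the k nearest labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [(1.0, 'blues'), (1.2, 'blues'), (5.0, 'metal'), (5.3, 'metal'), (1.1, 'blues')]
print(knn_predict(train, 1.15, k=3))  # 'blues'
```

The tutorial's version below replaces the absolute difference with a distance between Gaussian summaries of MFCC features, but the structure is identical.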
Dataset
The GTZAN genre collection contains about 1,000 .wav files across ten genres: Blues, Hip‑hop, Classical, Pop, Disco, Country, Metal, Jazz, Reggae, and Rock. The dataset can be downloaded from Kaggle:
https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification
Library Installation
Install the required Python libraries before loading data and building the model:
!pip install python_speech_features
!pip install scipy
Implementation Steps
Step 1 – Import Required Libraries
import numpy as np
import pandas as pd
import scipy.io.wavfile as wav
from python_speech_features import mfcc
from tempfile import TemporaryFile
import os, math, pickle, random, operator

Step 2 – Define a Function to Find Nearest Neighbors
def getNeighbors(trainingset, instance, k):
    # distance() is defined in Step 7; summing both directions makes it symmetric
    distances = []
    for x in range(len(trainingset)):
        dist = distance(trainingset[x], instance, k) + distance(instance, trainingset[x], k)
        distances.append((trainingset[x][2], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors

Step 3 – Determine the Majority Class of Neighbors
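This step tallies neighbor labels by hand; the same majority vote can be cross-checked with the standard library's collections.Counter (a reference sketch, not the tutorial's code):

```python
from collections import Counter

neighbors = ['pop', 'rock', 'pop', 'jazz', 'pop']
# most_common(1) returns [(label, count)] for the top label
majority = Counter(neighbors).most_common(1)[0][0]
print(majority)  # 'pop'
```

Note that ties between equally frequent labels are broken arbitrarily in both versions.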
def nearestclass(neighbors):
    classVote = {}
    for x in range(len(neighbors)):
        response = neighbors[x]
        if response in classVote:
            classVote[response] += 1
        else:
            classVote[response] = 1
    sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
    return sorter[0][0]

Step 4 – Model Evaluation Function
def getAccuracy(testSet, prediction):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == prediction[x]:
            correct += 1
    return 1.0 * correct / len(testSet)

Step 5 – Feature Extraction (MFCC)
A full MFCC feature set has 39 dimensions: 13 cepstral coefficients (12 describing spectral shape plus one energy term), their first-order deltas, and their second-order delta-deltas. The pipeline frames the audio into short windows (20–40 ms), computes a mel filter-bank spectrum per frame, and applies a discrete cosine transform to the log filter-bank energies. Here, each clip's per-frame coefficients are summarized by their mean vector and covariance matrix.
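To make the framing step concrete, here is a minimal numpy sketch that slices a signal into 25 ms frames with a 10 ms hop (the sample rate and window lengths are illustrative; python_speech_features performs this internally):

```python
import numpy as np

def frame_signal(sig, rate, winlen=0.025, winstep=0.010):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    frame_len = int(winlen * rate)   # samples per frame
    hop = int(winstep * rate)        # samples between frame starts
    n_frames = 1 + (len(sig) - frame_len) // hop
    return np.stack([sig[i * hop : i * hop + frame_len] for i in range(n_frames)])

rate = 16000
sig = np.random.randn(rate)          # one second of noise as a stand-in for audio
frames = frame_signal(sig, rate)
print(frames.shape)                  # (98, 400): 98 frames of 400 samples each
```

Each frame would then be windowed, passed through the mel filter bank, and transformed with a DCT to produce one row of MFCC coefficients.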
directory = '../input/gtzan-dataset-music-genre-classification/Data/genres_original'
f = open("my.dat", "wb")  # the same file is read back in Step 6
i = 0
for folder in os.listdir(directory):
    i += 1
    if i == 11:
        break
    for file in os.listdir(directory + '/' + folder):
        try:
            (rate, sig) = wav.read(directory + '/' + folder + '/' + file)
            mfcc_feat = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
            covariance = np.cov(mfcc_feat.T)
            mean_matrix = mfcc_feat.mean(0)
            feature = (mean_matrix, covariance, i)
            pickle.dump(feature, f)
        except Exception as e:
            print("Got an exception:", e, 'in folder:', folder, 'filename:', file)
f.close()

Step 6 – Train‑Test Split
dataset = []

def loadDataset(filename, split, trset, teset):
    with open(filename, 'rb') as f:
        while True:
            try:
                dataset.append(pickle.load(f))
            except EOFError:
                break
    for x in range(len(dataset)):
        if random.random() < split:
            trset.append(dataset[x])
        else:
            teset.append(dataset[x])

trainingSet = []
testSet = []
loadDataset('my.dat', 0.68, trainingSet, testSet)

Step 7 – Distance Calculation Between Instances
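The distance implemented below is, up to a factor of two, the Kullback–Leibler divergence between two multivariate Gaussians fitted to each clip's MFCC frames (summing both directions in Step 2 symmetrizes it). For Gaussians $\mathcal{N}(\mu_1,\Sigma_1)$ and $\mathcal{N}(\mu_2,\Sigma_2)$ of dimension $k$:

```latex
2\,D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\right)
= \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
+ (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)
+ \ln\frac{\det\Sigma_2}{\det\Sigma_1} - k
```

Each term maps one-to-one onto a line of the function below: the trace term, the Mahalanobis term, the log-determinant ratio, and the subtracted dimension k.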
def distance(instance1, instance2, k):
    mm1 = instance1[0]  # mean vector
    cm1 = instance1[1]  # covariance matrix
    mm2 = instance2[0]
    cm2 = instance2[1]
    dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    dist += np.dot(np.dot((mm2 - mm1).transpose(), np.linalg.inv(cm2)), mm2 - mm1)
    dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    dist -= k
    return dist

Step 8 – Train Model and Predict
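Before running the full evaluation loop, the distance can be sanity-checked on synthetic Gaussian summaries. This sketch redefines the function locally so it runs on its own; the feature tuples mimic the (mean, covariance, label) format from Step 5:

```python
import numpy as np

def distance(instance1, instance2, k):
    mm1, cm1 = instance1[0], instance1[1]
    mm2, cm2 = instance2[0], instance2[1]
    dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    dist += np.dot(np.dot((mm2 - mm1).T, np.linalg.inv(cm2)), mm2 - mm1)
    dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    return dist - k

k = 3
a = (np.zeros(k), np.eye(k), 1)        # (mean, covariance, label)
b = (np.ones(k), 2.0 * np.eye(k), 2)

# identical distributions -> per-direction distance is zero
assert abs(distance(a, a, k)) < 1e-9
# the symmetrized sum used by getNeighbors is order-independent
sym_ab = distance(a, b, k) + distance(b, a, k)
sym_ba = distance(b, a, k) + distance(a, b, k)
assert abs(sym_ab - sym_ba) < 1e-9
print(round(sym_ab, 3))  # 6.0
```

The log-determinant terms cancel in the symmetrized sum, which is why the result for these particular Gaussians comes out to a round number.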
length = len(testSet)
predictions = []
for x in range(length):
    predictions.append(nearestclass(getNeighbors(trainingSet, testSet[x], 5)))
accuracy1 = getAccuracy(testSet, predictions)
print(accuracy1)

Step 9 – Test Classifier on New Audio Files
from collections import defaultdict

results = defaultdict(int)
directory = "../input/gtzan-dataset-music-genre-classification/Data/genres_original"
i = 1
for folder in os.listdir(directory):
    results[i] = folder
    i += 1
# "feature" is a (mean, covariance, label) tuple extracted from a new file
# exactly as in Step 5
pred = nearestclass(getNeighbors(dataset, feature, 5))
print(results[pred])

Conclusion
The pipeline extracts MFCC features, builds a KNN classifier from scratch, and achieves about 70% accuracy on the GTZAN test set.
Audio classification relies on short‑time amplitude and frequency variations.
MFCC yields up to 39 features per frame (13 cepstral coefficients plus their deltas and delta-deltas), which carry enough spectral information for genre discrimination.
MFCC works by framing the audio, mapping each frame's spectrum onto the mel scale, and applying a DCT to the log filter-bank energies, which decorrelates them and discards fine pitch detail.