Build a Music Genre Classifier from Scratch with KNN and MFCC

This tutorial walks through constructing a complete music‑genre classification project using Python, covering dataset preparation, MFCC feature extraction, K‑Nearest Neighbors implementation, train‑test splitting, model evaluation, and testing on new audio files, all with reproducible code snippets.

Introduction

Audio classification is one of the most challenging tasks in data science. Music genre classification aims to assign each audio file to a predefined genre, automating a process that would otherwise require listening to every track.

Project Overview and Method

The goal is to build a genre classifier from the ground up using classical machine learning: a K‑Nearest Neighbors (KNN) algorithm combined with Mel‑Frequency Cepstral Coefficients (MFCC) for feature extraction.

Dataset

The GTZAN genre dataset, available on Kaggle, contains roughly 1,000 .wav files across ten genres: Blues, Hip‑hop, Classical, Pop, Disco, Country, Metal, Jazz, Reggae, and Rock.

Dataset URL: https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification

Library Installation

!pip install python_speech_features
!pip install scipy

These libraries provide MFCC extraction (python_speech_features) and WAV file handling (scipy.io.wavfile).

Implementation Steps

Step 1 – Import Required Libraries

import numpy as np
import pandas as pd
import scipy.io.wavfile as wav
from python_speech_features import mfcc
from tempfile import TemporaryFile
import os, math, pickle, random, operator

Step 2 – Define Helper Functions

Functions for distance calculation, neighbor retrieval, class voting, and accuracy computation are implemented as follows:

def getNeighbors(trainingset, instance, k):
    # Rank every training instance by the symmetrised distance to `instance`
    # and return the labels (index 2 of each feature tuple) of the k closest.
    distances = []
    for x in range(len(trainingset)):
        dist = distance(trainingset[x], instance, k) + distance(instance, trainingset[x], k)
        distances.append((trainingset[x][2], dist))
    distances.sort(key=operator.itemgetter(1))
    return [distances[i][0] for i in range(k)]

def nearestclass(neighbors):
    classVote = {}
    for x in range(len(neighbors)):
        response = neighbors[x]
        classVote[response] = classVote.get(response, 0) + 1
    sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
    return sorter[0][0]

def getAccuracy(testSet, predictions):
    correct = sum(1 for i in range(len(testSet)) if testSet[i][-1] == predictions[i])
    return correct / float(len(testSet))

def distance(instance1, instance2, k):
    # Distance between two (mean, covariance) Gaussian summaries; up to a
    # constant this is the KL divergence between the two Gaussians. The
    # `- k` term is the same for every pair, so it does not affect ranking.
    mm1, cm1 = instance1[0], instance1[1]
    mm2, cm2 = instance2[0], instance2[1]
    dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    dist += np.dot(np.dot((mm2-mm1).T, np.linalg.inv(cm2)), mm2-mm1)
    dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    dist -= k
    return dist
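
As a quick sanity check (an illustration added here, not part of the original), the distance between two identical instances reduces to d − k, where d is the number of MFCC coefficients, since the trace term equals d and the mean and log‑determinant terms vanish:

```python
import numpy as np

# Same distance function as above, reproduced so this sketch is self-contained.
def distance(instance1, instance2, k):
    mm1, cm1 = instance1[0], instance1[1]
    mm2, cm2 = instance2[0], instance2[1]
    dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    dist += np.dot(np.dot((mm2 - mm1).T, np.linalg.inv(cm2)), mm2 - mm1)
    dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    dist -= k
    return dist

d = 3                                # toy dimensionality (GTZAN uses 13)
inst = (np.zeros(d), np.eye(d), 1)   # (mean, covariance, label)
print(distance(inst, inst, 5))       # trace(I) - k = d - k = -2.0
```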

Step 3 – Extract MFCC Features and Serialize

directory = '../input/gtzan-dataset-music-genre-classification/Data/genres_original'
with open('mydataset.dat', 'wb') as f:
    i = 0
    for folder in os.listdir(directory):
        i += 1                 # i is the numeric label for this genre (1-10)
        if i == 11: break      # defensive guard: GTZAN has exactly ten genre folders
        for file in os.listdir(os.path.join(directory, folder)):
            try:
                rate, sig = wav.read(os.path.join(directory, folder, file))
                mfcc_feat = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
                covariance = np.cov(mfcc_feat.T)          # covariance of the 13 coefficients
                mean_matrix = mfcc_feat.mean(0)           # mean of each coefficient across frames
                feature = (mean_matrix, covariance, i)    # one record per track
                pickle.dump(feature, f)
            except Exception as e:
                print('Got an exception:', e, 'in folder:', folder, 'file:', file)
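
The serialization pattern above writes one feature tuple per `pickle.dump` call. A minimal round-trip sketch (with random matrices standing in for real MFCC statistics, and a temporary file in place of mydataset.dat) confirms the pattern:

```python
import os
import pickle
import tempfile

import numpy as np

rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), 'mydataset.dat')

# Write three synthetic feature tuples, one pickle.dump call each.
with open(path, 'wb') as f:
    for label in range(1, 4):
        mean_matrix = rng.normal(size=13)   # stands in for the MFCC mean vector
        covariance = np.eye(13)             # stands in for the MFCC covariance
        pickle.dump((mean_matrix, covariance, label), f)

# Read records back until EOFError, exactly as loadDataset does in Step 4.
features = []
with open(path, 'rb') as f:
    while True:
        try:
            features.append(pickle.load(f))
        except EOFError:
            break

print(len(features))    # three records round-tripped
```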

Step 4 – Load Dataset and Split into Train/Test

dataset = []

def loadDataset(filename, split, trset, teset):
    # Each pickle.dump in Step 3 wrote one record, so load until EOFError.
    with open(filename, 'rb') as f:
        while True:
            try:
                dataset.append(pickle.load(f))
            except EOFError:
                break
    # Assign each record to train or test independently, so the train
    # fraction is only approximately equal to `split`.
    for x in range(len(dataset)):
        if random.random() < split:
            trset.append(dataset[x])
        else:
            teset.append(dataset[x])

trainingSet = []
testSet = []
loadDataset('mydataset.dat', 0.68, trainingSet, testSet)
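
Because each record is assigned independently, the resulting split is random rather than exact. A small standalone check (toy integer records instead of feature tuples) illustrates the behavior:

```python
import random

random.seed(42)                 # fixed seed so the split is reproducible
records = list(range(1000))     # dummy records standing in for feature tuples

train, test = [], []
for r in records:
    # Same assignment rule as loadDataset: train with probability 0.68.
    (train if random.random() < 0.68 else test).append(r)

print(len(train), len(test))   # roughly a 68/32 split, not exactly 680/320
```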

Step 5 – Train and Predict with KNN

predictions = []
for x in range(len(testSet)):
    neighbors = getNeighbors(trainingSet, testSet[x], 5)
    predictions.append(nearestclass(neighbors))
accuracy = getAccuracy(testSet, predictions)
print('Accuracy:', accuracy)
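
Overall accuracy hides which genres get confused with which. A short addition (not in the original tutorial) tallies (actual, predicted) pairs with `collections.Counter`; the toy label lists below stand in for the test-set labels and KNN predictions:

```python
from collections import Counter

actuals = [1, 1, 2, 2, 2, 3]    # toy ground-truth genre labels
preds   = [1, 2, 2, 2, 3, 3]    # toy KNN predictions

# Count every (actual, predicted) pair; diagonal entries are correct hits.
confusion = Counter(zip(actuals, preds))
accuracy = sum(v for (a, p), v in confusion.items() if a == p) / len(actuals)

print(accuracy)                  # 4 of 6 correct
print(confusion[(2, 2)])         # genre 2 correctly predicted twice
```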

Step 6 – Classify a New Audio File

from collections import defaultdict
results = defaultdict(int)
for i, folder in enumerate(os.listdir(directory), start=1):
    results[i] = folder

# Assume `feature` is the (mean_matrix, covariance, label) tuple computed for the new file, as in Step 3
pred = nearestclass(getNeighbors(dataset, feature, 5))
print('Predicted genre:', results[pred])
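
Building that `feature` tuple for a new file follows the same recipe as Step 3. In the sketch below a random frame matrix stands in for `mfcc(sig, rate)` so the example runs without an audio file; a real run would first call `wav.read(...)` and `mfcc(...)`:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for mfcc(sig, rate): rows are frames, columns are 13 coefficients.
mfcc_feat = rng.normal(size=(200, 13))

covariance = np.cov(mfcc_feat.T)        # 13 x 13 covariance of coefficients
mean_matrix = mfcc_feat.mean(0)         # 13-dim mean vector
feature = (mean_matrix, covariance, 0)  # label 0: genre unknown

print(mean_matrix.shape, covariance.shape)
```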

Conclusion and Lessons Learned

The project demonstrates that a simple KNN classifier built from scratch can achieve around 70% accuracy on the GTZAN dataset. Key takeaways include the importance of MFCC as a compact representation of audio, the effect of frame‑level analysis on feature quality, and practical tips such as wrapping per‑file processing in try/except when building large datasets.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Python · KNN · Audio Processing · MFCC · Music Genre Classification
Written by Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.