Build a Music Genre Classifier with KNN and MFCC from Scratch
This tutorial walks through building a music‑genre classification system using the GTZAN dataset, extracting MFCC features, implementing a K‑Nearest Neighbors classifier in Python, and achieving roughly 70% accuracy on test data.
Introduction
Audio classification aims to assign each audio file to a predefined genre, reducing the need for manual listening.
Project Overview and Method
The task is defined as: given a collection of audio files, predict the genre of each (e.g., Disco, Hip‑hop). Four common approaches are considered—multiclass SVM, K‑Nearest Neighbors (KNN), K‑means clustering, and Convolutional Neural Networks. KNN is chosen here because it is simple to implement from scratch, gives a strong baseline on small datasets such as GTZAN, and is widely used in recommendation systems.
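Before building the full pipeline, the core KNN idea — classify a point by the majority label among its k closest training points — can be illustrated on toy 1‑D data (a minimal sketch with illustrative names, not part of the tutorial's pipeline):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Toy 1-D KNN: train is a list of (value, label) pairs."""
    # sort training points by distance to the query and keep the k nearest
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    # majority vote among the k nearest labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [(1.0, 'blues'), (1.2, 'blues'), (5.0, 'metal'), (5.3, 'metal'), (1.1, 'blues')]
print(knn_predict(train, 1.15, k=3))  # 'blues'
```

The tutorial's version below replaces the absolute difference with a distance between Gaussian summaries of MFCC features, but the structure is identical.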
Dataset
The GTZAN genre collection contains about 1,000 .wav files across ten genres: Blues, Hip‑hop, Classical, Pop, Disco, Country, Metal, Jazz, Reggae, and Rock. The dataset can be downloaded from Kaggle:
https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification
Library Installation
Install the required Python libraries before loading data and building the model:
!pip install python_speech_features
!pip install scipy
Implementation Steps
Step 1 – Import Required Libraries
import numpy as np
import pandas as pd
import scipy.io.wavfile as wav
from python_speech_features import mfcc
from tempfile import TemporaryFile
import os, math, pickle, random, operator

Step 2 – Define a Function to Find Nearest Neighbors
def getNeighbors(trainingset, instance, k):
    # distance() is defined in Step 7; summing both directions makes it symmetric
    distances = []
    for x in range(len(trainingset)):
        dist = distance(trainingset[x], instance, k) + distance(instance, trainingset[x], k)
        distances.append((trainingset[x][2], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors

Step 3 – Determine the Majority Class of Neighbors
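This step tallies neighbor labels by hand; the same majority vote can be cross-checked with the standard library's collections.Counter (a reference sketch, not the tutorial's code):

```python
from collections import Counter

neighbors = ['pop', 'rock', 'pop', 'jazz', 'pop']
# most_common(1) returns [(label, count)] for the top label
majority = Counter(neighbors).most_common(1)[0][0]
print(majority)  # 'pop'
```

Note that ties between equally frequent labels are broken arbitrarily in both versions.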
def nearestclass(neighbors):
    classVote = {}
    for x in range(len(neighbors)):
        response = neighbors[x]
        if response in classVote:
            classVote[response] += 1
        else:
            classVote[response] = 1
    sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
    return sorter[0][0]

Step 4 – Model Evaluation Function
def getAccuracy(testSet, prediction):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == prediction[x]:
            correct += 1
    return 1.0 * correct / len(testSet)

Step 5 – Feature Extraction (MFCC)
A full MFCC feature set has 39 dimensions: 13 cepstral coefficients (12 describing spectral shape plus one energy term), their first-order deltas, and their second-order delta-deltas. The pipeline frames the audio into short windows (20–40 ms), computes a mel filter-bank spectrum per frame, and applies a discrete cosine transform to the log filter-bank energies. Here, each clip's per-frame coefficients are summarized by their mean vector and covariance matrix.
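To make the framing step concrete, here is a minimal numpy sketch that slices a signal into 25 ms frames with a 10 ms hop (the sample rate and window lengths are illustrative; python_speech_features performs this internally):

```python
import numpy as np

def frame_signal(sig, rate, winlen=0.025, winstep=0.010):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    frame_len = int(winlen * rate)   # samples per frame
    hop = int(winstep * rate)        # samples between frame starts
    n_frames = 1 + (len(sig) - frame_len) // hop
    return np.stack([sig[i * hop : i * hop + frame_len] for i in range(n_frames)])

rate = 16000
sig = np.random.randn(rate)          # one second of noise as a stand-in for audio
frames = frame_signal(sig, rate)
print(frames.shape)                  # (98, 400): 98 frames of 400 samples each
```

Each frame would then be windowed, passed through the mel filter bank, and transformed with a DCT to produce one row of MFCC coefficients.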
directory = '../input/gtzan-dataset-music-genre-classification/Data/genres_original'
f = open("my.dat", "wb")  # the same file is read back in Step 6
i = 0
for folder in os.listdir(directory):
    i += 1
    if i == 11:
        break
    for file in os.listdir(directory + '/' + folder):
        try:
            (rate, sig) = wav.read(directory + '/' + folder + '/' + file)
            mfcc_feat = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
            covariance = np.cov(mfcc_feat.T)
            mean_matrix = mfcc_feat.mean(0)
            feature = (mean_matrix, covariance, i)
            pickle.dump(feature, f)
        except Exception as e:
            print("Got an exception:", e, 'in folder:', folder, 'filename:', file)
f.close()

Step 6 – Train‑Test Split
dataset = []

def loadDataset(filename, split, trset, teset):
    with open(filename, 'rb') as f:
        while True:
            try:
                dataset.append(pickle.load(f))
            except EOFError:
                break
    for x in range(len(dataset)):
        if random.random() < split:
            trset.append(dataset[x])
        else:
            teset.append(dataset[x])

trainingSet = []
testSet = []
loadDataset('my.dat', 0.68, trainingSet, testSet)

Step 7 – Distance Calculation Between Instances
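The distance implemented below is, up to a factor of two, the Kullback–Leibler divergence between two multivariate Gaussians fitted to each clip's MFCC frames (summing both directions in Step 2 symmetrizes it). For Gaussians $\mathcal{N}(\mu_1,\Sigma_1)$ and $\mathcal{N}(\mu_2,\Sigma_2)$ of dimension $k$:

```latex
2\,D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\right)
= \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
+ (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)
+ \ln\frac{\det\Sigma_2}{\det\Sigma_1} - k
```

Each term maps one-to-one onto a line of the function below: the trace term, the Mahalanobis term, the log-determinant ratio, and the subtracted dimension k.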
def distance(instance1, instance2, k):
    mm1 = instance1[0]  # mean vector
    cm1 = instance1[1]  # covariance matrix
    mm2 = instance2[0]
    cm2 = instance2[1]
    dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    dist += np.dot(np.dot((mm2 - mm1).transpose(), np.linalg.inv(cm2)), mm2 - mm1)
    dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    dist -= k
    return dist

Step 8 – Train Model and Predict
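Before running the full evaluation loop, the distance can be sanity-checked on synthetic Gaussian summaries. This sketch redefines the function locally so it runs on its own; the feature tuples mimic the (mean, covariance, label) format from Step 5:

```python
import numpy as np

def distance(instance1, instance2, k):
    mm1, cm1 = instance1[0], instance1[1]
    mm2, cm2 = instance2[0], instance2[1]
    dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    dist += np.dot(np.dot((mm2 - mm1).T, np.linalg.inv(cm2)), mm2 - mm1)
    dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    return dist - k

k = 3
a = (np.zeros(k), np.eye(k), 1)        # (mean, covariance, label)
b = (np.ones(k), 2.0 * np.eye(k), 2)

# identical distributions -> per-direction distance is zero
assert abs(distance(a, a, k)) < 1e-9
# the symmetrized sum used by getNeighbors is order-independent
sym_ab = distance(a, b, k) + distance(b, a, k)
sym_ba = distance(b, a, k) + distance(a, b, k)
assert abs(sym_ab - sym_ba) < 1e-9
print(round(sym_ab, 3))  # 6.0
```

The log-determinant terms cancel in the symmetrized sum, which is why the result for these particular Gaussians comes out to a round number.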
length = len(testSet)
predictions = []
for x in range(length):
    predictions.append(nearestclass(getNeighbors(trainingSet, testSet[x], 5)))
accuracy1 = getAccuracy(testSet, predictions)
print(accuracy1)

Step 9 – Test Classifier on New Audio Files
from collections import defaultdict

results = defaultdict(int)
directory = "../input/gtzan-dataset-music-genre-classification/Data/genres_original"
i = 1
for folder in os.listdir(directory):
    results[i] = folder
    i += 1
# "feature" is a (mean, covariance, label) tuple extracted from a new file
# exactly as in Step 5
pred = nearestclass(getNeighbors(dataset, feature, 5))
print(results[pred])

Conclusion
The pipeline extracts MFCC features, builds a KNN classifier from scratch, and achieves about 70% accuracy on the GTZAN test set.
Audio classification relies on short‑time amplitude and frequency variations.
MFCC yields up to 39 features per frame (13 cepstral coefficients plus their deltas and delta-deltas), which carry enough spectral information for genre discrimination.
MFCC works by framing the audio, mapping each frame's spectrum onto the mel scale, and applying a DCT to the log filter-bank energies, which decorrelates them and discards fine pitch detail.