Simple Music Recommendation System: Audio‑Feature and Playlist‑Based Approaches
This article presents two straightforward music recommendation methods—content‑based filtering using audio features and collaborative filtering using playlist data—detailing their design ideas, key Python and Go code snippets, model training, evaluation, and possible improvements.
This article presents two simple, traditional ideas for a music recommendation system (in the "recommend the next songs" sense) and their implementations, omitting mathematical theory and machine-learning details. The full code is available at the linked GitHub repository.
1. Audio‑Feature Based Approach
Analyzes audio features to perform content‑based filtering (CBF).
1.1 Design Idea
A listener who enjoys Bach may also like Chopin; therefore we can train a model on audio clips to distinguish genres and recommend songs with similar style.
song-classification.ipynb implements the model training using the UVic "genres" dataset, which contains well‑labeled music segments.
$ ls genres
blues country hiphop metal reggae
classical disco jazz pop rock
The clips are converted to mel‑spectrograms with the librosa library.
These spectrograms are fed into a 1‑D convolution‑pooling stack followed by a fully connected classification head.
def cnn_model(input_shape):
    inputs = Input(input_shape)
    x = inputs
    # 1-D conv-pooling stack
    levels = 64
    for level in range(3):
        x = Conv1D(levels, 3, activation='relu')(x)
        x = BatchNormalization()(x)
        x = MaxPooling1D(pool_size=2, strides=2)(x)
        levels *= 2
    # x -> shape (256,) after global pooling
    x = GlobalMaxPooling1D()(x)
    # fully-connected layers for label prediction
    for fc in range(2):
        x = Dense(256, activation='relu')(x)
        x = Dropout(0.5)(x)
    labels = Dense(10, activation='softmax')(x)
    model = Model(inputs=[inputs], outputs=[labels])
    sgd = SGD(learning_rate=0.0003, momentum=0.9, decay=1e-5, nesterov=True)
    model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model = cnn_model((128, 128))
The trained model ( song_classify.h5 ) classifies clearly distinct genres (e.g., classical) well but struggles with ambiguous ones such as rock.
Using this model in index-local-mp3s.ipynb , we extract mel‑spectrograms from a small personal MP3 collection, remove the final classification head, and keep the convolutional feature extractor to obtain 256‑dimensional vectors for each track.
cnn_model = load_model('song_classify.h5')
vectorize_model = Model(inputs=cnn_model.input, outputs=cnn_model.layers[-4].output)
vectors = vectorize_model.predict(inputs)
These vectors are fed to an unsupervised nearest‑neighbors model to compute similarity between songs.
nbrs = NearestNeighbors(n_neighbors=10, algorithm='ball_tree').fit(vectors)

def most_similar_songs(song_idx):
    # each song contributes ten consecutive slice vectors,
    # so song i occupies rows i*10 .. i*10+9
    distances, indices = nbrs.kneighbors(vectors[song_idx*10 : song_idx*10+10])
    c = Counter()
    for row in indices:
        for idx in row[1:]:
            c[idx // 10] += 1
    return c.most_common()

def print_similar_songs(song_idx, start=1, end=6):
    print("Target song:", song_name(song_idx))
    for idx, score in most_similar_songs(song_idx)[start:end]:
        print(f"[Similarity {score}] {song_name(idx)}")
The resulting recommendations are shown in the accompanying figure.
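The block-of-ten indexing works because each track was vectorized as ten spectrogram slices. A sketch of that slicing step; the slice count and window width are assumptions chosen to match the model's (128, 128) input:

```python
import numpy as np

def slice_spectrogram(spec, n_slices=10, width=128):
    """Cut a (n_mels, n_frames) spectrogram into fixed-width windows
    evenly spaced across the track (counts here are assumptions)."""
    starts = np.linspace(0, spec.shape[1] - width, n_slices).astype(int)
    # transpose each window to (frames, mels), the layout Conv1D expects
    return np.stack([spec[:, s:s + width].T for s in starts])

spec = np.random.rand(128, 2000)  # fake spectrogram of a full song
clips = slice_spectrogram(spec)
print(clips.shape)  # (10, 128, 128)
```

Stacking the slices of every track in order produces the contiguous rows that most_similar_songs indexes with song_idx*10.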
1.2 Model Advantages and Limitations
This method relies solely on audio content, does not need user interaction data, and can recommend unseen tracks, but it requires full audio processing, is computationally heavy, and can only recommend tracks that exist locally.
1.3 Improvement Directions
Enlarge the training dataset beyond the UVic genres collection.
Refine the network architecture, possibly leveraging pretrained models or transfer learning.
Incorporate additional modalities such as metadata (artist, album, duration) and lyrics.
2. Playlist‑Based Collaborative Filtering
Uses historical user‑generated playlists to perform collaborative filtering (CF).
2.1 Data Acquisition
The spotify-playlist.ipynb notebook fetches random playlists via the Spotify API, storing only metadata. Because large‑scale data collection is unstable in Python, a robust Go implementation in the spotify/ directory writes the data to a SQLite database.
The database currently holds several gigabytes: 177,889 playlists, 801,357 artists, and 4,995,249 tracks.
sqlite> select count(*) from playlists;
177889
sqlite> select count(*) from artists;
801357
sqlite> select count(*) from tracks;
sqlite> select count(*) from tracks;
4995249
2.2 Word2Vec Approach
In train-a-music-recommender.ipynb , each track is treated as a word and each playlist as a sentence, then a Word2Vec model is trained.
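The playlists are streamed out of the SQLite database with an iterator so the full corpus never sits in memory. Its implementation is not shown in the snippet below; this is a plausible sketch, where the playlist_tracks table name and columns are assumptions about the repository's schema:

```python
import sqlite3

class PlaylistTracksIter:
    """Yield each playlist as a list of track ids (hypothetical sketch;
    the real table layout in the repository may differ)."""

    def __init__(self, db_path):
        self.db_path = db_path

    def __iter__(self):
        conn = sqlite3.connect(self.db_path)
        cur = conn.execute(
            "SELECT playlist_id, track_id FROM playlist_tracks "
            "ORDER BY playlist_id")
        current, sentence = None, []
        for playlist_id, track_id in cur:
            if playlist_id != current and sentence:
                yield sentence      # previous playlist is complete
                sentence = []
            current = playlist_id
            sentence.append(track_id)
        if sentence:
            yield sentence
        conn.close()
```

Being re-iterable (a fresh cursor per __iter__) matters because gensim passes over the corpus more than once during training.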
sentences = [
    ["track_1_id", "track_2_id", ...],  # playlist_1
    [...],                              # playlist_2
    ...
]

model = gensim.models.Word2Vec(sentences=PlaylistTracksIter(DB), min_count=4)
Given a track, the most similar tracks are retrieved.
def suggest_songs(song_id):
    similar = dict(model.wv.most_similar([song_id]))
    song_ids = ', '.join("'%s'" % x for x in similar.keys())
    c = conn.cursor()
    c.execute("SELECT * FROM tracks WHERE id in (%s)" % song_ids)
    res = sorted(
        (rec + (similar[rec[4]], find_artists(rec[4])) for rec in c.fetchall()),
        key=itemgetter(-1), reverse=True)
    return suggest_songs_result([*res])
The Word2Vec recommendations are illustrated below.
2.3 Surprise KNNBaseline Approach
In surprise.ipynb , tracks become items and playlists become users; a rating of 1 indicates the track appears in the playlist. The Surprise library’s KNNBaseline algorithm performs item‑based collaborative filtering.
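The notebook builds the pt_train DataFrame of (playlist, track, 1) rows from the database; a minimal sketch with hypothetical toy playlists standing in for the real data:

```python
import pandas as pd

# Hypothetical playlists: playlist id -> track ids it contains.
playlists = {
    "p1": ["t1", "t2", "t3"],
    "p2": ["t2", "t3", "t4"],
}

# One implicit-feedback row per (playlist, track) pair; rating 1 means
# "this track appears in this playlist".
rows = [(pid, tid, 1) for pid, tracks in playlists.items() for tid in tracks]
pt_train = pd.DataFrame(rows, columns=["userID", "itemID", "rating"])
print(len(pt_train))  # 6
```

Tracks absent from a playlist simply produce no row, which is why the rating scale below runs from 0 to 1.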
from surprise import KNNBaseline, Reader, Dataset

reader = Reader(rating_scale=(0, 1))
train_data = Dataset.load_from_df(pt_train[['userID', 'itemID', 'rating']], reader)
trainset = train_data.build_full_trainset()

sim_options = {'user_based': False}
algo = KNNBaseline(sim_options=sim_options)
algo.fit(trainset)
Nearest‑neighbor queries return similar tracks.
def find_sim(track_id, k=5):
    sim = algo.get_neighbors(iid=algo.trainset.to_inner_iid(track_id), k=k)
    track_ids = [track_id] + list(map(algo.trainset.to_raw_iid, sim))
    tracks = []
    c = conn.cursor()
    for tid in track_ids:
        # parameterized query avoids quoting/injection issues with raw ids
        c.execute("SELECT * FROM tracks WHERE id = ?", (tid,))
        tk = c.fetchall()[0]
        tracks.append(tk + (find_artists(tid),))
    c.close()
    return sim_result(tracks)
The resulting recommendations are shown in the figure.
2.4 Model Advantages and Limitations
This traditional user‑data‑driven approach is mature and can achieve good recommendation quality with massive data, but it requires heavy storage and processing resources and may create filter bubbles.
2.5 Improvement Directions
Explore more advanced algorithms beyond the baseline KNN.
Gather larger, higher‑quality datasets to boost performance.
Incorporate additional sources such as NetEase music metadata, comments, and popularity metrics for a richer hybrid model.