Simple Music Recommendation System: Audio‑Feature and Playlist‑Based Approaches
This article presents two straightforward music recommendation methods—content‑based filtering using audio features and collaborative filtering using playlist data—detailing their design ideas, key Python and Go code snippets, model training, evaluation, and possible improvements.
This article presents two simple, traditional ideas for a music recommendation system (in the "recommend the next songs" sense) and their implementations, omitting mathematical theory and machine-learning details. The full code is available at the linked GitHub repository.
1. Audio‑Feature Based Approach
Analyzes audio features to perform content‑based filtering (CBF).
1.1 Design Idea
A listener who enjoys Bach may also like Chopin; therefore we can train a model on audio clips to distinguish genres and recommend songs with similar style.
song-classification.ipynb implements the model training using the UVic "genres" dataset, which contains well‑labeled music segments.
$ ls genres
blues country hiphop metal reggae
classical disco jazz pop rock
The clips are converted to mel‑spectrograms with the librosa library.
These spectrograms are fed into a 1‑D convolution‑pooling stack followed by a fully connected classification head.
def cnn_model(input_shape):
    inputs = Input(input_shape)
    x = inputs
    # 1-D conv-pooling stack
    levels = 64
    for level in range(3):
        x = Conv1D(levels, 3, activation='relu')(x)
        x = BatchNormalization()(x)
        x = MaxPooling1D(pool_size=2, strides=2)(x)
        levels *= 2
    # x -> shape (256,) after global pooling
    x = GlobalMaxPooling1D()(x)
    # fully-connected layers for label prediction
    for fc in range(2):
        x = Dense(256, activation='relu')(x)
        x = Dropout(0.5)(x)
    labels = Dense(10, activation='softmax')(x)
    model = Model(inputs=[inputs], outputs=[labels])
    sgd = SGD(learning_rate=0.0003, momentum=0.9, decay=1e-5, nesterov=True)
    model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model = cnn_model((128, 128))
The trained model ( song_classify.h5 ) classifies clearly distinct genres (e.g., classical) well but struggles with ambiguous ones such as rock.
Using this model in index-local-mp3s.ipynb , we extract mel‑spectrograms from a small personal MP3 collection, remove the final classification head, and keep the convolutional feature extractor to obtain 256‑dimensional vectors for each track.
cnn_model = load_model('song_classify.h5')
vectorize_model = Model(inputs=cnn_model.input, outputs=cnn_model.layers[-4].output)
vectors = vectorize_model.predict(inputs)
These vectors are fed to an unsupervised nearest‑neighbors model to compute similarity between songs.
nbrs = NearestNeighbors(n_neighbors=10, algorithm='ball_tree').fit(vectors)

def most_similar_songs(song_idx):
    # each song contributes ten consecutive slice vectors,
    # so song i occupies rows i*10 .. i*10+9
    distances, indices = nbrs.kneighbors(vectors[song_idx*10 : song_idx*10+10])
    c = Counter()
    for row in indices:
        for idx in row[1:]:
            c[idx // 10] += 1
    return c.most_common()

def print_similar_songs(song_idx, start=1, end=6):
    print("Target song:", song_name(song_idx))
    for idx, score in most_similar_songs(song_idx)[start:end]:
        print(f"[Similarity {score}] {song_name(idx)}")
The resulting recommendations are shown in the accompanying figure.
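The block-of-ten indexing works because each track was vectorized as ten spectrogram slices. A sketch of that slicing step; the slice count and window width are assumptions chosen to match the model's (128, 128) input:

```python
import numpy as np

def slice_spectrogram(spec, n_slices=10, width=128):
    """Cut a (n_mels, n_frames) spectrogram into fixed-width windows
    evenly spaced across the track (counts here are assumptions)."""
    starts = np.linspace(0, spec.shape[1] - width, n_slices).astype(int)
    # transpose each window to (frames, mels), the layout Conv1D expects
    return np.stack([spec[:, s:s + width].T for s in starts])

spec = np.random.rand(128, 2000)  # fake spectrogram of a full song
clips = slice_spectrogram(spec)
print(clips.shape)  # (10, 128, 128)
```

Stacking the slices of every track in order produces the contiguous rows that most_similar_songs indexes with song_idx*10.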
1.2 Model Advantages and Limitations
This method relies solely on audio content, does not need user interaction data, and can recommend unseen tracks, but it requires full audio processing, is computationally heavy, and can only recommend tracks that exist locally.
1.3 Improvement Directions
Enlarge the training dataset beyond the UVic genres collection.
Refine the network architecture, possibly leveraging pretrained models or transfer learning.
Incorporate additional modalities such as metadata (artist, album, duration) and lyrics.
2. Playlist‑Based Collaborative Filtering
Uses historical user‑generated playlists to perform collaborative filtering (CF).
2.1 Data Acquisition
The spotify-playlist.ipynb notebook fetches random playlists via the Spotify API, storing only metadata. Because large‑scale data collection is unstable in Python, a robust Go implementation in the spotify/ directory writes the data to a SQLite database.
The database currently holds several gigabytes: 177,889 playlists, 801,357 artists, and 4,995,249 tracks.
sqlite> select count(*) from playlists;
177889
sqlite> select count(*) from artists;
801357
sqlite> select count(*) from tracks;
sqlite> select count(*) from tracks;
4995249
2.2 Word2Vec Approach
In train-a-music-recommender.ipynb , each track is treated as a word and each playlist as a sentence, then a Word2Vec model is trained.
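The playlists are streamed out of the SQLite database with an iterator so the full corpus never sits in memory. Its implementation is not shown in the snippet below; this is a plausible sketch, where the playlist_tracks table name and columns are assumptions about the repository's schema:

```python
import sqlite3

class PlaylistTracksIter:
    """Yield each playlist as a list of track ids (hypothetical sketch;
    the real table layout in the repository may differ)."""

    def __init__(self, db_path):
        self.db_path = db_path

    def __iter__(self):
        conn = sqlite3.connect(self.db_path)
        cur = conn.execute(
            "SELECT playlist_id, track_id FROM playlist_tracks "
            "ORDER BY playlist_id")
        current, sentence = None, []
        for playlist_id, track_id in cur:
            if playlist_id != current and sentence:
                yield sentence      # previous playlist is complete
                sentence = []
            current = playlist_id
            sentence.append(track_id)
        if sentence:
            yield sentence
        conn.close()
```

Being re-iterable (a fresh cursor per __iter__) matters because gensim passes over the corpus more than once during training.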
sentences = [
    ["track_1_id", "track_2_id", ...],  # playlist_1
    [...],                              # playlist_2
    ...
]

model = gensim.models.Word2Vec(sentences=PlaylistTracksIter(DB), min_count=4)
Given a track, the most similar tracks are retrieved.
def suggest_songs(song_id):
    similar = dict(model.wv.most_similar([song_id]))
    song_ids = ', '.join("'%s'" % x for x in similar.keys())
    c = conn.cursor()
    c.execute("SELECT * FROM tracks WHERE id in (%s)" % song_ids)
    res = sorted(
        (rec + (similar[rec[4]], find_artists(rec[4])) for rec in c.fetchall()),
        key=itemgetter(-1), reverse=True)
    return suggest_songs_result([*res])
The Word2Vec recommendations are illustrated below.
2.3 Surprise KNNBaseline Approach
In surprise.ipynb , tracks become items and playlists become users; a rating of 1 indicates the track appears in the playlist. The Surprise library’s KNNBaseline algorithm performs item‑based collaborative filtering.
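The notebook builds the pt_train DataFrame of (playlist, track, 1) rows from the database; a minimal sketch with hypothetical toy playlists standing in for the real data:

```python
import pandas as pd

# Hypothetical playlists: playlist id -> track ids it contains.
playlists = {
    "p1": ["t1", "t2", "t3"],
    "p2": ["t2", "t3", "t4"],
}

# One implicit-feedback row per (playlist, track) pair; rating 1 means
# "this track appears in this playlist".
rows = [(pid, tid, 1) for pid, tracks in playlists.items() for tid in tracks]
pt_train = pd.DataFrame(rows, columns=["userID", "itemID", "rating"])
print(len(pt_train))  # 6
```

Tracks absent from a playlist simply produce no row, which is why the rating scale below runs from 0 to 1.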
from surprise import KNNBaseline, Reader, Dataset

reader = Reader(rating_scale=(0, 1))
train_data = Dataset.load_from_df(pt_train[['userID', 'itemID', 'rating']], reader)
trainset = train_data.build_full_trainset()

sim_options = {'user_based': False}
algo = KNNBaseline(sim_options=sim_options)
algo.fit(trainset)
Nearest‑neighbor queries return similar tracks.
def find_sim(track_id, k=5):
    sim = algo.get_neighbors(iid=algo.trainset.to_inner_iid(track_id), k=k)
    track_ids = [track_id] + list(map(algo.trainset.to_raw_iid, sim))
    tracks = []
    c = conn.cursor()
    for tid in track_ids:
        # parameterized query avoids quoting/injection issues with raw ids
        c.execute("SELECT * FROM tracks WHERE id = ?", (tid,))
        tk = c.fetchall()[0]
        tracks.append(tk + (find_artists(tid),))
    c.close()
    return sim_result(tracks)
The resulting recommendations are shown in the figure.
2.4 Model Advantages and Limitations
This traditional user‑data‑driven approach is mature and can achieve good recommendation quality with massive data, but it requires heavy storage and processing resources and may create filter bubbles.
2.5 Improvement Directions
Explore more advanced algorithms beyond the baseline KNN.
Gather larger, higher‑quality datasets to boost performance.
Incorporate additional sources such as NetEase music metadata, comments, and popularity metrics for a richer hybrid model.