
Introduction to librosa: Audio Loading, Feature Extraction, and Visualization with Python

This article introduces the Python library librosa, outlines its main audio processing features such as loading, visualization, MFCC, pitch detection, chromagram, and rhythm analysis, and provides complete code examples for each operation.

Test Development Learning Exchange

librosa is a powerful Python library focused on music and audio signal processing, widely used in Music Information Retrieval (MIR). Built on NumPy and SciPy, it offers efficient functions for loading audio files, visualizing signals, extracting features (e.g., spectral analysis, MFCC, rhythm detection), pitch detection, melody extraction, and beat synchronization.

Main Features

Audio I/O: supports reading and writing multiple audio file formats.

Pre‑processing: audio signal standardization, normalization, denoising, etc.

Time‑frequency analysis: Short‑time Fourier Transform (STFT), Mel‑frequency cepstral coefficients (MFCC), chromagram, rhythm analysis.

Pitch detection: provides fundamental frequency (F0) estimation algorithms.

Structural analysis: track segmentation, beat detection, melody extraction.

Visualization: uses matplotlib to plot waveforms, spectrograms, time‑frequency maps, and other charts.

Loading Audio and Viewing Basic Information

import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load audio file (resampled to 22050 Hz by default)
y, sr = librosa.load('example.mp3')
# Print sampling rate and duration
print("Sampling rate:", sr)
print("Duration (s):", librosa.get_duration(y=y, sr=sr))
# Display waveform (waveplot was removed in librosa 0.10; use waveshow)
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.show()

Compute and Display MFCCs

mfccs = librosa.feature.mfcc(y=y, sr=sr)
# Show MFCC image
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.ylabel('MFCC Coefficients')
plt.xlabel('Time')
plt.colorbar()
plt.show()

Compute and Display Pitch Contour (Fundamental Frequency)

# yin requires an explicit f0 search range (fmin/fmax)
f0 = librosa.yin(y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'), sr=sr)
# Convert continuous f0 (Hz) to MIDI note numbers
pitch = librosa.hz_to_midi(f0)
plt.figure(figsize=(10, 4))
plt.plot(librosa.times_like(f0, sr=sr), pitch, color='r', label='Pitch')
plt.xlabel('Time (s)')
plt.ylabel('MIDI note')
plt.legend()
plt.show()

Compute Short‑Term Energy and Spectral Centroid

energy = librosa.feature.rms(y=y)  # rmse was renamed to rms
spectral_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
times = librosa.times_like(energy, sr=sr)
# Both features are one value per frame, so plot them as curves over time
plt.figure(figsize=(12, 6))
plt.subplot(2, 1, 1)
plt.plot(times, energy[0])
plt.title('RMS Energy')
plt.subplot(2, 1, 2)
plt.plot(times, spectral_centroid[0])
plt.title('Spectral Centroid (Hz)')
plt.tight_layout()
plt.show()

Extract and Visualize Chromagram

chroma = librosa.feature.chroma_stft(y=y, sr=sr)
librosa.display.specshow(chroma, x_axis='time', y_axis='chroma', cmap='coolwarm')
plt.title('Chromagram')
plt.colorbar()
plt.show()

Perform STFT and Show Spectrogram

import numpy as np

stft = librosa.stft(y)
# Magnitude and phase spectra
mag = np.abs(stft)
phase = np.angle(stft)
# Display magnitude spectrogram in dB
librosa.display.specshow(librosa.amplitude_to_db(mag, ref=np.max), sr=sr, y_axis='linear', x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
plt.show()

Detect Silent Sections and Split the Audio

# librosa.effects.split returns the *non-silent* intervals;
# top_db is the positive dB threshold below the peak that counts as silence
intervals = librosa.effects.split(y, top_db=40, frame_length=1024, hop_length=512)
# Process each non-silent segment
for i, (start, end) in enumerate(intervals):
    non_silent_part = y[start:end]
    # Handle each segment (e.g., save it or run further analysis)

Compute and Visualize Rhythm Features

tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
# Convert beat frames to timestamps in seconds
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.vlines(beat_times, ymin=-1, ymax=1, color='r', linestyle='--')
plt.show()

Use Constant‑Q Transform (CQT)

cqt = librosa.cqt(y=y, sr=sr)
# Show CQT spectrogram
librosa.display.specshow(librosa.amplitude_to_db(np.abs(cqt)), y_axis='cqt_note', x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.title('Constant-Q power spectrogram')
plt.show()

Compute and Display Tonal Centroid (Tonnetz)

tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
plt.figure(figsize=(10, 4))
librosa.display.specshow(tonnetz, x_axis='time', y_axis='tonnetz')
plt.colorbar()
plt.title('Tonnetz')
plt.show()

All code snippets assume that librosa and its dependencies (NumPy, SciPy, matplotlib) are installed, e.g. via pip install librosa; in real applications you may need to adapt and extend these examples for specific tasks and larger audio datasets.

Tags: Feature Extraction, Audio Processing, librosa, MIR, Signal Analysis