Introduction to librosa: Audio Loading, Feature Extraction, and Visualization with Python
This article introduces the Python library librosa, outlines its main audio processing features such as loading, visualization, MFCC, pitch detection, chromagram, and rhythm analysis, and provides complete code examples for each operation.
librosa is a powerful Python library focused on music and audio signal processing, widely used in Music Information Retrieval (MIR). Built on NumPy and SciPy, it offers efficient functions for loading audio files, visualizing signals, extracting features (e.g., spectral analysis, MFCC, rhythm detection), pitch detection, melody extraction, and beat synchronization.
Main Features
Audio I/O: reads many audio file formats (via soundfile or audioread); writing files is left to soundfile, since librosa 0.8 removed the librosa.output module.
Pre‑processing: audio signal standardization, normalization, denoising, etc.
Time‑frequency analysis: Short‑time Fourier Transform (STFT), Mel‑frequency cepstral coefficients (MFCC), chromagram, rhythm analysis.
Pitch detection: fundamental frequency (F0) estimation with YIN (librosa.yin) and probabilistic YIN (librosa.pyin).
Structural analysis: track segmentation, beat detection, melody extraction.
Visualization: the librosa.display module uses matplotlib to plot waveforms, spectrograms, and other time-frequency representations.
Loading Audio and Viewing Basic Information
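import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
# Load audio file (by default resampled to 22050 Hz and mixed down to mono)
y, sr = librosa.load('example.mp3')
# Print sampling rate and duration
print("Sampling rate:", sr)
print("Duration (s):", librosa.get_duration(y=y, sr=sr))
# Display waveform (waveshow replaced waveplot, which was removed in librosa 0.10)
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.show()
Compute and Display MFCCs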
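Under the hood, an MFCC frame is essentially a type-II discrete cosine transform (DCT) of the log-scaled mel-band energies of that frame; librosa.feature.mfcc defaults to dct_type=2 with norm='ortho'. The pure-Python sketch below shows only that last step, assuming the log mel energies for one frame are already available. mfcc_from_log_mel is a hypothetical helper for illustration, not part of librosa.

```python
import math


def mfcc_from_log_mel(log_mel, n_mfcc):
    """Orthonormal DCT-II of one frame of log mel energies (hypothetical helper).

    Returns the first n_mfcc coefficients.
    """
    n = len(log_mel)
    coeffs = []
    for k in range(n_mfcc):
        # DCT-II basis function: cos(pi * k * (2m + 1) / (2n))
        s = sum(x * math.cos(math.pi * k * (2 * m + 1) / (2 * n))
                for m, x in enumerate(log_mel))
        # Orthonormal scaling: sqrt(1/n) for k = 0, sqrt(2/n) otherwise
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


# A perfectly flat log-mel frame (40 bands at -20 dB)
flat = [-20.0] * 40
coeffs = mfcc_from_log_mel(flat, 5)
print(coeffs[0])  # -20 * sqrt(40), about -126.49; coeffs[1:] are ~0
```

A flat spectrum puts all of its energy into coefficient 0, which is why c0 mostly tracks overall level while the higher coefficients describe the shape of the spectral envelope.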
mfccs = librosa.feature.mfcc(y=y, sr=sr)
# Show MFCC image
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.ylabel('MFCC Coefficients')
plt.xlabel('Time')
plt.colorbar()
plt.show()
Compute and Display Pitch Contour (Fundamental Frequency)
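librosa.yin implements the YIN algorithm. Its core fits on a single frame: compute the difference function d(τ), normalize it by its cumulative mean, and take the first lag that falls below a threshold (librosa's trough_threshold defaults to 0.1). The sketch below is a simplified single-frame version for illustration only: yin_f0 is a hypothetical name, and it skips the framing and parabolic interpolation that librosa performs. The last lines apply the formula behind librosa.hz_to_midi, 69 + 12·log2(f/440), which yields fractional rather than integer note numbers.

```python
import math


def yin_f0(frame, sr, fmin, fmax, threshold=0.1):
    """Estimate F0 of one frame with a simplified YIN (hypothetical helper).

    No parabolic interpolation, so the result is quantized to sr / integer lag.
    """
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)
    w = len(frame) - tau_max  # integration window length
    if w <= 0:
        raise ValueError("frame too short for the requested fmin")
    # Step 1: difference function d(tau)
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))
    # Step 2: cumulative mean normalized difference d'(tau)
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0
    # Step 3: first lag below threshold, then slide down to its local minimum
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau
    # Fallback: global minimum within the search range
    best = min(range(tau_min, tau_max + 1), key=lambda t: cmnd[t])
    return sr / best


sr = 8000
# Synthetic 200 Hz sine: its period is exactly 40 samples at 8 kHz
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(1024)]
f0 = yin_f0(frame, sr, fmin=65.0, fmax=1000.0)
midi = 69 + 12 * math.log2(f0 / 440.0)  # formula behind librosa.hz_to_midi
print(f0, round(midi, 2))  # 200.0 55.35
```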
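# YIN requires a search range: fmin and fmax (here C2 to C7)
f0 = librosa.yin(y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'), sr=sr)
# Convert f0 in Hz to (fractional) MIDI note numbers
pitch = librosa.hz_to_midi(f0)
times = librosa.times_like(f0, sr=sr)
# Plot the contour on its own axes: MIDI numbers do not share the waveform's amplitude scale
plt.figure(figsize=(10, 4))
plt.plot(times, pitch, color='r', label='Pitch')
plt.xlabel('Time (s)')
plt.ylabel('MIDI note number')
plt.legend()
plt.show()
Compute Short‑Term Energy and Spectral Centroid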
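Both features are simple per-frame formulas: RMS energy is sqrt(mean(x²)) over the samples of a frame, and the spectral centroid is the magnitude-weighted mean frequency Σ f_k·|X_k| / Σ |X_k|. This sketch assumes frames are already cut and the magnitude spectrum is already computed; frame_rms and spectral_centroid here are hypothetical helpers, while librosa.feature.rms and librosa.feature.spectral_centroid apply the same ideas to every frame, with centering and padding.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame: sqrt(mean(x^2))."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def spectral_centroid(magnitudes, sr, n_fft):
    """Magnitude-weighted mean frequency of one spectrum frame.

    magnitudes[k] is |X[k]| for bins k = 0 .. n_fft // 2, located at k * sr / n_fft Hz.
    """
    freqs = [k * sr / n_fft for k in range(len(magnitudes))]
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


print(frame_rms([0.5, -0.5, 0.5, -0.5]))  # 0.5
# Two equal peaks at bins 10 and 30 -> centroid lands on bin 20
mags = [0.0] * 513
mags[10] = mags[30] = 1.0
print(spectral_centroid(mags, sr=22050, n_fft=1024))  # 430.6640625
```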
# Frame-wise RMS energy (librosa.feature.rmse was renamed to rms) and spectral centroid
energy = librosa.feature.rms(y=y)
spectral_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
times = librosa.times_like(energy, sr=sr)
# Both are 1-D curves over time, so plot them as lines rather than images
plt.figure(figsize=(12, 6))
plt.subplot(2, 1, 1)
plt.plot(times, energy[0])
plt.ylabel('RMS')
plt.title('Energy')
plt.subplot(2, 1, 2)
plt.plot(times, spectral_centroid[0])
plt.ylabel('Hz')
plt.xlabel('Time (s)')
plt.title('Spectral Centroid')
plt.tight_layout()
plt.show()
Extract and Visualize Chromagram
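A chromagram folds every frequency onto one of 12 pitch classes, ignoring the octave. librosa.feature.chroma_stft does this with a weighted chroma filter bank over STFT bins and, by default, scales each frame so its largest bin is 1. The sketch below swaps in a cruder hard assignment to the nearest semitone, purely to show the folding; fold_to_chroma and the example peak list are hypothetical.

```python
import math

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def fold_to_chroma(peaks):
    """Fold (frequency_hz, magnitude) pairs into a 12-bin chroma vector.

    Hard nearest-semitone assignment (a simplification of librosa's chroma
    filter bank); bin 0 is C. Normalized so the largest bin is 1.
    """
    chroma = [0.0] * 12
    for freq, mag in peaks:
        if freq <= 0:
            continue
        midi = 69 + 12 * math.log2(freq / 440.0)
        chroma[round(midi) % 12] += mag
    peak = max(chroma)
    return [c / peak for c in chroma] if peak > 0 else chroma


# A3, A4 and A5 all land in the 'A' bin; 659.26 Hz (E5) lands in 'E'
vec = fold_to_chroma([(220.0, 1.0), (440.0, 1.0), (880.0, 1.0), (659.26, 1.5)])
print({PITCH_CLASSES[i]: round(v, 2) for i, v in enumerate(vec) if v > 0})
# {'E': 0.5, 'A': 1.0}
```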
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
librosa.display.specshow(chroma, x_axis='time', y_axis='chroma', cmap='coolwarm')
plt.title('Chromagram')
plt.colorbar()
plt.show()
Perform STFT and Show Spectrogram
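librosa.stft slides a window (Hann by default) along the signal and takes a DFT of each frame, keeping the 1 + n_fft/2 non-negative frequency bins; bin k sits at k·sr/n_fft Hz. The deliberately naive pure-Python version below makes the framing explicit, assuming no center padding and using a direct O(n_fft²) DFT instead of an FFT. naive_stft is a hypothetical helper; note that it returns a list of frames (frames × bins), whereas librosa returns an array shaped (bins, frames).

```python
import cmath
import math


def naive_stft(y, n_fft=64, hop_length=16):
    """Magnitude STFT via a direct DFT per frame (no center padding).

    Frame t covers y[t * hop_length : t * hop_length + n_fft].
    Returns a list of frames, each holding n_fft // 2 + 1 magnitudes.
    """
    # Periodic Hann window
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(y) - n_fft + 1, hop_length):
        seg = [y[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            spectrum.append(abs(X))
        frames.append(spectrum)
    return frames


sr = 8000
n_fft = 64
# A sinusoid exactly on bin 8: 8 * sr / n_fft = 1000 Hz
y = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(256)]
S = naive_stft(y, n_fft=n_fft, hop_length=16)
print(len(S), len(S[0]))  # 13 frames, 33 frequency bins
peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
print(peak_bin, peak_bin * sr / n_fft)  # 8 1000.0
```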
stft = librosa.stft(y)
# Magnitude and phase spectra
mag = np.abs(stft)
phase = np.angle(stft)
# Display magnitude spectrogram in dB relative to the peak
librosa.display.specshow(librosa.amplitude_to_db(mag, ref=np.max), sr=sr, y_axis='linear', x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
plt.show()
Detect and Segment Silent Sections
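librosa.effects.split returns the intervals that are not silent. Roughly, it computes a frame-wise RMS, converts it to dB relative to the loudest frame, and keeps frames above -top_db. The sketch below reproduces that idea in plain Python with non-overlapping frames and no centering; split_nonsilent is a hypothetical helper, and its frame sizes are tiny only to keep the example readable.

```python
import math


def split_nonsilent(y, top_db=40, frame_length=4, hop_length=4):
    """Return [start, end) sample intervals whose frame RMS is within top_db of the peak.

    Simplified sketch of the idea behind librosa.effects.split (no centering).
    """
    rms = []
    for start in range(0, len(y) - frame_length + 1, hop_length):
        frame = y[start:start + frame_length]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    ref = max(rms) if rms else 0.0
    if ref == 0:
        return []
    # Level of each frame in dB relative to the loudest frame (20*log10 of the RMS ratio)
    amin = 1e-10
    db = [20 * math.log10(max(r, amin) / ref) for r in rms]
    intervals = []
    current = None
    for i, level in enumerate(db):
        start = i * hop_length
        if level > -top_db:
            # Start a new interval or extend the open one
            if current is None:
                current = [start, start + frame_length]
            else:
                current[1] = start + frame_length
        elif current is not None:
            intervals.append(tuple(current))
            current = None
    if current is not None:
        intervals.append(tuple(current))
    return intervals


# 8 samples of silence, 8 samples of a loud tone, 8 samples at -80 dB
y = [0.0] * 8 + [1.0, -1.0] * 4 + [1e-4, -1e-4] * 4
print(split_nonsilent(y, top_db=40))   # [(8, 16)]
print(split_nonsilent(y, top_db=100))  # [(8, 24)]
```

Raising top_db makes the split more permissive: at 100 dB the quiet tail is no longer treated as silence.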
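# Split into non-silent intervals: frames more than top_db dB below the
# loudest frame are treated as silence (top_db is a positive number).
# librosa.effects.split has no minimum-silence-length option.
top_db = 40
intervals = librosa.effects.split(y, top_db=top_db, frame_length=1024, hop_length=512)
# Process each non-silent segment; each interval is [start, end) in samples
for i, (start, end) in enumerate(intervals):
    non_silent_part = y[start:end]
    # Handle each segment (e.g., save or further analysis)
    print(f"Segment {i}: {start / sr:.2f} s to {end / sr:.2f} s")
Compute and Visualize Rhythm Features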
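librosa.frames_to_time converts frame indices to seconds as frame · hop_length / sr, with hop_length defaulting to 512. As a rough sanity check on a beat_track result, the median spacing of the beat times gives a tempo estimate; this is only a cross-check, not how beat_track itself estimates tempo. The helper names and example frame indices below are hypothetical.

```python
import statistics


def frames_to_time(frames, sr, hop_length=512):
    """Frame index -> seconds: frame * hop_length / sr."""
    return [f * hop_length / sr for f in frames]


def tempo_from_beats(beat_times):
    """Rough tempo in BPM from the median inter-beat interval."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / statistics.median(intervals)


times = frames_to_time([0, 10, 20, 30], sr=8000, hop_length=400)
print(times)                    # [0.0, 0.5, 1.0, 1.5]
print(tempo_from_beats(times))  # 120.0
```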
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print("Estimated tempo (BPM):", tempo)
# Convert beat frames to time (seconds)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.vlines(beat_times, ymin=-1, ymax=1, color='r', linestyle='--')
plt.show()
Use Constant‑Q Transform (CQT)
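Unlike the STFT's evenly spaced bins, CQT bins are spaced geometrically, f_k = fmin · 2^(k / bins_per_octave), so every bin keeps the same ratio Q of center frequency to bandwidth. librosa.cqt defaults to fmin = C1 (about 32.70 Hz), 84 bins and 12 bins per octave. The sketch computes those center frequencies and the basic Q value 1 / (2^(1/b) - 1); librosa additionally scales Q by its filter_scale parameter (default 1). The function names are hypothetical; librosa.cqt_frequencies returns the library's own center frequencies.

```python
import math


def cqt_center_freqs(fmin, n_bins, bins_per_octave=12):
    """Geometrically spaced center frequencies: f_k = fmin * 2**(k / bins_per_octave)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave=12):
    """Center frequency / bandwidth ratio when bandwidth equals the bin spacing."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# C1 is MIDI note 24; 84 bins = 7 octaves at 12 bins per octave
c1 = 440.0 * 2.0 ** ((24 - 69) / 12)
freqs = cqt_center_freqs(c1, n_bins=84)
print(round(freqs[0], 2), round(freqs[12], 2), round(freqs[-1], 2))
print(round(cqt_q_factor(), 2))
```

Each octave doubles the frequency while keeping the same number of bins, which is why musical notes line up with CQT rows (and why specshow can label them with y_axis='cqt_note').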
cqt = librosa.cqt(y=y, sr=sr)
# Show CQT spectrogram in dB relative to the peak
librosa.display.specshow(librosa.amplitude_to_db(np.abs(cqt), ref=np.max), sr=sr, y_axis='cqt_note', x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.title('Constant-Q power spectrogram')
plt.show()
Compute and Display Tonal Centroid (Tonnetz)
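librosa.feature.tonnetz follows Harte et al. (2006): each L1-normalized chroma frame is projected onto three circles, for perfect fifths, minor thirds and major thirds, giving six coordinates. Per semitone the angles advance by 7π/6, 3π/2 and 2π/3, with radii 1, 1 and 0.5. The sketch below applies that projection to a single chroma frame; tonnetz_frame is a hypothetical helper, and librosa computes its chroma input itself (via chroma_cqt by default).

```python
import math


def tonnetz_frame(chroma):
    """Project one 12-bin chroma frame onto the 6-D tonal centroid space.

    (sin, cos) pairs on circles for fifths (7*pi/6 per semitone),
    minor thirds (3*pi/2) and major thirds (2*pi/3, radius 0.5).
    The chroma frame is L1-normalized first.
    """
    total = sum(chroma)
    if total == 0:
        return [0.0] * 6
    c = [v / total for v in chroma]
    circles = [(7 * math.pi / 6, 1.0), (3 * math.pi / 2, 1.0), (2 * math.pi / 3, 0.5)]
    out = []
    for step, radius in circles:
        out.append(radius * sum(v * math.sin(step * l) for l, v in enumerate(c)))
        out.append(radius * sum(v * math.cos(step * l) for l, v in enumerate(c)))
    return out


# Pitch class C alone sits at angle 0 on every circle
c_only = [1.0] + [0.0] * 11
print(tonnetz_frame(c_only))  # [0.0, 1.0, 0.0, 1.0, 0.0, 0.5]
```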
tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
plt.figure(figsize=(10, 4))
librosa.display.specshow(tonnetz, x_axis='time', y_axis='tonnetz')
plt.colorbar()
plt.title('Tonnetz')
plt.show()
All code snippets assume that librosa and its dependencies are installed; in real applications, you may need to adapt and extend these examples for specific tasks and larger audio datasets.
Test Development Learning Exchange