Training an Audio Quality Detection Model Using Synthetic Noise and PESQ Scoring
This article explains how to generate low‑quality audio samples from clean speech by randomly inserting noise at various SNR levels, compute objective PESQ scores as ground‑truth labels, and use the resulting paired data to train a neural‑network model for reference‑free audio quality assessment.
Training an audio quality detection model runs into two obstacles: degraded‑audio datasets are scarce, and assigning reliable quality scores to them is difficult. To address this, a method is proposed that creates low‑quality audio solely from a clean high‑quality speech corpus and labels each sample with a score from the PESQ algorithm.
Subjective evaluation of speech quality typically relies on MOS (Mean Opinion Score), which requires many listeners to rate audio on a 1‑5 scale; scores above 4 indicate good quality, while below 3 denote unacceptable quality.
Objective evaluation methods fall into two categories: reference‑based (e.g., PESQ) and no‑reference (e.g., P.563). The presented approach adopts the PESQ algorithm, which compares a degraded signal with its clean reference, aligns them, applies auditory transformations, measures spectral distortion, and maps the result to a MOS‑like PESQ score ranging from –0.5 to 4.5.
Data generation steps:
Randomly select start and end positions for inserting noise into the clean audio.
Calculate the noise scaling factor based on a specified signal‑to‑noise ratio (SNR).
Insert the chosen noise segment into the selected portion of the clean audio.
Compute the PESQ score for the resulting degraded audio.
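The scaling factor in step 2 follows directly from the SNR definition SNR = 10·log10(P_signal / P_noise): solve for the target noise power, then scale the noise so it reaches exactly that power. A minimal, dependency‑free sketch (the function and variable names here are illustrative, not from the article's code):

```python
import math
import random

def snr_scale(signal, noise, snr_db):
    # k such that signal + k * noise has the requested SNR in dB
    p_signal = sum(s * s for s in signal)
    p_noise_target = p_signal / 10 ** (snr_db / 10)
    p_noise = sum(n * n for n in noise)
    return math.sqrt(p_noise_target / p_noise)

random.seed(0)
signal = [random.uniform(-1, 1) for _ in range(1000)]
noise = [random.uniform(-0.1, 0.1) for _ in range(1000)]

# -10 dB means the scaled noise carries 10x the signal power
k = snr_scale(signal, noise, -10)
p_s = sum(s * s for s in signal)
p_n = sum((k * n) ** 2 for n in noise)
realized = 10 * math.log10(p_s / p_n)
print(round(realized, 6))  # -10.0
```

Because k is derived algebraically from the two measured powers, the realized SNR matches the requested one exactly, regardless of the noise's original amplitude.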
Implementation details (Python):
```python
import random

import numpy as np
import soundfile as sf
from pesq import pesq


def random_sample(n1, n2):
    # Pick a random [start, end) span that fits within the shorter signal
    if n1 < n2:
        start = random.randint(0, n1)
        end = random.randint(start, n1)
    else:
        start = random.randint(0, n2)
        end = random.randint(start, n2)
    return start, end


def add_noise(x, d, SNR):
    # Scaling factor k so that x + k * d has the requested SNR (in dB)
    P_signal = np.sum(np.abs(x) ** 2)
    P_d = np.sum(np.abs(d) ** 2)
    P_noise = P_signal / 10 ** (SNR / 10)
    k = np.sqrt(P_noise / P_d)
    return k


def make_noise_data(high_wave_data, noise_sample_data, sr):
    # Choose where to corrupt the clean audio and which noise slice to use
    c_start, c_end = random_sample(len(high_wave_data), len(noise_sample_data))
    n_start = random.randint(0, len(noise_sample_data) - (c_end - c_start))
    n_end = n_start + (c_end - c_start)
    k = add_noise(high_wave_data, noise_sample_data[n_start:n_end], -10)
    convert_data = high_wave_data[c_start:c_end] + k * noise_sample_data[n_start:n_end]
    new_wave_data = np.concatenate((high_wave_data[:c_start], convert_data,
                                    high_wave_data[c_end:]))
    # librosa.output.write_wav was removed in librosa 0.8; soundfile is the
    # recommended replacement for writing WAV files
    sf.write("noise.wav", new_wave_data, sr)
    return new_wave_data


# high_wave_data, noise_sample_data, and sr come from loading the clean and
# noise corpora beforehand (e.g. with librosa.load)
low_wave_data = make_noise_data(high_wave_data, noise_sample_data, sr)
score = pesq(sr, high_wave_data, low_wave_data, 'nb')
```

In real‑world scenarios, obtaining a clean reference for every degraded audio segment is impractical, making direct PESQ computation infeasible. By training a neural network on the synthetically generated paired data and their PESQ scores, the model learns to predict audio quality without needing a reference signal, enabling scalable, reference‑free quality assessment.
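The article does not describe the network architecture, so the following is a deliberately tiny, dependency‑free stand‑in for the regression setup: a single linear unit trained by SGD to map per‑frame log‑energy features of a degraded waveform to a quality label. The labels here are simulated rather than real PESQ scores, and a practical model would use spectral features and a deeper network:

```python
import math
import random

random.seed(1)

def frame_features(wave, frame=160):
    # Per-frame log-energy: a crude stand-in for the spectral features
    # a real quality model would consume
    feats = []
    for i in range(0, len(wave) - frame + 1, frame):
        e = sum(x * x for x in wave[i:i + frame]) / frame
        feats.append(math.log10(e + 1e-10))
    return feats

def make_pair():
    # Simulated (degraded waveform, quality label) pair; in the real
    # pipeline the label comes from pesq() on (clean, degraded)
    label = random.uniform(1.0, 4.5)
    noise_amp = (4.5 - label) / 4.5          # noisier audio -> lower label
    wave = [random.uniform(-0.2, 0.2) + random.uniform(-noise_amp, noise_amp)
            for _ in range(1600)]
    return wave, label

data = [make_pair() for _ in range(200)]

# One linear unit: mean log-energy feature -> predicted score
w, b, lr = 0.0, 0.0, 0.05

def mse():
    err = 0.0
    for wave, label in data:
        f = sum(frame_features(wave)) / 10   # 1600 samples -> 10 frames
        err += (w * f + b - label) ** 2
    return err / len(data)

loss_before = mse()
for _ in range(50):
    for wave, label in data:
        f = sum(frame_features(wave)) / 10
        g = 2 * (w * f + b - label)          # d(squared error)/d(prediction)
        w -= lr * g * f
        b -= lr * g
loss_after = mse()
print(loss_before > loss_after)  # training reduces the MSE
```

The point of the sketch is the data flow: degraded audio in, a single scalar quality estimate out, trained against objective scores with a plain regression loss and no reference signal at inference time.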
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.