Phase‑Aware Music Super‑Resolution Using Generative Adversarial Networks (INTERSPEECH 2020)

At INTERSPEECH 2020 the authors introduced a phase‑aware music super‑resolution system that uses a frequency‑domain GAN combined with an enhanced Griffin‑Lim algorithm to reconstruct missing high‑frequency magnitude and phase, delivering brighter, louder, and more natural‑sounding recordings that surpass traditional interpolation and naive phase‑flipping methods.

Tencent Music Tech Team

The paper "Phase‑aware music super‑resolution using generative adversarial networks" was selected for the INTERSPEECH 2020 conference, marking TME's first participation in the event. The work addresses music audio super‑resolution, aiming to restore high‑frequency components that are missing in low‑quality recordings, thereby improving perceived brightness, loudness, and overall listening experience.

Motivation and Challenges – While audio super‑resolution has been explored mainly for speech, music presents additional difficulties: complex spectral structures due to overlapping instruments, abundant high‑frequency energy, and stricter subjective quality requirements (low distortion, natural timbre, high MOS). Phase information in the high‑frequency band is especially problematic; common approaches flip low‑frequency phase, which reduces high‑frequency energy and introduces audible artifacts.
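To make the criticized "phase flipping" baseline concrete, here is a toy sketch (my own illustration, not the paper's code) of extending one spectral frame by mirroring the top of the low band and conjugating (flipping) its phase into the missing high bins:

```python
import numpy as np

def naive_phase_flip(low_band_spec, n_total_bins):
    """Toy high-band extension: mirror the top of the low-band spectrum
    into the missing high bins, keeping the magnitude and flipping
    (conjugating) the phase. This is the naive baseline the text
    criticizes, not the paper's method."""
    n_low = len(low_band_spec)
    n_high = n_total_bins - n_low
    # Mirror the top of the low band into the high band.
    mirrored = low_band_spec[-n_high:][::-1]
    # Keep the magnitude, negate the phase (equivalent to conjugation).
    high_band = np.abs(mirrored) * np.exp(-1j * np.angle(mirrored))
    return np.concatenate([low_band_spec, high_band])

# One toy spectral frame: 8 low-frequency bins extended to 12.
rng = np.random.default_rng(0)
frame = rng.standard_normal(8) + 1j * rng.standard_normal(8)
extended = naive_phase_flip(frame, 12)
print(extended.shape)  # (12,)
```

Because the flipped phase bears no relation to the true high-band phase, summing such frames back to a waveform tends to cancel high-frequency energy and produce the artifacts described above.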

Typical Industry Approaches – Two main families exist: (1) time‑domain interpolation, which upsamples the waveform directly but, much like simple resampling, creates little genuinely new high‑frequency content, and (2) frequency‑domain inpainting, which maps low‑frequency spectra to high‑frequency spectra. Recent deep‑learning methods use DNNs to learn this mapping between low‑ and high‑resolution signals.
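The limitation of plain interpolation can be verified numerically. The sketch below (an illustration under my own assumptions, not from the paper) upsamples a band-limited tone 2x via ideal interpolation (FFT-domain zero-padding) and checks that the bins above the original Nyquist frequency stay empty:

```python
import numpy as np

# Why time-domain upsampling cannot restore high frequencies:
# ideal 2x interpolation of a band-limited tone leaves the new
# high-frequency bins (above the old Nyquist) essentially empty.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)      # 1 kHz tone, fs = 8 kHz

# 2x upsampling via spectral zero-padding (ideal sinc interpolation).
X = np.fft.rfft(x)
X_up = np.concatenate([X, np.zeros(len(X) - 1)])
x_up = np.fft.irfft(X_up) * 2         # effective fs is now 16 kHz

# Energy above the old Nyquist (4 kHz) in the upsampled signal:
spec = np.abs(np.fft.rfft(x_up))
high_energy = spec[len(X):].sum()
low_energy = spec[:len(X)].sum()
print(high_energy / low_energy)       # ~0: no new content was created
```

Frequency-domain inpainting methods exist precisely to fill those empty high bins with plausible content rather than leaving them silent.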

Proposed Solution – The authors adopt a frequency‑domain GAN (Mel‑GAN) as the baseline to generate high‑frequency magnitude spectra from low‑frequency inputs. To solve the high‑frequency phase loss, they improve the Griffin‑Lim algorithm and integrate a phase‑aware module into the GAN pipeline, enabling accurate reconstruction of missing phase information. The overall system architecture is illustrated in the accompanying diagram.
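For reference, the classic Griffin‑Lim procedure that the authors build on can be sketched as follows: starting from random phase, alternate ISTFT/STFT projections so the phase becomes consistent with a target magnitude spectrogram. This is a minimal baseline sketch, not the paper's enhanced variant or its GAN integration:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(target_mag, n_iter=50, nperseg=256):
    """Classic Griffin-Lim: recover a phase consistent with a target
    magnitude spectrogram by alternating ISTFT/STFT projections.
    Baseline sketch only -- the paper's improved variant differs."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(target_mag.shape))  # random init
    for _ in range(n_iter):
        _, x = istft(target_mag * phase, nperseg=nperseg)
        _, _, Z = stft(x, nperseg=nperseg)
        Z = Z[:, :target_mag.shape[1]]   # guard against frame-count drift
        phase = np.exp(1j * np.angle(Z))
    _, x = istft(target_mag * phase, nperseg=nperseg)
    return x

# Usage: reconstruct a 440 Hz tone from its magnitude spectrogram only.
fs = 8000
t = np.arange(2 * fs) / fs
sig = np.sin(2 * np.pi * 440 * t)
_, _, Z = stft(sig, nperseg=256)
rec = griffin_lim(np.abs(Z))
```

In the super-resolution setting, the target magnitude for the high band comes from the GAN generator; the phase-aware module then supplies the missing phase that this iterative baseline would otherwise have to guess.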

Results – Objective and subjective evaluations show that the proposed method yields greater loudness, clearer timbre, and higher listener satisfaction than traditional interpolation or naive phase‑flipping techniques, especially on old or low‑quality music recordings.

For full technical details, the English manuscript is available on arXiv (https://arxiv.org/pdf/2010.04506) and audio samples can be downloaded from the GitHub repository (https://github.com/tencentmusic/TME-Audio-Super-Resolution-Samples). The authors will present the work online on October 29, 2020, 20:30‑21:30 (UTC+8) in the "Speech Enhancement, Bandwidth Extension and Hearing Aids" session of INTERSPEECH 2020.


Tags: GAN, audio super-resolution, INTERSPEECH 2020, music enhancement, phase reconstruction
Written by

Tencent Music Tech Team

Public account of Tencent Music's development team, focusing on technology sharing and communication.
