How ByteDance’s AI Transforms Music Creation and Discovery on TikTok
ByteDance leverages advanced AI models such as SpectTNT, semi‑supervised music tagging transformers, language identification, chord recognition, contrastive representation learning, and source separation to power TikTok’s massive music library, enabling seamless music‑video interaction, smarter recommendations, and new creative tools for creators worldwide.
Music & Visual Interaction Technology Simplifies Creation
TikTok has become a major channel for music promotion, turning short‑video BGM hits into chart‑topping tracks across platforms. ByteDance’s vast music library, containing billions of audio fragments, is powered by SAMI (Speech, Audio and Music Intelligence), which enables deep audio analysis and intelligent content creation.
SpectTNT: Time‑Frequency Transformer for Music Audio
SpectTNT is a novel deep‑learning model designed for music spectrogram extraction. It converts audio signals into spectrograms via short‑time Fourier transform, then applies time‑frequency transformer layers with residual connections to capture high‑level features, supporting tasks such as vocal‑melody extraction, music structure analysis, and improved audio‑visual matching.
ISMIR 2021 Paper: SpecTNT: a Time‑Frequency Transformer for Music Audio
Semi‑Supervised Music Tagging Transformer
To organize the massive music catalog, ByteDance introduced a semi‑supervised Transformer model that tags music by genre and similarity, reducing reliance on manual labeling and outperforming large‑scale residual networks. This model powers music recommendation in products like Resso, TikTok, and Jianying.
ISMIR 2021 Paper: Semi‑supervised Music Tagging Transformer
Music Language Identification for Multilingual Users
ByteDance’s music language identification system detects dozens of languages within a track, providing language composition ratios that improve user retention in multilingual markets. The system combines log‑Mel spectrograms, a 50‑layer deep residual network, and multimodal metadata to output language predictions.
ISMIR 2021 Paper: Listen, Read, and Identify: Multimodal Singing Language Identification of Music
Automatic Chord Recognition Enhances AI Composition
A deep autoregressive distillation model (NADE) enables rich chord recognition across massive MIDI datasets, providing high‑quality chord segments for AI‑driven music composition and improving the coherence of generated music.
ISMIR 2021 Paper: A deep learning method for enforcing coherence in Automatic Chord Recognition
Contrastive Learning of Musical Representations (CLMR)
CLMR learns universal music embeddings with minimal labeled data using contrastive learning and extensive audio augmentations. The model achieves strong performance on downstream audio classification tasks, reducing data annotation costs.
ISMIR 2021 Paper: Contrastive Learning of Musical Representations
Music Structure Analysis for Creative Potential
ByteDance’s music structure analysis detects highlights and loops, enabling intelligent music length extension in editing tools like Xigua. This technology improves natural transitions and supports various creative video effects.
ISMIR 2021 Paper: Supervised Metric Learning for Music Structure Features
Music Source Separation Advances
A 143‑layer deep residual network jointly estimates magnitude and phase spectra, surpassing traditional methods. The model achieves an 8.98 dB improvement in vocal separation, facilitating tasks such as background music replacement and high‑quality source extraction.
ISMIR 2021 Paper: Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
