Mar 6, 2025 · Artificial Intelligence

How SCMHSA Improves Transformer Next‑Frame Prediction by Reducing Semantic Dilution

The paper introduces a Semantic‑Concentrated Multi‑Head Self‑Attention (SCMHSA) module and a new embedding‑space loss to address semantic dilution and loss‑target mismatch in Transformer‑based video next‑frame prediction, and reports higher PSNR and lower MSE on four benchmark datasets.

Embedding Loss · SCMHSA · Semantic Dilution

