Data Party THU
Feb 4, 2026 · Artificial Intelligence

How Sakana AI Redefines Long-Context Transformers: DroPE, REPO, and FwPKM Explained

This article analyzes three recent Sakana AI papers that rethink how Transformers handle long sequences, through removing positional embeddings, reconstructing position awareness, and adding a fast‑weight external memory, and shows how each approach improves ultra‑long text understanding.

Memory Mechanism · Positional Embedding · Transformer
Huawei Cloud Developer Alliance
Oct 25, 2023 · Artificial Intelligence

Unlocking GLM & ChatGLM: Deep Dive into MindSpore Large‑Model Techniques

The MindSpore Season 2 open class offers a comprehensive overview of the evolution from GLM to ChatGLM, covering architecture, positional‑embedding strategies, and training‑stability optimizations, along with step‑by‑step instructions for deploying large language models with Ascend, ModelArts, and MindSpore Transformers, and previews upcoming multimodal remote‑sensing sessions.

Artificial Intelligence · ChatGLM · GLM