Network Intelligence Research Center (NIRC)
Aug 22, 2023 · Artificial Intelligence
LONGNET: Extending Transformers to Over 1 Billion Tokens
LONGNET introduces dilated attention to enable Transformers to process sequences exceeding one billion tokens with linear computational cost, preserving performance on shorter inputs and demonstrating strong results on long‑sequence modeling and standard language tasks.
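The core idea can be illustrated with a minimal sketch: split the sequence into fixed-length segments, keep only every r-th token inside each segment, and run standard attention on that sparsified subset. The function below is an assumed simplification (single head, single segment-length/dilation pair, no mixing across dilation rates as in the full LONGNET design); the names `dilated_attention`, `segment_len`, and `dilation` are illustrative, not from the paper's code.

```python
import numpy as np

def dilated_attention(q, k, v, segment_len, dilation):
    """Sketch of dilated attention for one (segment_len, dilation) pair.

    Each segment attends only within itself, over every `dilation`-th
    token, so per-segment cost is O((segment_len/dilation)^2 * d) and
    total cost grows linearly in sequence length for fixed segment_len.
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for start in range(0, n, segment_len):
        # Indices of the sparsified segment: every `dilation`-th token.
        idx = np.arange(start, min(start + segment_len, n))[::dilation]
        qs, ks, vs = q[idx], k[idx], v[idx]
        # Standard scaled dot-product attention on the subset.
        scores = qs @ ks.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ vs
    return out
```

In the full method, several such sparsified patterns with different segment lengths and dilation rates are computed in parallel and their outputs combined, so every position is eventually covered while the overall cost stays linear.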
Dilated Attention · LONGNET · Language Modeling
6 min read
