Tagged articles

position encoding

6 articles

Jun 4, 2026 · Artificial Intelligence

Bernini: An Open‑Source AI Model that Masterfully Handles Diverse Video Editing Tasks

Bernini combines a multimodal large language model with a diffusion renderer in a semantic planner‑renderer architecture, uses segment‑aware 3D position encoding and chain‑of‑thought reasoning, and achieves state‑of‑the‑art results on a 300‑case benchmark, outperforming closed‑source competitors.

Benchmark · Bernini · LLM

11 min read

Tencent Cloud Developer

Dec 9, 2025 · Artificial Intelligence

How Do Large Language Models Turn Text into Math? A Deep Dive into Transformers

This article walks through the complete workflow of a large language model: turning a user query into a token matrix via tokenization and embedding, processing it with the Transformer’s self‑attention and multi‑head mechanisms, and decoding the output logits into human‑readable text. It also covers position encoding, long‑context strategies, generation parameters, and practical engineering tips.

Inference Optimization · Large Language Models · Self-Attention

29 min read
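The pipeline summarized above (token embeddings plus position information fed into self‑attention) can be sketched in a few lines of pure Python. This sketch assumes the fixed sinusoidal position encoding of the original Transformer and a single unmasked attention head; the article may use other schemes, and all function names here are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinusoidal_positions(seq_len: int, d_model: int) -> Matrix:
    """Fixed sinusoidal encodings: even dimensions use sin, odd dimensions use cos."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


# Toy input: 3 tokens with arbitrary 4-dimensional embeddings.
embeddings = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
pe = sinusoidal_positions(3, 4)
x = [[e + p for e, p in zip(er, pr)] for er, pr in zip(embeddings, pe)]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out = self_attention(x, identity, identity, identity)
print(len(out), len(out[0]))  # 3 4
```

Because each output row is a softmax‑weighted average of the value rows, with identity projections every output coordinate stays within the range of the corresponding input coordinates.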

AI Frontier Lectures

May 10, 2025 · Artificial Intelligence

Can the ‘Canon’ Layer Unlock New Limits in Large Language Models?

A new study introduces the lightweight “Canon” layer for large language models, showing how it improves information flow, reasoning depth, and scalability across Transformers, linear attention, and state‑space architectures, and offering a controlled synthetic pre‑training benchmark for deeper architectural analysis.

AI research · Large Language Models · Mamba

11 min read
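The summary does not describe the layer's internals. Assuming a Canon layer works as a short, trainable causal convolution that mixes each token's hidden state with those of a few preceding tokens and adds the result back residually, a minimal pure‑Python sketch looks like this. The function name, kernel layout, and kernel size are illustrative assumptions, not the study's code.

```python
from typing import List

Vector = List[float]


def canon_layer(hidden: List[Vector], kernel: List[Vector]) -> List[Vector]:
    """Causal per-channel convolution over the current and preceding tokens, plus a residual.

    hidden: token states h_0..h_{T-1}, each of length d.
    kernel: per-channel weight vectors; kernel[0] scales h_t, kernel[1] scales h_{t-1}, ...
    (this layout is an assumption made for the sketch).
    """
    out = []
    for t, h_t in enumerate(hidden):
        mixed = [0.0] * len(h_t)
        for offset, w in enumerate(kernel):
            if t - offset < 0:
                break  # causal: positions before the sequence start contribute nothing
            prev = hidden[t - offset]
            mixed = [m + wi * pi for m, wi, pi in zip(mixed, w, prev)]
        out.append([h + m for h, m in zip(h_t, mixed)])  # residual connection
    return out


hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
kernel = [[0.5, 0.5], [0.25, 0.25]]  # kernel size 2 keeps the arithmetic easy to check
print(canon_layer(hidden, kernel))  # [[1.5, 0.0], [0.25, 1.5], [3.0, 3.25]]
```

The causal cutoff means a token's output never depends on later tokens, so a layer of this shape can sit inside an autoregressive model of any of the architecture families the summary lists.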

NewBeeNLP

Aug 3, 2024 · Artificial Intelligence

Extending LLM Context to 1M Tokens: SAMBA, CoPE, RoPE, Retrieval Heads & Infini‑Attention

This article reviews recent research on extending large language model context windows to millions of tokens, covering SAMBA's hybrid architecture, Contextual Position Encoding (CoPE), the theory relating the RoPE base to context length, Retrieval Head analysis, and the memory‑efficient Infini‑Attention mechanism.

Efficient Attention · LLM research · Large Language Models

10 min read
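RoPE, one of the techniques reviewed, encodes position by rotating pairs of query/key dimensions through angles proportional to the token position, with per‑pair frequencies set by a base (commonly 10000). A minimal sketch, assuming the standard rotate‑consecutive‑pairs formulation; the helper names are hypothetical. It checks the property that makes RoPE a relative encoding: the query‑key score depends only on the position offset.

```python
import math
from typing import List


def rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs (x[2k], x[2k+1]) by the angle pos * base^(-2k/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i already equals 2k here
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.6, 0.9]
# Same offset (3) at very different absolute positions gives the same score.
s1 = dot(rope(q, 7), rope(k, 4))
s2 = dot(rope(q, 103), rope(k, 100))
print(abs(s1 - s2) < 1e-9)  # True
```

The `base` argument is the quantity the RoPE base theory studies: a larger base makes the lower‑frequency pairs rotate more slowly, so they change less over long offsets.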

AntTech

Jun 15, 2022 · Artificial Intelligence

XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding

XYLayoutLM introduces a layout‑aware multimodal network that improves visually‑rich document understanding by augmenting XY‑Cut for robust reading order generation and employing a Dilated Conditional Position Encoding to handle variable‑length inputs, achieving state‑of‑the‑art performance on XFUN and FUNSD datasets.

Multimodal · Vision Transformer · XYCut

10 min read
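XYLayoutLM builds its reading order on XY‑Cut. The summary does not detail the augmented version, so the sketch below shows only plain recursive XY‑Cut: split boxes at gaps in their vertical projection (top to bottom), then at gaps in their horizontal projection (left to right), alternating until no gap remains. The `(x0, y0, x1, y1)` box format and all names are assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


def _split(ids: List[int], boxes: List[Box], axis: int) -> List[List[int]]:
    """Group box ids into runs separated by empty gaps in their projection onto one axis."""
    lo, hi = axis, axis + 2  # axis 0 -> (x0, x1); axis 1 -> (y0, y1)
    groups: List[List[int]] = []
    current: List[int] = []
    reach = float("-inf")
    for i in sorted(ids, key=lambda j: boxes[j][lo]):
        if current and boxes[i][lo] > reach:  # a gap: close the current group
            groups.append(current)
            current = []
        current.append(i)
        reach = max(reach, boxes[i][hi])
    if current:
        groups.append(current)
    return groups


def xy_cut(boxes: List[Box], ids: Optional[List[int]] = None, axis: int = 1) -> List[int]:
    """Return box ids in reading order, cutting on `axis` first and alternating axes."""
    if ids is None:
        ids = list(range(len(boxes)))
    groups = _split(ids, boxes, axis)
    if len(groups) == 1:
        groups = _split(ids, boxes, 1 - axis)
        if len(groups) == 1:
            # No gap on either axis: fall back to top-to-bottom, left-to-right order.
            return sorted(ids, key=lambda j: (boxes[j][1], boxes[j][0]))
        axis = 1 - axis
    order: List[int] = []
    for group in groups:  # groups are already ordered top-to-bottom or left-to-right
        order += xy_cut(boxes, group, 1 - axis)
    return order


page = [
    (0, 0, 100, 10),    # 0: title spanning both columns
    (0, 20, 45, 40),    # 1: left column, first paragraph
    (55, 20, 100, 60),  # 2: right column
    (0, 45, 45, 60),    # 3: left column, second paragraph
    (0, 70, 100, 80),   # 4: footer
]
print(xy_cut(page))  # [0, 1, 3, 2, 4]
```

Plain XY‑Cut fails when no clean projection gap exists (for example, overlapping or slightly tilted boxes), which is the kind of brittleness an augmented variant would aim to address.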

Sohu Tech Products

Aug 4, 2021 · Artificial Intelligence

Technical Summary of the 2021 Sohu Campus Text Matching Algorithm Competition

This article presents a comprehensive technical summary of the 2021 Sohu Campus Text Matching Algorithm Competition, detailing data characteristics, preprocessing strategies, tokenization choices, positional encoding methods (including relative encodings), model architectures such as WoBERT and RoFormer, experimental results, and reflections on future improvements.

Model Design · Multi-Task Learning · NLP

9 min read
