Tagged articles
9 articles
Lao Guo's Learning Space
Apr 29, 2026 · Artificial Intelligence

What’s Inside GPT‑6’s ‘Spud’ Release? 5‑6 Trillion Parameters and 2 M Token Context

OpenAI’s GPT‑6 ‘Spud’ launch packs 5‑6 trillion parameters with MoE sparsity, a unified Symphony multimodal architecture, dual System‑1/2 reasoning, a 2‑million‑token window, and competitive benchmark results, while keeping pricing flat and introducing autonomous agent capabilities that reshape AI workflows.

GPT-6 · Multimodal · agent
15 min read
AI Engineering
Apr 1, 2026 · Artificial Intelligence

Holo3 AI Model Beats GPT‑5.4 at One‑Tenth the Cost for Computer Use

H Company’s new Holo3 series delivers a visual language model that outperforms GPT‑5.4 on the OSWorld‑Verified benchmark with a 78.85% score while costing only about one‑tenth as much, offering both a flagship API‑only version and an open‑source lightweight variant optimized for GUI agents.

AI Benchmark · GUI Agent · Holo3
4 min read
Fun with Large Models
Feb 17, 2026 · Artificial Intelligence

Inside Qwen3.5: The World’s Strongest Open‑Source Multimodal Model and Its Core Features

Qwen3.5‑397B‑A17B, the newly open‑sourced multimodal model, combines a 397‑billion‑parameter sparse MoE architecture with FP8 pipelines and an asynchronous RL framework to deliver GPT‑5.2‑level capabilities, 60% lower memory usage, and up to 19× higher throughput, along with extensive image, video, and agent support. The article also outlines deployment requirements and API pricing.

AI inference · FP8 · multimodal model
11 min read
AI Engineering
Feb 16, 2026 · Artificial Intelligence

Qwen3.5-397B: 397B‑Parameter Multimodal LLM Boosts Inference Speed 8‑19×

Alibaba’s Qwen3.5-397B-A17B, a 397‑billion‑parameter open‑source multimodal LLM, combines mixed linear attention with a sparse MoE architecture to achieve 8.6‑19× higher decoding throughput than Qwen3‑Max, supports 201 languages, and can be deployed via vLLM, Docker, Transformers, or SGLang with various optimization presets.

Inference Optimization · large language model · multimodal LLM
8 min read
AntTech
Oct 28, 2025 · Artificial Intelligence

Ming-Flash-Omni-Preview: 103B Open-Source Multimodal Model Excelling in Image, Video, and Speech

Introducing Ming‑Flash‑Omni‑Preview, a 103‑billion‑parameter open‑source multimodal model built on a sparse MoE architecture that delivers state‑of‑the‑art performance in controllable image generation, streaming video understanding, and context‑aware speech recognition, surpassing prior models on GenEval and GEdit benchmarks.

Image Generation · Multimodal · large language model
8 min read
Bighead's Algorithm Notes
Oct 23, 2025 · Artificial Intelligence

FinCast: A Foundation Model for Financial Time‑Series Forecasting

FinCast introduces a decoder‑only Transformer foundation model for financial time‑series forecasting that tackles non‑stationarity, multi‑domain diversity, and multi‑resolution challenges through input chunking with frequency embeddings, a sparse MoE decoder, and a PQ‑loss, achieving zero‑shot and supervised gains over state‑of‑the‑art baselines while running five times faster on consumer GPUs.

PQ loss · Transformer · financial time series
12 min read
AI Algorithm Path
Sep 14, 2025 · Artificial Intelligence

Qwen3-Next: Achieving Unmatched Training and Inference Cost‑Effectiveness

Alibaba's Qwen team unveils Qwen3-Next, a mixture-of-experts LLM with 80 B parameters but only 3 B active, delivering training costs under one‑tenth of comparable dense models and more than ten‑fold inference throughput for long contexts, while matching or surpassing larger models on benchmark tasks.

LLM · Multi-token Prediction · Qwen3-Next
9 min read
Baobao Algorithm Notes
Sep 10, 2025 · Artificial Intelligence

Qwen3-Next Unveiled: Sparse MoE, Hybrid Attention & Multi‑Token Prediction

A recent Hugging Face pull request reveals Alibaba’s upcoming Qwen3‑Next series, highlighting its long‑context, parameter‑efficient design, which combines a high‑sparsity MoE with a 1:50 activation ratio, a hybrid attention architecture mixing gated attention with Gated DeltaNet, and Multi‑Token Prediction, and which promises ten‑fold throughput gains for contexts longer than 32K tokens.

AI Architecture · Large Language Models · Multi-token Prediction
8 min read
DataFunSummit
Mar 22, 2024 · Artificial Intelligence

Multi‑Layer Efficiency Challenges and Emerging Paradigms for Large Language Models

The article discusses how large AI models are moving toward a unified architecture that reduces task‑algorithm coupling, outlines the multi‑layer efficiency challenges—from model sparsity and quantization to software and infrastructure optimization—and highlights recent NVIDIA GTC 2024 and China AI Day events with registration details.

China AI Day · NVIDIA GTC · model efficiency
12 min read