Tagged articles

9 articles

Apr 29, 2026 · Artificial Intelligence

What’s Inside GPT‑6’s ‘Spud’ Release? 5‑6 Trillion Parameters and 2 M Token Context

OpenAI’s GPT‑6 ‘Spud’ launch packs 5‑6 trillion parameters with MoE sparsity, a unified Symphony multimodal architecture, dual System‑1/2 reasoning, a 2‑million‑token window, and competitive benchmark results, while keeping pricing flat and introducing autonomous agent capabilities that reshape AI workflows.

GPT-6 · Multimodal · agent

0 likes · 15 min read

AI Engineering

Apr 1, 2026 · Artificial Intelligence

Holo3 AI Model Beats GPT‑5.4 at One‑Tenth the Cost for Computer Use

H Company’s new Holo3 series delivers a visual language model that outperforms GPT‑5.4 on the OSWorld‑Verified benchmark with a 78.85% score while costing only about one‑tenth as much, offering both a flagship API‑only version and an open‑source lightweight variant optimized for GUI agents.

AI Benchmark · GUI Agent · Holo3

0 likes · 4 min read

Fun with Large Models

Feb 17, 2026 · Artificial Intelligence

Inside Qwen3.5: The World’s Strongest Open‑Source Multimodal Model and Its Core Features

Qwen3.5‑397B‑A17B, the newly open‑sourced multimodal giant, combines a 397‑billion‑parameter sparse MoE architecture with FP8 pipelines and an asynchronous RL framework to deliver GPT‑5.2‑level capabilities, 60% lower memory usage, up to 19× higher throughput, and extensive image, video, and agent support. The article also outlines its deployment requirements and API pricing.

AI inference · FP8 · multimodal model

0 likes · 11 min read

AI Engineering

Feb 16, 2026 · Artificial Intelligence

Qwen3.5-397B: 397B‑Parameter Multimodal LLM Boosts Inference Speed 8‑19×

Alibaba’s Qwen3.5-397B-A17B, a 397‑billion‑parameter open‑source multimodal LLM, combines mixed linear attention with a sparse MoE architecture to achieve 8.6‑19× higher decoding throughput than Qwen3‑Max, supports 201 languages, and can be deployed via vLLM, Docker, Transformers, or SGLang with various optimization presets.

Inference Optimization · large language model · multimodal LLM

0 likes · 8 min read

AntTech

Oct 28, 2025 · Artificial Intelligence

Ming-Flash-Omni-Preview: 103B Open-Source Multimodal Model Excelling in Image, Video, and Speech

Introducing Ming‑Flash‑Omni‑Preview, a 103‑billion‑parameter open‑source multimodal model built on a sparse MoE architecture that delivers state‑of‑the‑art performance in controllable image generation, streaming video understanding, and context‑aware speech recognition, surpassing prior models on GenEval and GEdit benchmarks.

Image Generation · Multimodal · large language model

0 likes · 8 min read

Bighead's Algorithm Notes

Oct 23, 2025 · Artificial Intelligence

FinCast: A Foundation Model for Financial Time‑Series Forecasting

FinCast introduces a decoder‑only Transformer foundation model for financial time‑series forecasting that tackles non‑stationarity, multi‑domain diversity, and multi‑resolution challenges through input chunking with frequency embeddings, a sparse MoE decoder, and a PQ‑loss, achieving zero‑shot and supervised gains over state‑of‑the‑art baselines while running five times faster on consumer GPUs.

PQ loss · Transformer · financial time series

0 likes · 12 min read

AI Algorithm Path

Sep 14, 2025 · Artificial Intelligence

Qwen3-Next: Achieving Unmatched Training and Inference Cost‑Effectiveness

Alibaba's Qwen team unveils Qwen3-Next, a mixture‑of‑experts LLM with 80 B total parameters but only 3 B active, delivering training costs under one‑tenth of comparable dense models and more than ten‑fold inference throughput for long contexts, while matching or surpassing larger models on benchmark tasks.

LLM · Multi-token Prediction · Qwen3-Next

0 likes · 9 min read

Baobao Algorithm Notes

Sep 10, 2025 · Artificial Intelligence

Qwen3-Next Unveiled: Sparse MoE, Hybrid Attention & Multi‑Token Prediction

A recent Hugging Face pull request reveals Alibaba’s upcoming Qwen3‑Next series, highlighting its extreme‑context, parameter‑efficient design that combines a 1:50 high‑sparsity MoE, a hybrid attention architecture mixing gated attention with Gated DeltaNet, and a Multi‑Token Prediction technique, promising ten‑fold throughput gains for 32K‑plus token contexts.

AI Architecture · Large Language Models · Multi-token Prediction

0 likes · 8 min read

DataFunSummit

Mar 22, 2024 · Artificial Intelligence

Multi‑Layer Efficiency Challenges and Emerging Paradigms for Large Language Models

The article discusses how large AI models are moving toward a unified architecture that reduces task‑algorithm coupling, outlines the multi‑layer efficiency challenges—from model sparsity and quantization to software and infrastructure optimization—and highlights recent NVIDIA GTC 2024 and China AI Day events with registration details.

China AI Day · NVIDIA GTC · model efficiency

0 likes · 12 min read
