Tagged articles
15 articles
Data Party THU
May 10, 2026 · Artificial Intelligence

SpikingBrain 2.0 Breaks Long‑Sequence and Low‑Power Bottlenecks in Brain‑Inspired LLMs

The Chinese Academy of Sciences unveils SpikingBrain 2.0‑5B, a brain‑inspired large model that uses dual‑space sparse attention and dual activation (FP8 and INT8‑Spiking) to cut training cost more than tenfold, achieve up to a 15× speedup on long sequences, and match Qwen‑3 performance while drastically reducing power consumption.

SpikingBrain2.0 · benchmark performance · brain-inspired AI
10 min read
ShiZhen AI
Apr 8, 2026 · Artificial Intelligence

Why Anthropic’s Claude Mythos Preview Is Too Powerful to Sell

Anthropic’s Claude Mythos Preview uncovered thousands of zero‑day bugs across major operating systems and browsers, outperformed all benchmark suites, and is being kept out of the public market in favor of an exclusive Project Glasswing partnership with twelve tech giants.

AI security · Anthropic · Claude Mythos
11 min read
Machine Learning Algorithms & Natural Language Processing
Mar 21, 2026 · Artificial Intelligence

How I Put My Night‑Time GPU to Work: Running a Full‑Automation Research Pipeline with MiniMax M2.7

The article details how MiniMax's M2.7 model, equipped with native multi‑agent collaboration and a 97% instruction‑following rate, autonomously executes an end‑to‑end research workflow—discovering topics, generating experiment roadmaps, fixing bugs, and achieving up to 30% performance gains and a 66.6% Kaggle medal rate—demonstrating a practical leap from benchmark scores to real‑world engineering reliability.

AI Agents · Kaggle MLE Lite · MiniMax M2.7
9 min read
AIWalker
Mar 19, 2026 · Artificial Intelligence

Vision‑R1 Multimodal Reasoning Model Delivers Human‑Level Logic and Near‑OpenAI O1 Accuracy

Vision‑R1 introduces a 7B multimodal large language model that leverages 200K unsupervised CoT samples, Modality Bridging, and Progressive Thinking Suppression Training to overcome data scarcity and over‑thinking, achieving 73.5% accuracy on MathVista, within 0.4 percentage points of OpenAI’s O1.

Multimodal Reasoning · benchmark performance · chain-of-thought
12 min read
DataFunTalk
Nov 10, 2025 · Artificial Intelligence

How Open-Source AI Models Are Outperforming Closed Giants on Cost and Performance

The article examines how open‑source models like DeepSeek‑R1 and Kimi K2 Thinking are challenging the traditional closed‑source, high‑capital AI paradigm by achieving comparable or superior benchmark results at a fraction of the training cost, reshaping market expectations, investment strategies, and the economics of AI development.

AI market dynamics · Mixture of Experts · benchmark performance
11 min read
Kuaishou Large Model
Sep 8, 2025 · Artificial Intelligence

Keye-VL-1.5-8B: The New Multimodal LLM That Beats GPT-4o on Vision Benchmarks

Kwai's newly released Keye-VL-1.5-8B multimodal large language model dramatically improves visual, reasoning, and temporal understanding, achieving top scores on public video benchmarks and surpassing closed‑source models like GPT‑4o, while offering an open‑source release and detailed technical documentation.

benchmark performance · multimodal LLM · open-source
11 min read
Kuaishou Tech
Sep 5, 2025 · Artificial Intelligence

How Keye‑VL‑1.5‑8B Sets New Benchmarks in Multimodal AI

Short‑video platform Kwai has open‑sourced the 8‑billion‑parameter multimodal LLM Keye‑VL‑1.5, which introduces slow‑fast frame encoding, a progressive four‑stage pre‑training pipeline, and an automated data construction workflow, achieving state‑of‑the‑art results on video and vision‑language benchmarks and surpassing many closed‑source models.

Multimodal AI · benchmark performance · large language model
12 min read
Java Tech Enthusiast
Sep 1, 2025 · Artificial Intelligence

How Meituan’s LongCat‑Flash‑Chat Beats Top LLMs with Zero‑Computation Experts

LongCat‑Flash‑Chat, Meituan’s newly open‑sourced 560B MoE model, outperforms leading LLMs on agent tool‑use and instruction‑following benchmarks, introduces zero‑computation experts and shortcut‑connected MoE for higher throughput, and demonstrates strong programming and reasoning abilities across diverse evaluation tasks.

Meituan AI · Model architecture · Zero Computation Experts
12 min read
AI Algorithm Path
Jul 14, 2025 · Artificial Intelligence

The Most Powerful Open‑Source Agent Model: Kimi K2

Kimi K2, an open‑source trillion‑parameter AI model released by Moonshot AI, offers Base and Instruct variants, achieves leading scores on benchmarks such as SWE‑bench, LiveCodeBench and AceBench, and introduces a novel post‑training autonomous‑exploration stage with MuonClip optimization to enable robust tool use and reinforcement‑learning‑driven self‑improvement.

Autonomous Agents · Kimi K2 · Tool Use
8 min read
Baobao Algorithm Notes
Jun 30, 2025 · Artificial Intelligence

How End‑to‑End Reinforcement Learning Powers the Kimi‑Researcher AI Agent

The article examines Kimi‑Researcher, an AI research agent built with end‑to‑end reinforcement learning, detailing its technical motivations, advantages over traditional workflow‑based and SFT methods, performance breakthroughs on benchmark exams, and diverse real‑world use cases ranging from literature reviews to legal analysis.

AI Agent · End-to-End RL · Kimi Researcher
10 min read
Code Mala Tang
Jun 4, 2025 · Artificial Intelligence

Flux Kontext: How Open‑Weight AI Image Editing Beats GPT‑Image‑1

Flux Kontext, Black Forest Labs' new open‑weight AI image editing suite, enables fast, low‑cost contextual generation and editing with features such as character consistency, local edits, and style transfer, and it outperforms GPT‑Image‑1, Imagen 4, and other leading models on benchmarks.

AI image generation · Flux Kontext · benchmark performance
12 min read
AIWalker
Apr 13, 2025 · Artificial Intelligence

Huawei Pangu Ultra: 135B Ascend‑Native Dense LLM Without Nvidia GPUs

Huawei's Pangu Ultra introduces a 135‑billion‑parameter dense language model trained entirely on Ascend NPUs, detailing novel stability architectures, a domain‑aware tokenizer, multi‑stage pre‑training, extensive system optimizations, and benchmark results that surpass leading models such as Llama 405B and DeepSeek‑R1.

Ascend NPU · Dense Model · System optimization
15 min read
21CTO
Mar 27, 2025 · Artificial Intelligence

Google Unveils Gemini 2.5: The Most Advanced Reasoning AI Yet

Google's Gemini 2.5, billed as its most intelligent AI model, introduces advanced reasoning capabilities that outperform rivals on benchmarks like LMArena and Humanity's Last Exam, excels at web and agent code generation, and is now available to premium users via AI Studio with a 1‑million token context window.

AI reasoning · Code Generation · Google Gemini
4 min read
DevOps
Feb 25, 2025 · Artificial Intelligence

Claude 3.7 Sonnet: First Hybrid Reasoning Model with Enhanced Coding Tool and Strong Benchmark Performance

Claude 3.7 Sonnet, Anthropic's new hybrid reasoning model, introduces dual thinking modes, token‑based thinking budget control, unchanged pricing, and the Claude Code tool that automates lengthy coding tasks, while achieving record GPQA scores, superior video‑game testing results, and reduced unnecessary refusals on harmful requests.

AI model · Claude · Coding tool
7 min read
Python Programming Learning Circle
Apr 3, 2023 · Artificial Intelligence

Key Highlights of GPT‑4: Multimodal Capabilities, Benchmark Performance, and Future Implications

GPT‑4, the new multimodal AI model, can process images and text, generate code and natural language, achieve human‑level scores on standardized exams, handle up to 32K tokens, and demonstrate advanced reasoning, while OpenAI emphasizes its safety improvements and current limitations as a still‑emerging technology.

AI Safety · GPT-4 · Multimodal AI
6 min read