Tagged articles
649 articles
Architect's Alchemy Furnace
Jul 17, 2025 · Artificial Intelligence

Explore the Ultimate Open-Source LLM Catalog: Models, Tools, and Resources

This article compiles a comprehensive, up‑to‑date inventory of open‑source large language models from Chinese and international organizations, detailing each model’s architecture, parameter count, multilingual capabilities, deployment requirements, and associated tools, offering a valuable reference for AI researchers and developers.

AI · LLM · large language model
0 likes · 50 min read
AntTech
Jul 17, 2025 · Artificial Intelligence

How M2-Reasoning-7B Achieves State‑of‑the‑Art Spatial Reasoning in Multimodal AI

M2-Reasoning-7B, an open‑source 7B multimodal model from Ant Group, combines a high‑quality data pipeline with dynamic multi‑task training and a novel reward function to deliver state‑of‑the‑art performance on both general and spatial reasoning benchmarks, surpassing many larger competitors.

M2-Reasoning · Multimodal AI · benchmark
0 likes · 9 min read
AI Algorithm Path
Jul 14, 2025 · Artificial Intelligence

The Most Powerful Open‑Source Agent Model: Kimi K2

Kimi K2, an open‑source trillion‑parameter AI model released by Moonshot AI, offers Base and Instruct variants, achieves leading scores on benchmarks such as SWE‑bench, LiveCodeBench and AceBench, and introduces a novel post‑training autonomous‑exploration stage with MuonClip optimization to enable robust tool use and reinforcement‑learning‑driven self‑improvement.

Autonomous Agents · Kimi K2 · Reinforcement Learning
0 likes · 8 min read
Architecture and Beyond
Jul 12, 2025 · Artificial Intelligence

What Exactly Is an AI Agent? History, Architecture, and Future Challenges

This article traces the evolution of AI agents from early expert systems to modern large‑language‑model‑driven assistants, explains their core perception, reasoning, memory, and action modules, compares thinking and execution models, and discusses current limitations such as hallucinations, reliability, cost, and security.

AI Agent · Memory Architecture · Prompt engineering
0 likes · 20 min read
Data Thinking Notes
Jul 8, 2025 · Artificial Intelligence

How Xiaohongshu Leverages Large Models to Revolutionize Content Recommendation

This article details Xiaohongshu's multi‑stage recommendation pipeline—using massive multi‑modal pre‑training, long‑sequence modeling, real‑time context features, reinforcement learning and online deep learning—to precisely surface valuable content, address cold‑start challenges, and break information bubbles for billions of users.

Multimodal Learning · Reinforcement Learning · large language model
0 likes · 16 min read
JD Tech Talk
Jul 8, 2025 · Artificial Intelligence

How AI Can Turn a Code Maze into a Knowledge Highway for New Developers

New developer Li Ming’s frustrating onboarding experience highlights hidden business rules, undocumented code, and poor knowledge transfer, prompting him to build an AI‑driven knowledge base that links code changes, requirements, and operational docs, ultimately streamlining troubleshooting, accelerating feature development, and improving knowledge retention across teams.

AI · RAG · code retrieval
0 likes · 18 min read
DataFunSummit
Jul 6, 2025 · Artificial Intelligence

AI-Driven Knowledge Graphs: Key Insights from Multimodal GraphRAG Research

This article presents a comprehensive overview of cutting‑edge research on integrating large language models with knowledge graphs, covering multimodal GraphRAG, financial AI solutions, traditional Chinese medicine decision support, and industry‑specific knowledge services, guiding readers through emerging paradigms and practical implementations.

AI · Enterprise AI · Multimodal
0 likes · 2 min read
DataFunTalk
Jul 5, 2025 · Artificial Intelligence

DeepSeek R1T2 Chimera: Faster, High‑Performance LLM with Assembly of Experts

The DeepSeek R1T2 Chimera model, an open‑source LLM built with Assembly of Experts technology, delivers up to 200% faster inference than R1‑0528, surpasses R1 on GPQA‑Diamond and AIME‑24 benchmarks, and offers a 671‑billion‑parameter MoE architecture, though it lacks function‑calling support and trails the highest‑end R1‑0528 on the toughest tests.

AI · Assembly of Experts · DeepSeek
0 likes · 5 min read
DataFunTalk
Jul 3, 2025 · Artificial Intelligence

Inside xAI’s Grok 4: Massive Funding, Extreme Iteration, and Power Challenges

Elon Musk’s xAI has quietly leaked its upcoming Grok 4 and Grok 4 Code models, skipped Grok 3.5, secured $10 billion in new financing, and is building massive GPU super‑computing facilities, while raising concerns about model bias, data integrity, and unprecedented power‑grid strain.

AI funding · Artificial Intelligence · GPU computing
0 likes · 6 min read
DataFunSummit
Jul 2, 2025 · Artificial Intelligence

How End-to-End Reinforcement Learning Powers the Kimi Researcher AI Agent

The article explains how Kimi Researcher, an AI Agent built with end‑to‑end reinforcement learning, achieves state‑of‑the‑art performance on the Humanity’s Last Exam benchmark, scales via data‑driven training, and supports diverse research and analysis scenarios.

AI Agent · Kimi Researcher · large language model
0 likes · 9 min read
Baobao Algorithm Notes
Jun 30, 2025 · Artificial Intelligence

How End‑to‑End Reinforcement Learning Powers the Kimi‑Researcher AI Agent

The article examines Kimi‑Researcher, an AI research agent built with end‑to‑end reinforcement learning, detailing its technical motivations, advantages over traditional workflow‑based and SFT methods, performance breakthroughs on benchmark exams, and diverse real‑world use cases ranging from literature reviews to legal analysis.

AI Agent · End-to-End RL · Kimi Researcher
0 likes · 10 min read
Network Intelligence Research Center (NIRC)
Jun 29, 2025 · Artificial Intelligence

Multimodal AI Assistant Boosts Network Config: 96.6% Accuracy, 26× Labor Cut

The paper presents NLI2Conf, an intent‑driven network configuration model that fuses configuration files, topology and performance data via a multimodal interface, using large language and graph neural models to align natural‑language intents with forwarding and performance constraints, achieving 96.6% accuracy and a 26‑fold reduction in manual effort.

Graph Neural Network · Multimodal AI · NLI2Conf
0 likes · 6 min read
Alibaba Cloud Big Data AI Platform
Jun 27, 2025 · Artificial Intelligence

Build a Powerful AI Search RAG Application with PAI‑LangStudio, Qwen3 & Elasticsearch

This guide walks you through using the PAI‑LangStudio platform together with the Qwen3 large language model and Elasticsearch to create a full‑stack AI Search RAG solution, covering prerequisites, step‑by‑step configuration of model services, database connections, runtimes, knowledge bases, workflow creation, testing, and deployment for production use.

AI search · Elasticsearch · PAI‑LangStudio
0 likes · 11 min read
Alibaba Cloud Developer
Jun 25, 2025 · Cloud Computing

Control Alibaba Cloud Resources with LLMs and MCP Server in Minutes

This article explains how to combine Alibaba Cloud's MCP Server with large language models to enable natural‑language operations on cloud products, covering setup, tool selection, OAuth authentication, code examples, troubleshooting context‑length limits, and future enhancements for more efficient, secure cloud management.

MCP · Python · api-integration
0 likes · 20 min read
Instant Consumer Technology Team
Jun 23, 2025 · Artificial Intelligence

What Are AI Agents? Architecture, Applications, and Future Trends

AI Agents, autonomous intelligent programs that perceive, reason, and act, are reshaping industries from healthcare to autonomous driving; this article explains their core components, differences from large language models, planning techniques, memory mechanisms, tool use, real‑world applications, current challenges, and future directions.

AI Agent · Applications · Autonomous AI
0 likes · 35 min read
Instant Consumer Technology Team
Jun 19, 2025 · Artificial Intelligence

Exploring II-Agent: An Open‑Source AI Agent Framework for Multi‑Domain Automation

II-Agent is an open‑source, multi‑domain AI agent framework that leverages powerful large language models, a rich toolset, planning‑and‑reflection mechanisms, and advanced context management to enable autonomous task execution, real‑time interaction, and seamless integration across development, data analysis, and enterprise workflows.

AI Agent · automation · context management
0 likes · 21 min read
ByteDance Data Platform
Jun 18, 2025 · Artificial Intelligence

How Imperfect AI Can Unlock the Hidden 80% of Enterprise Data

Enterprises face a sharp paradox: despite exploding data volumes, they mostly use only the roughly 20% of their data that is structured, while the remaining 80%, which is unstructured, stays dormant. This talk explores how imperfect AI powered by Data Agents can unlock that hidden value.

AI · Data Agent · Enterprise AI
0 likes · 16 min read
JD Tech
Jun 16, 2025 · Artificial Intelligence

How JD Engineers Leverage LLMs and Sparse Models to Boost Search and Ads

This article showcases three JD tech case studies—using large language models for e‑commerce query expansion, applying sparse large models with scaling‑law experiments to improve ad prediction, and building proactive risk‑prevention systems—to illustrate practical AI engineering that drives higher recall, conversion, and system robustness.

Advertising · e‑commerce · large language model
0 likes · 8 min read
TAL Education Technology
Jun 13, 2025 · Operations

How Large Language Models Are Revolutionizing Fault Localization

This article explores how the rapid rise of large language models and techniques like Retrieval‑Augmented Generation, Chain‑of‑Thought prompting, and multi‑agent architectures can dramatically improve the speed, accuracy, and automation of fault localization in modern operations environments.

Agent Architecture · CoT · Fault Localization
0 likes · 14 min read
Nightwalker Tech
Jun 11, 2025 · Artificial Intelligence

Turn Your AI Coding Assistant into a Critical Mentor, Not Just a Tool

This guide explains how to shift AI coding tools like Cursor, Windsurf, and RooCode from simple code generators into proactive mentors that critique, suggest improvements, and adopt multiple specialized modes, while also covering prompt design, multi‑round dialogue, and practical code examples.

AI · Coding Assistant · Prompt engineering
0 likes · 15 min read
DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI Safety · LoRA · Model Pruning
0 likes · 13 min read
Xiaohongshu Tech REDtech
Jun 6, 2025 · Artificial Intelligence

How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models

dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.

AI research · Mixture of Experts · Training Efficiency
0 likes · 10 min read
Alibaba Cloud Developer
Jun 5, 2025 · Artificial Intelligence

How Deep (Re)Search Transforms Code Search and AI-Powered Knowledge Retrieval

This article systematically explains the concepts of Deep Search and Deep Research, contrasts them with traditional Retrieval‑Augmented Generation, reviews leading commercial and open‑source solutions, details their architecture for code retrieval, and outlines future plans for specialized code‑search agents.

AI research · Knowledge Retrieval · Retrieval Augmented Generation
0 likes · 13 min read
Java Web Project
Jun 4, 2025 · Artificial Intelligence

Why DeepSeek V3 Stands Out: Architecture, Performance, and Open‑Source Edge

The article analyzes DeepSeek's rapid adoption, detailing its seven core models, the third‑generation MoE architecture, FP8 mixed‑precision training, 128K context window, benchmark superiority on MMLU/HumanEval/CMMLU, low training cost, and fully open‑source release, while also introducing a companion guide for developers.

AI Architecture · DeepSeek · FP8 training
0 likes · 9 min read
Kuaishou Tech
Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released the KwaiCoder‑AutoThink‑preview model, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO, enabling the model to dynamically switch between thinking and non‑thinking modes, reduce inference cost, and achieve up to 20‑point gains on code and math benchmarks while handling large‑scale codebases.

AI research · Model Optimization · Reinforcement Learning
0 likes · 12 min read
AI Frontier Lectures
May 30, 2025 · Artificial Intelligence

Can a 5% Parameter LLM Rival Full‑Scale Models? Inside FairyR1‑32B

A Peking University team unveils FairyR1‑32B, a 32‑billion‑parameter LLM built on DeepSeek‑R1‑Distill‑Qwen‑32B that uses self‑merging, multi‑teacher cross‑distillation, and lightweight distillation to achieve competitive math and code benchmark scores with only about 5% of the original model’s parameters.

Distillation · large language model · model compression
0 likes · 6 min read
Efficient Ops
May 29, 2025 · Artificial Intelligence

DeepSeek R1 0528 Update: New Features, Performance Gains Over OpenAI o3

DeepSeek quietly launched the R1 0528 model, which early testers report matches OpenAI’s o3 in benchmarks and style, while adding deeper chain‑of‑thought reasoning, better writing output, and extended thinking windows, and the announcement is followed by a promotion for the GOPS Global Ops Conference.

AI Performance · Chain-of-Thought · DeepSeek
0 likes · 3 min read
IT Services Circle
May 25, 2025 · Artificial Intelligence

DeepSeek Core Technologies and Model Innovations: DeepSeek‑V3 and DeepSeek‑R1 Technical Overview

The article provides a detailed technical overview of DeepSeek's flagship large language models, DeepSeek‑V3 and DeepSeek‑R1, describing their MoE architecture, training frameworks, reinforcement‑learning based fine‑tuning, inference optimizations, and the broader impact of these innovations on the AI landscape while also promoting related books and resources.

AI · DeepSeek · Mixture of Experts
0 likes · 10 min read
Fun with Large Models
May 25, 2025 · Artificial Intelligence

A Complete Breakdown of Claude 4’s Core Features – How Close Are We to Programmer Unemployment?

Claude 4, released in May 2025 with Opus and Sonnet variants, combines hybrid inference, a 200 K context window, advanced code interpreter, RAG retrieval and MCP integration, delivering industry‑leading programming and AI‑agent performance at relatively low cost, as confirmed by multiple company and user evaluations.

AI agents · Anthropic · Claude 4
0 likes · 10 min read
JD Retail Technology
May 22, 2025 · Industry Insights

Cracking Hidden Ad Fraud: JD’s AI‑Driven Anti‑Cheat System Explained

This article recounts the journey of a JD PhD trainee who transformed academic research on anomaly detection into a production‑grade, LLM‑enhanced anti‑fraud system that identifies concealed address codes in CPS ads, detailing model design, LoRA fine‑tuning, reinforcement learning, distillation, cost‑aware deployment, and lessons learned for scalable ad risk management.

Reinforcement Learning · ad fraud detection · industry AI
0 likes · 12 min read
DataFunSummit
May 17, 2025 · Artificial Intelligence

Integrating Knowledge Graphs with DeepSeek AI for Enterprise Knowledge Management

This presentation explores how combining knowledge graphs with DeepSeek large‑model agents can revolutionize enterprise knowledge management, detailing DeepSeek’s technical strengths, the graph‑model complementarity paradigm, various knowledge types, practical frameworks, case studies, and future outlooks for AI‑enhanced intelligent systems.

Artificial Intelligence · DeepSeek · Enterprise Knowledge Management
0 likes · 23 min read
Alimama Tech
May 14, 2025 · Artificial Intelligence

Deep Research‑Driven Risk Root‑Cause Analysis with Domain Graph Constraints for Large‑Scale Advertising Traffic

This article presents a large‑scale advertising risk‑control solution that combines deep‑research paradigms, domain‑graph constraints, and large language models to enable explainable, responsible, and high‑precision fraud detection, detailing system architecture, challenges, demo workflow, and future directions.

AI · Deep Research · advertising fraud
0 likes · 11 min read
Alimama Tech
May 12, 2025 · Artificial Intelligence

Universal Recommendation Model (URM): A General Large‑Model Recall System for Advertising

The article presents the Universal Recommendation Model (URM), a large‑language‑model‑based recall framework that integrates world knowledge and e‑commerce expertise through knowledge injection and prompt‑driven alignment, achieving significant offline recall gains and a 3.1% increase in ad consumption while meeting high‑QPS, low‑latency production constraints.

Advertising · Multimodal · Prompt engineering
0 likes · 17 min read
DevOps
May 5, 2025 · Artificial Intelligence

DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite

DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.

AI · DeepSeek · Mathematical Reasoning
0 likes · 4 min read
Architects' Tech Alliance
May 2, 2025 · Artificial Intelligence

DeepSeek‑Prover‑V2‑671B: A Massive AI Model for Formal Mathematical Theorem Proving

DeepSeek‑Prover‑V2‑671B, a 671 billion‑parameter AI model released on Hugging Face, dramatically advances formal mathematical theorem proving with MoE architecture, FP8 quantization, 163 k token context, superior performance over GPT‑4 Turbo and other models, and broad implications for research and industry.

AI · DeepSeek · FP8 quantization
0 likes · 11 min read
JavaEdge
May 2, 2025 · Artificial Intelligence

Exploring Qwen3: Open‑Source LLM Features, Benchmarks, and Deployment Guides

This article introduces the Qwen3 family of open‑source large language models, details their architecture, parameter counts, multilingual support, and benchmark performance, and provides step‑by‑step instructions for deploying them with frameworks like SGLang, vLLM, and local runtimes such as Ollama and LMStudio.

AI · Qwen3 · agent
0 likes · 22 min read
AI Algorithm Path
May 2, 2025 · Artificial Intelligence

Qwen3 Launch: Open-Source Models Redefine General AI

The Qwen3 series introduces eight open‑source large language models ranging from 0.6B to 235B parameters, combines dense and Mixture‑of‑Experts architectures, supports multimodal input, offers mixed inference modes, and demonstrates benchmark superiority over leading models such as OpenAI o1 and Gemini 2.5 Pro.

AI agents · Mixture of Experts · Multimodal
0 likes · 10 min read
Mafengwo Technology
Apr 30, 2025 · Artificial Intelligence

How MaFengWo’s mfw-32B Travel LLM Outperforms DeepSeek‑R1 in Speed and Accuracy

The article details the development, training, and evaluation of MaFengWo's 32‑billion‑parameter travel large language model (mfw‑32B), highlighting its superior itinerary planning, personalized demand capture, budget management, and resource efficiency compared to DeepSeek‑R1, and describing the SFT and reinforcement‑learning stages that enabled these gains.

AI Optimization · LoRA · Model Evaluation
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Apr 29, 2025 · Artificial Intelligence

Unlock Qwen3: Powerful LLM Features and Zero‑Code Deployment on Alibaba Cloud

This article introduces Qwen3, the latest family of dense and MoE large language models with dual‑mode reasoning, enhanced inference, multilingual support, and strong agent capabilities, and explains how Alibaba Cloud's PAI‑Model Gallery enables zero‑code, one‑click deployment and enterprise‑grade usage.

Alibaba Cloud · Qwen3 · Zero‑Code Deployment
0 likes · 6 min read
Programmer DD
Apr 29, 2025 · Artificial Intelligence

Why Qwen3 Is Redefining Open‑Source LLMs: Mixed‑Inference Power and Unmatched Performance

Qwen3, Alibaba’s latest open‑source large language model, introduces a pioneering mixed‑inference architecture that blends top‑tier reasoning and non‑reasoning capabilities, delivering record‑breaking benchmark scores, multilingual support for 119 languages, cost‑effective deployment, and a 128K context window, now accessible via Ollama and OpenRouter.

AI Benchmark · Qwen3 · large language model
0 likes · 5 min read
DataFunTalk
Apr 29, 2025 · Artificial Intelligence

ChatGPT Adds Shopping Feature and Alibaba Unveils Qwen3 Model Series

OpenAI announced new shopping capabilities for ChatGPT, improving product recommendation, visual presentation, and direct purchase links, while Alibaba released the Qwen3 series of large and MoE language models with detailed parameter counts and benchmark performance, highlighting rapid advancements in consumer‑focused AI applications.

AI · Artificial Intelligence · ChatGPT
0 likes · 4 min read
Java Architecture Diary
Apr 29, 2025 · Artificial Intelligence

Why Qwen3 Is the New Powerhouse in Open‑Source AI Models

Qwen3 introduces a suite of open‑source models—from a 235B expert model to compact 0.6B versions—offering competitive performance against top proprietary models, multilingual support, flexible thinking modes, and low deployment requirements, with detailed usage instructions via Ollama and OpenRouter.

Ollama · Qwen3 · large language model
0 likes · 8 min read
Baidu Tech Salon
Apr 28, 2025 · Artificial Intelligence

Inside Baidu’s Wenxin 4.5 Turbo & X1 Turbo: Architecture, Training Tricks, and Real-World Impact

At the Create2025 AI Developer Conference, Baidu unveiled the multimodal Wenxin 4.5 Turbo and X1 Turbo models, detailing their innovative architecture, self‑feedback post‑training, composite reasoning chains, data pipelines, and the new Wenxin KuaiMa 3.5 code assistant, while also showcasing ecosystem growth and cultural AI applications.

AI Conference · Baidu · Model Optimization
0 likes · 9 min read
21CTO
Apr 26, 2025 · Artificial Intelligence

Baidu Launches Low-Cost ERNIE 4.5 Turbo & X1 Turbo Multimodal AI Models

Baidu unveiled upgraded ERNIE 4.5 Turbo and ERNIE X1 Turbo models with enhanced multimodal abilities, lower costs and free access, while analysts debated the performance of its new P800 chip cluster and its strategic impact in the global AI race.

AI competition · Baidu · Ernie
0 likes · 5 min read
Tencent Technical Engineering
Apr 22, 2025 · Artificial Intelligence

Conan-Embedding-V2: A 1.4B LLM‑Based Multilingual Embedding Model Achieving SOTA on MTEB

Conan‑Embedding‑V2, a newly trained 1.4 B‑parameter LLM with a custom tokenizer, 32 k token context, SoftMask, cross‑lingual retrieval data and dynamic hard‑negative mining, delivers state‑of‑the‑art multilingual embeddings that surpass larger models on both English and Chinese MTEB benchmarks while remaining compact and fast.

Embedding · MTEB · cross-lingual retrieval
0 likes · 14 min read
dbaplus Community
Apr 21, 2025 · Operations

Turn Zabbix Alerts into AI‑Powered Insights with DeepSeek

This guide shows how to integrate Zabbix with a locally deployed DeepSeek large language model via Webhook, enabling automatic analysis of alerts, generation of root‑cause explanations and remediation suggestions, and delivering results through WeChat bots, dashboards, or email to reduce MTTR and manual effort.

AI Ops · Alert Automation · DeepSeek
0 likes · 4 min read
AI2ML AI to Machine Learning
Apr 17, 2025 · Artificial Intelligence

Inside Qwen: A Deep Dive into the Large Model’s Source Code

The article provides a comprehensive technical walkthrough of Qwen’s large‑model series, covering data preparation, tokenization, model tweaks, training settings, RLHF pipeline, Code‑Qwen specifics, Qwen2 and Qwen3 architectural changes, scaling‑law experiments, and detailed source‑code analysis with illustrative diagrams.

MoE · Model architecture · Qwen
0 likes · 7 min read
21CTO
Apr 17, 2025 · Artificial Intelligence

What’s New in OpenAI’s GPT‑4.1? Bigger Context, Faster, Cheaper AI

OpenAI has launched GPT‑4.1, a multimodal AI model that expands context windows to one million tokens, improves coding and instruction following, offers cheaper Mini and Nano variants, and signals a shift in its release roadmap, including plans to retire GPT‑4 and delay GPT‑5.

AI research · GPT-4.1 · OpenAI
0 likes · 5 min read
AIWalker
Apr 13, 2025 · Artificial Intelligence

Huawei Pangu Ultra: 135B Ascend‑Native Dense LLM Without Nvidia GPUs

Huawei's Pangu Ultra introduces a 135‑billion‑parameter dense language model trained entirely on Ascend NPUs, detailing novel stability architectures, a domain‑aware tokenizer, multi‑stage pre‑training, extensive system optimizations, and benchmark results that surpass leading models such as Llama 405B and DeepSeek‑R1.

Ascend NPU · Dense Model · System optimization
0 likes · 15 min read
AntTech
Apr 11, 2025 · Artificial Intelligence

Understanding MCP and Function Call: A Comprehensive Guide to LLM Tool Integration

This article explains the MCP protocol and Function Call mechanism for large language models, detailing how tools are described, invoked, and processed, and provides practical code examples ranging from OpenAI JSON specifications to fast‑MCP Python and Spring MVC implementations.

AI tool integration · MCP · Prompt engineering
0 likes · 14 min read
JD Tech Talk
Apr 11, 2025 · Artificial Intelligence

A Billion-Scale Pure Time Series Large Model: PCTLM with SFT and TPO for Forecasting

This article presents a pioneering billion‑parameter pure time‑series large model (PCTLM) trained on a 1.5‑billion‑sample dataset, introduces a novel RLHF framework (TPO) for time‑series forecasting, and demonstrates state‑of‑the‑art performance across multiple public benchmarks, surpassing existing models such as GPT4TS.

PCTLM · RLHF · TPO
0 likes · 11 min read
Volcano Engine Developer Services
Apr 8, 2025 · Artificial Intelligence

Which Cloud Platform Delivers the Fastest DeepSeek‑R1 API? A Comprehensive Benchmark

This article aggregates multiple independent evaluations of DeepSeek‑R1 across major cloud providers, comparing accuracy on AIME math problems, token‑per‑second throughput, first‑token latency, stability under high concurrency, and overall service reliability, ultimately highlighting Volcano Engine as the top performer.

AI inference · API performance · DeepSeek
0 likes · 12 min read
DevOps
Apr 7, 2025 · Artificial Intelligence

Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances

The article introduces Meta's newly open‑sourced Llama 4 series, including Scout with a 10 million‑token context window, Maverick with 400 billion parameters, and the upcoming Behemoth teacher model, and details their mixture‑of‑experts architecture, the NoPE approach of removing positional encodings, training pipelines, performance benchmarks, and infrastructure improvements for large‑scale AI research.

AI research · Context Window · Llama 4
0 likes · 8 min read
21CTO
Apr 7, 2025 · Artificial Intelligence

Llama 4 Unveiled: Breakthrough Multimodal Models Redefine AI Capabilities

Meta's Llama 4 series introduces the Scout, Maverick, and Behemoth models—featuring Mixture‑of‑Experts architectures, unprecedented 10‑million‑token context windows, and state‑of‑the‑art performance across vision, language, and multimodal benchmarks—while emphasizing efficient training, open‑source availability, and robust safety safeguards.

AI Safety · Llama 4 · Mixture of Experts
0 likes · 14 min read
AI Algorithm Path
Apr 2, 2025 · Artificial Intelligence

Vision‑Reasoning Model: Enabling LLMs to See and Think

The article analyzes the limitations of current visual language models and large reasoning models, proposes a combined Vision‑Reasoning Model (VRM), details its architecture using LLaVA, describes end‑to‑end fine‑tuning and reinforcement‑learning reward design, and argues that such models will become the next breakthrough in AI.

DeepSeek · LLaVA · Reinforcement Learning
0 likes · 9 min read
Java Architect Essentials
Apr 2, 2025 · Backend Development

Integrating DeepSeek Large Language Model with Spring Boot to Build an AI Chat Application

This guide demonstrates how to create a Spring Boot backend that integrates DeepSeek's large language model via the Spring AI OpenAI starter, covering project setup, dependency configuration, API key management, and a sample controller that provides AI-powered chat responses such as weather forecasts.

AI integration · Chatbot · DeepSeek
0 likes · 8 min read
Nightwalker Tech
Apr 1, 2025 · Artificial Intelligence

Evaluation of AutoGLM: Features, Architecture, and Practical Test Results

This article reviews AutoGLM, the first "think‑while‑doing" AI agent released by Zhipu AI, detailing its core capabilities, full‑stack architecture, user experience, identified limitations, and the outcomes of three hands‑on tests using both the client application and a Chrome extension.

AI Agent · AutoGLM · Multimodal
0 likes · 4 min read
DaTaobao Tech
Mar 31, 2025 · Artificial Intelligence

AI Audio Generation and Voice Synthesis Practices at Taobao

The article surveys Taobao’s AI‑generated audio pipeline, detailing eight technical papers on image‑to‑video, OpenAI o1, multimodal video, and large‑model voice synthesis, while highlighting advances like VALL‑E, CosyVoice, F5‑TTS, data‑cleaning methods, and e‑commerce applications such as voice‑cloned live streams, multilingual TTS, AI video‑audio integration, and audiobook production.

AI audio · TTS · data cleaning
0 likes · 11 min read
AI Frontier Lectures
Mar 31, 2025 · Artificial Intelligence

How Anthropic’s Path Tracing Reveals the Inner Workings of Claude 3.5 Haiku

Anthropic’s recent paper introduces a path‑tracing technique that uses cross‑layer transcoders and attribution graphs to sparsely visualize and analyze the decision‑making process of the Claude 3.5 Haiku large language model, demonstrating Pareto‑optimal improvements and a four‑stage reverse‑engineering framework while acknowledging current limitations.

Anthropic · Attribution Graph · Claude 3.5
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Mar 31, 2025 · Artificial Intelligence

Unlock AI-Powered Data Processing with MaxFrame’s AI Function

This article introduces MaxFrame’s AI Function, a new feature built on MaxCompute that integrates large language models like Qwen 2.5 and DeepSeek‑R1‑Distill‑Qwen to simplify model deployment and enable scalable text classification, information extraction, summarization, translation, and other AI-driven data processing tasks on massive datasets.

AI Function · MaxCompute · MaxFrame
0 likes · 19 min read
Architects' Tech Alliance
Mar 28, 2025 · Artificial Intelligence

How DeepSeek Leverages Huawei Ascend to Boost AI Inference Efficiency

The report analyzes DeepSeek's latest V3 and R1 models, highlights their scaling‑law‑driven cost reductions, explains how Huawei Ascend optimizes inference by cutting KV‑Cache storage and improving compute efficiency, and surveys the model’s deployments across finance, government, manufacturing, and healthcare sectors.

AI efficiency · AI inference · DeepSeek
0 likes · 4 min read
21CTO
Mar 27, 2025 · Artificial Intelligence

Google Unveils Gemini 2.5: The Most Advanced Reasoning AI Yet

Google's Gemini 2.5, billed as its most intelligent AI model, introduces advanced reasoning capabilities that outperform rivals on benchmarks like LMArena and Humanity's Last Exam, excels at web and agent code generation, and is now available to premium users via AI Studio with a 1‑million token context window.

AI reasoning · Google Gemini · benchmark performance
0 likes · 4 min read
Sohu Tech Products
Mar 26, 2025 · Artificial Intelligence

How SpatialLM Turns 3D Point Clouds into Structured Scene Understanding

SpatialLM is a large language model designed for 3D spatial understanding that converts point‑cloud data from videos, RGB‑D images or LiDAR into structured scene descriptions, and this guide explains its architecture, model versions, repository links, and step‑by‑step deployment on Ubuntu with PyTorch.

3D point cloud · Multimodal AI · PyTorch
0 likes · 7 min read
MaGe Linux Operations
Mar 26, 2025 · Artificial Intelligence

Why Qwen2.5‑VL‑32B Is the New AI Breakthrough for Vision and Math

Alibaba's newly released Qwen2.5‑VL‑32B multimodal model delivers state‑of‑the‑art visual and textual performance, offering human‑aligned responses, superior mathematical reasoning, fine‑grained image understanding, and efficient deployment features that make it a compelling tool for developers and AI researchers alike.

AI research · Qwen2.5-VL-32B · large language model
0 likes · 9 min read
21CTO
Mar 25, 2025 · Artificial Intelligence

Which LLM Is Best for Coding? Speed, Hallucination, and Context Compared

This article breaks down major large language models, defining key comparison metrics such as speed, hallucination rate, and context window, then evaluates each model with benchmarks like HumanEval+, ChatBot Arena, and Aider to help you choose the most suitable LLM for your coding tasks.

AI · LLM · benchmark
0 likes · 10 min read
Cognitive Technology Team
Mar 22, 2025 · Artificial Intelligence

Three Stages of Developing Large Language Models and Practical Guidance

The article outlines the three development phases of large language models—building, pre‑training, and fine‑tuning—describes usage options, highlights key factors such as data scale, architecture, training processes, and evaluation, and offers practical advice for cost‑effective development.

Fine-tuning · LLM · Model Development
0 likes · 3 min read
Architect's Alchemy Furnace
Mar 19, 2025 · Artificial Intelligence

Choosing the Right Deployment Strategy for Large Language Models: QwQ‑32B vs DeepSeek‑R1

This article compares QwQ‑32B and DeepSeek‑R1 large language models across performance, technical breakthroughs, deployment costs, and open‑source ecosystems, then evaluates pure‑local, hybrid, and pure‑cloud deployment options, and finally provides practical guidelines for preparing knowledge‑base documents and indexing methods.

AI · Deployment · Knowledge Base
0 likes · 10 min read
JD Tech
Mar 19, 2025 · Artificial Intelligence

JD Retail's End‑to‑End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Real‑World Applications

This article details JD Retail's AI engine that seamlessly supports both GPU and domestic NPU hardware, describing its heterogeneous cluster architecture, unified training and inference APIs, performance optimizations, extensive model coverage, and multiple production use cases across e‑commerce, logistics, and intelligent assistance.

AI Engine · GPU · JD Retail
0 likes · 20 min read
Baidu Geek Talk
Mar 19, 2025 · Artificial Intelligence

Inside Baidu’s New Wenxin 4.5 & X1: Multimodal Breakthroughs and Tool‑Enabled AI

Baidu officially launched the Wenxin 4.5 and X1 large language models, showcasing native multimodal foundations, advanced attention masks, heterogeneous expert extensions, and tool‑calling capabilities, while offering low‑cost API access on the Qianfan platform and outlining the underlying technical innovations that drive their performance gains.

AI Platform · Baidu · Multimodal AI
0 likes · 8 min read
Code Mala Tang
Mar 15, 2025 · Artificial Intelligence

What Makes Google’s New Gemma 3 Model a Game‑Changer for AI Developers?

Google’s Gemma 3, a lightweight open‑source model with up to 27 billion parameters, offers multimodal input, 128K token context, and broad language support, outperforming leading rivals on single‑GPU benchmarks and providing flexible deployment options for developers and researchers alike.

AI model · Gemma 3 · Google AI
0 likes · 9 min read
Architects' Tech Alliance
Mar 10, 2025 · Industry Insights

How AI Agents Are Redefining the Future of Intelligent Computing

This article provides a comprehensive analysis of AI agents, covering their historical origins, three‑layer technology stack, market size forecasts, evolution from training to inference, interaction modes, core modules, and the full industry chain from infrastructure providers to downstream applications.

AI Agent · AI Market · Agent Architecture
0 likes · 13 min read
CSS Magic
Mar 10, 2025 · Artificial Intelligence

Three Advanced Ways to Harness DeepSeek for Everyone

The article outlines three practical approaches to get the most out of DeepSeek—using it as a conversational assistant, integrating its API to power AI tools such as the Chrome immersive‑translation plugin, and leveraging it for AI‑assisted programming—while comparing the V3 and R1 models and offering concrete configuration steps.

AI programming · AI translation · Chrome Extension
0 likes · 8 min read
Top Architect
Mar 9, 2025 · Artificial Intelligence

Alibaba Unveils Qwen QwQ-32B: A Compact Open‑Source LLM Rivaling DeepSeek

Alibaba has released the open‑source Qwen QwQ‑32B model, a 32‑billion‑parameter LLM that matches DeepSeek‑R1's performance while being deployable on consumer‑grade GPUs, and the announcement is accompanied by extensive promotional offers for AI‑related products and services.

AI Benchmark · Alibaba · Qwen
0 likes · 7 min read
ZhongAn Tech Team
Mar 8, 2025 · Artificial Intelligence

Weekly AI Rumors Issue 15: Manus AI Agent Launch, GPT‑4.5 Evaluation, and LightThinker Technique

This issue reviews the hype around China’s Manus AI Agent and its invitation‑code controversy, critiques OpenAI’s GPT‑4.5 performance versus DeepSeek, showcases industry solutions using AI agents, and introduces the LightThinker method for dynamically compressing LLM inference chains to boost efficiency.

AI Agent · AI Market · GPT-4.5
0 likes · 15 min read
Java Tech Enthusiast
Mar 8, 2025 · Artificial Intelligence

QwQ-32B Large Language Model Overview and Performance

Alibaba’s new QwQ‑32B large language model, with 32 billion parameters, delivers performance comparable to or surpassing the 671‑billion‑parameter DeepSeek‑R1 across math, coding, and general benchmarks, and is available via HuggingFace, ModelScope, and a DashScope API demo with example Python code.

AI Benchmark · Python API · large language model
0 likes · 5 min read
Java Architecture Diary
Mar 7, 2025 · Artificial Intelligence

Boost Inference Efficiency with QwQ-32B: Benchmarks, Resource Savings, and Java Integration

QwQ-32B, Alibaba’s new inference‑optimized large language model built on the Qwen2.5 architecture, outperforms DeepSeek‑R1 across math reasoning, code generation, and safety benchmarks while requiring only 24 GB VRAM, and the article provides detailed performance data, resource‑efficiency analysis, and step‑by‑step Java and Ollama integration instructions.

Function Calling · Inference Optimization · Java integration
0 likes · 7 min read
AI Product Manager Community
Mar 6, 2025 · Artificial Intelligence

Why Alibaba’s QwQ‑32B Rivals 670B Models with Just 32B Parameters

Alibaba’s newly released 32‑billion‑parameter QwQ‑32B model matches the performance of 670‑billion‑parameter rivals like DeepSeek‑R1, integrates agent‑based reasoning, runs on consumer hardware, and has sparked strong open‑source community adoption, as shown by benchmark results and download statistics.

Alibaba · Qwen · agent
0 likes · 6 min read
Programmer DD
Mar 6, 2025 · Artificial Intelligence

Discover QwQ-32B: A 32B LLM Matching 671B DeepSeek‑R1 Performance

The QwQ-32B model, released by Alibaba Cloud, delivers DeepSeek‑R1‑level results with only 32 billion parameters, offers integrated agent capabilities, is open‑source under Apache 2.0, and can be quickly deployed locally via Ollama or integrated into Java applications using Spring AI.

AI inference · Model Deployment · Ollama
0 likes · 4 min read
Baobao Algorithm Notes
Mar 6, 2025 · Artificial Intelligence

Alibaba Unveils QwQ-32B: A 32‑Billion‑Parameter Inference Model with Agent Capabilities

Alibaba has open‑sourced its new QwQ‑32B inference model, a 32.5‑billion‑parameter transformer that rivals top models like DeepSeek‑R1 and o1‑mini, features integrated agent abilities for tool use and critical thinking, and offers a low inference barrier with extensive technical specifications and RL‑based training details.

Alibaba · Reinforcement Learning · Transformer
0 likes · 4 min read
Baobao Algorithm Notes
Mar 5, 2025 · Artificial Intelligence

Why My 0.5B LLM’s Reasoning Collapsed During RLHF on Logic Puzzles

The author experiments with reinforcement‑learning‑from‑human‑feedback on a 0.5B Qwen instruct model using Logic‑RL and Open‑R1, discovers that reward mis‑design and curriculum learning cause the model to produce overly short or incorrect reasoning chains on knight‑and‑knave puzzles, and analyses the underlying causes.

Artificial Intelligence · Logic Reasoning · RLHF
0 likes · 11 min read
Open Source Linux
Mar 5, 2025 · Artificial Intelligence

How DeepSeek‑R1 Redefines Prompt Engineering and Real‑World AI Deployment

The article analyzes DeepSeek‑R1’s low‑cost inference architecture, Chinese language optimizations, novel prompt‑engineering techniques, and the practical challenges of deploying large domestic models, offering insights into vertical AI applications and the evolving open‑source ecosystem in China.

AI deployment · DeepSeek · Model Optimization
0 likes · 8 min read
Architects' Tech Alliance
Feb 28, 2025 · Artificial Intelligence

DeepSeek V3 & R1: How Their Training Costs Compare to Llama 3.1

The article analyzes DeepSeek’s latest V3 conversational model and R1 inference model, detailing their MoE architecture, training on H800 GPUs costing about $5.58 million, comparing compute expenses to Meta’s Llama 3.1, and showing that their API pricing is roughly one‑tenth of GPT‑4o for dialogue and one‑twentieth of OpenAI o1 for inference.

AI model analysis · DeepSeek · inference pricing
0 likes · 4 min read
IT Architects Alliance
Feb 26, 2025 · Artificial Intelligence

DeepSeek Large Model: Core Architecture, Key Technologies, and Training Strategies

The article provides an in‑depth overview of DeepSeek’s large language model, detailing its mixture‑of‑experts and Transformer foundations, novel attention mechanisms, load‑balancing, multi‑token prediction, FP8 mixed‑precision training, and various training regimes such as knowledge distillation and reinforcement learning.

DeepSeek · FP8 · MLA
0 likes · 18 min read
Tencent Technical Engineering
Feb 26, 2025 · Artificial Intelligence

Engineers' Perspectives on DeepSeek: Technical Innovations and Implications

Thirteen engineers praise DeepSeek’s open‑source, reinforcement‑learning‑driven architecture—using FP8 storage and SFT‑free training—to deliver GPT‑4‑level reasoning at one‑twentieth the cost, enabling single‑GPU deployment, lowering barriers for academia and startups, and prompting notable market reactions that could democratize advanced AI.

AI cost reduction · DeepSeek · FP8
0 likes · 9 min read
Architects' Tech Alliance
Feb 25, 2025 · Artificial Intelligence

What Makes DeepSeek‑R1 a Game‑Changer in AIGC? Insights from Peking University

This article summarizes a Peking University lecture on DeepSeek‑R1, detailing its core concepts, advantages, and historical significance, then explains the underlying mechanisms of large‑model AI and AIGC tools, and finally offers practical guidance for selecting and efficiently applying AI solutions.

AI model analysis · AIGC · DeepSeek
0 likes · 5 min read
Ma Wei Says
Feb 25, 2025 · Artificial Intelligence

What Is GraphRAG? A Deep Dive into Next‑Gen Retrieval‑Augmented Generation and Open‑Source Implementations

GraphRAG, the next generation of Retrieval‑Augmented Generation, combines large language models, knowledge graphs, and graph databases to overcome traditional RAG’s knowledge gaps, hallucinations, and context limitations, and the article reviews its architecture, core modules, a recent 2025 paper, and six notable open‑source implementations.

Artificial Intelligence · GraphRAG · Retrieval Augmented Generation
0 likes · 9 min read
AI Algorithm Path
Feb 22, 2025 · Artificial Intelligence

Elon Musk Unveils Grok 3, Claiming the World’s Most Powerful AI Model

The article details the launch of Grok 3 by Elon Musk’s xAI, highlighting its massive GPU infrastructure, benchmark dominance over GPT‑4o, multiple model variants, pricing for Premium+ users, upcoming API and voice features, and the team’s plan to open‑source Grok 2 once the new model stabilises.

AI Benchmark · AI pricing · Elon Musk
0 likes · 6 min read
Top Architect
Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to download, quantize, and run the full‑size 671‑billion‑parameter DeepSeek R1 model on local hardware using Ollama, covering model selection, hardware requirements, step‑by‑step deployment commands, optional web UI setup, performance observations, and practical recommendations.

AI · DeepSeek · Dynamic Quantization
0 likes · 16 min read
Practical DevOps Architecture
Feb 20, 2025 · Artificial Intelligence

Training MiniDeepSeek V3+R1 from Scratch: Full-Scale Large Model Technical Practice for 2025

This tutorial series provides a step‑by‑step technical guide to training, deploying, and fine‑tuning the MiniDeepSeek V3+R1 large language model, covering model performance, open‑source details, API usage, parameter explanation, multi‑turn chatbot construction, function calling, integration with Open WebUI, GraphRAG, Swarm, and various deployment and optimization techniques.

AI · MiniDeepSeek · Training
0 likes · 4 min read
Tencent Technical Engineering
Feb 19, 2025 · Artificial Intelligence

Reproduction and Analysis of DeepSeek R1/R1‑zero Reinforcement Learning Experiments

This note surveys four open‑source reproductions of DeepSeek R1/R1‑zero reinforcement‑learning pipelines, re‑implements their training on math and logic datasets using Qwen‑based models, shows that format‑plus‑accuracy rewards improve long‑chain reasoning though stability and scaling remain challenges, and outlines future directions for large‑scale RL and business deployment.

DeepSeek-R1 · Reinforcement Learning · large language model
0 likes · 39 min read