Tagged articles

Large Language Model

737 articles · Page 5 of 8

Jun 16, 2025 · Artificial Intelligence

How JD Engineers Leverage LLMs and Sparse Models to Boost Search and Ads

This article showcases three JD tech case studies—using large language models for e‑commerce query expansion, applying sparse large models with scaling‑law experiments to improve ad prediction, and building proactive risk‑prevention systems—to illustrate practical AI engineering that drives higher recall, conversion, and system robustness.

Advertising · Large Language Model · Query Expansion

0 likes · 8 min read

TAL Education Technology

Jun 13, 2025 · Operations

How Large Language Models Are Revolutionizing Fault Localization

This article explores how the rapid rise of large language models and techniques like Retrieval‑Augmented Generation, Chain‑of‑Thought prompting, and multi‑agent architectures can dramatically improve the speed, accuracy, and automation of fault localization in modern operations environments.

CoT · Fault Localization · Large Language Model

0 likes · 14 min read

Baidu Tech Salon

Jun 11, 2025 · Artificial Intelligence

Why Baidu’s Wenxin Model Dominates IDC’s 2025 Large Model Evaluation

IDC’s 2025 China foundational large‑model evaluation crowns Baidu’s Wenxin as the top performer, scoring perfect marks in seven of eight criteria and highlighting its superior multimodal, dialogue, and ecosystem capabilities among twelve leading models.

AI · Baidu Wenxin · IDC evaluation

0 likes · 5 min read

Nightwalker Tech

Jun 11, 2025 · Artificial Intelligence

Turn Your AI Coding Assistant into a Critical Mentor, Not Just a Tool

This guide explains how to shift AI coding tools like Cursor, Windsurf, and RooCode from simple code generators into proactive mentors that critique, suggest improvements, and adopt multiple specialized modes, while also covering prompt design, multi‑round dialogue, and practical code examples.

AI · Large Language Model · coding assistant

0 likes · 15 min read

DataFunSummit

Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safety · Large Language Model · LoRA

0 likes · 13 min read

Xiaohongshu Tech REDtech

Jun 6, 2025 · Artificial Intelligence

How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models

dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.

AI research · Large Language Model · Mixture of Experts

0 likes · 10 min read

Alibaba Cloud Developer

Jun 5, 2025 · Artificial Intelligence

How Deep (Re)Search Transforms Code Search and AI-Powered Knowledge Retrieval

This article systematically explains the concepts of Deep Search and Deep Research, contrasts them with traditional Retrieval‑Augmented Generation, reviews leading commercial and open‑source solutions, details their architecture for code retrieval, and outlines future plans for specialized code‑search agents.

AI research · Code search · Large Language Model

0 likes · 13 min read

Java Web Project

Jun 4, 2025 · Artificial Intelligence

Why DeepSeek V3 Stands Out: Architecture, Performance, and Open‑Source Edge

The article analyzes DeepSeek's rapid adoption, detailing its seven core models, the third‑generation MoE architecture, FP8 mixed‑precision training, 128K context window, benchmark superiority on MMLU/HumanEval/CMMLU, low training cost, and fully open‑source release, while also introducing a companion guide for developers.

AI Architecture · DeepSeek · FP8 training

0 likes · 9 min read

Kuaishou Tech

Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released the KwaiCoder‑AutoThink‑preview model, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO, enabling the model to dynamically switch between thinking and non‑thinking modes, reduce inference cost, and achieve up to 20‑point gains on code and math benchmarks while handling large‑scale codebases.

AI research · Large Language Model · Model Optimization

0 likes · 12 min read

Satori Komeiji's Programming Classroom

Jun 3, 2025 · Artificial Intelligence

Everything You Need to Know About Retrieval‑Augmented Generation (RAG)

The article explains Retrieval‑Augmented Generation (RAG) by describing how a programmer, frustrated with oversized prompts for a large language model, discovers that retrieving relevant document fragments, embedding them, and feeding the augmented context to the model yields accurate, fact‑based answers.

AI · Chunking · Embedding

0 likes · 6 min read

AI Frontier Lectures

May 30, 2025 · Artificial Intelligence

Can a 5% Parameter LLM Rival Full‑Scale Models? Inside FairyR1‑32B

The Peking University team unveils FairyR1‑32B, a 32‑billion‑parameter LLM built on DeepSeek‑R1‑Distill‑Qwen‑32B that uses self‑merging, multi‑teacher cross‑distillation, and lightweight distillation to achieve competitive math and code benchmark scores with only about 5% of the parameters of the full‑scale DeepSeek‑R1.

Distillation · Large Language Model · Model Compression

0 likes · 6 min read

Efficient Ops

May 29, 2025 · Artificial Intelligence

DeepSeek R1 0528 Update: New Features, Performance Gains Over OpenAI o3

DeepSeek quietly launched the R1 0528 model, which early testers report matches OpenAI’s o3 in benchmarks and style, while adding deeper chain‑of‑thought reasoning, better writing output, and extended thinking windows, and the announcement is followed by a promotion for the GOPS Global Ops Conference.

AI performance · DeepSeek · Large Language Model

0 likes · 3 min read

Network Intelligence Research Center (NIRC)

May 27, 2025 · Artificial Intelligence

Simplify Large‑Model Fine‑Tuning with LLaMA‑Factory

This article walks through using LLaMA‑Factory—a unified framework that supports over 100 LLMs—to install dependencies, prepare Alpaca‑style datasets, perform LoRA fine‑tuning, run inference, and export the tuned model, all with concrete command‑line examples.

GitHub · LLaMA-Factory · Large Language Model

0 likes · 6 min read

IT Services Circle

May 25, 2025 · Artificial Intelligence

DeepSeek Core Technologies and Model Innovations: DeepSeek‑V3 and DeepSeek‑R1 Technical Overview

The article provides a detailed technical overview of DeepSeek's flagship large language models, DeepSeek‑V3 and DeepSeek‑R1, describing their MoE architecture, training frameworks, reinforcement‑learning‑based fine‑tuning, and inference optimizations, along with the broader impact of these innovations on the AI landscape; it also promotes related books and resources.

AI · DeepSeek · Large Language Model

0 likes · 10 min read

Fun with Large Models

May 25, 2025 · Artificial Intelligence

A Complete Breakdown of Claude 4’s Core Features – How Close Are We to Programmer Unemployment?

Claude 4, released in May 2025 with Opus and Sonnet variants, combines hybrid inference, a 200 K context window, advanced code interpreter, RAG retrieval and MCP integration, delivering industry‑leading programming and AI‑agent performance at relatively low cost, as confirmed by multiple company and user evaluations.

AI Agents · Anthropic · Claude 4

0 likes · 10 min read

JD Retail Technology

May 22, 2025 · Industry Insights

Cracking Hidden Ad Fraud: JD’s AI‑Driven Anti‑Cheat System Explained

This article recounts the journey of a JD PhD trainee who transformed academic research on anomaly detection into a production‑grade, LLM‑enhanced anti‑fraud system that identifies concealed address codes in CPS ads, detailing model design, LoRA fine‑tuning, reinforcement learning, distillation, cost‑aware deployment, and lessons learned for scalable ad risk management.

Large Language Model · ad fraud detection · industry AI

0 likes · 12 min read

DataFunSummit

May 17, 2025 · Artificial Intelligence

Integrating Knowledge Graphs with DeepSeek AI for Enterprise Knowledge Management

This presentation explores how combining knowledge graphs with DeepSeek large‑model agents can revolutionize enterprise knowledge management, detailing DeepSeek’s technical strengths, the graph‑model complementarity paradigm, various knowledge types, practical frameworks, case studies, and future outlooks for AI‑enhanced intelligent systems.

Artificial Intelligence · DeepSeek · Enterprise Knowledge Management

0 likes · 23 min read

Architecture and Beyond

May 16, 2025 · Artificial Intelligence

Understanding AI Hallucinations: The Fictional Reality of Large Language Models

The essay explores why AI systems produce hallucinations by viewing their reality as a vast fictional narrative built from human language data, arguing that their knowledge is bounded by the corpus they ingest, and reflecting on philosophical limits of language and truth.

AI · Hallucination · Large Language Model

0 likes · 11 min read

Alimama Tech

May 14, 2025 · Artificial Intelligence

Deep Research‑Driven Risk Root‑Cause Analysis with Domain Graph Constraints for Large‑Scale Advertising Traffic

This article presents a large‑scale advertising risk‑control solution that combines deep‑research paradigms, domain‑graph constraints, and large language models to enable explainable, responsible, and high‑precision fraud detection, detailing system architecture, challenges, demo workflow, and future directions.

AI · Deep Research · Large Language Model

0 likes · 11 min read

Alimama Tech

May 12, 2025 · Artificial Intelligence

Universal Recommendation Model (URM): A General Large‑Model Recall System for Advertising

The article presents the Universal Recommendation Model (URM), a large‑language‑model‑based recall framework that integrates world knowledge and e‑commerce expertise through knowledge injection and prompt‑driven alignment, achieving significant offline recall gains and a 3.1% increase in ad consumption while meeting high‑QPS, low‑latency production constraints.

Advertising · Large Language Model · high QPS

0 likes · 17 min read

DevOps

May 5, 2025 · Artificial Intelligence

DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite

DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.

AI · DeepSeek · Large Language Model

0 likes · 4 min read

Architects' Tech Alliance

May 2, 2025 · Artificial Intelligence

DeepSeek‑Prover‑V2‑671B: A Massive AI Model for Formal Mathematical Theorem Proving

DeepSeek‑Prover‑V2‑671B, a 671 billion‑parameter AI model released on Hugging Face, dramatically advances formal mathematical theorem proving with MoE architecture, FP8 quantization, 163 k token context, superior performance over GPT‑4 Turbo and other models, and broad implications for research and industry.

AI · DeepSeek · FP8 quantization

0 likes · 11 min read

JavaEdge

May 2, 2025 · Artificial Intelligence

Exploring Qwen3: Open‑Source LLM Features, Benchmarks, and Deployment Guides

This article introduces the Qwen3 family of open‑source large language models, details their architecture, parameter counts, multilingual support, and benchmark performance, and provides step‑by‑step instructions for deploying them with frameworks like SGLang, vLLM, and local runtimes such as Ollama and LMStudio.

AI · Agent · Large Language Model

0 likes · 22 min read

AI Algorithm Path

May 2, 2025 · Artificial Intelligence

Qwen3 Launch: Open-Source Models Redefine General AI

The Qwen3 series introduces eight open‑source large language models ranging from 0.6B to 235B parameters, combines dense and Mixture‑of‑Experts architectures, supports multimodal input, offers mixed inference modes, and demonstrates benchmark superiority over leading models such as OpenAI o1 and Gemini 2.5 Pro.

AI Agents · Benchmark · Large Language Model

0 likes · 10 min read

Mafengwo Technology

Apr 30, 2025 · Artificial Intelligence

How MaFengWo’s mfw-32B Travel LLM Outperforms DeepSeek‑R1 in Speed and Accuracy

The article details the development, training, and evaluation of MaFengWo's 32‑billion‑parameter travel large language model (mfw‑32B), highlighting its superior itinerary planning, personalized demand capture, budget management, and resource efficiency compared to DeepSeek‑R1, and describing the SFT and reinforcement‑learning stages that enabled these gains.

Large Language Model · LoRA · ai-optimization

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Apr 29, 2025 · Artificial Intelligence

Unlock Qwen3: Powerful LLM Features and Zero‑Code Deployment on Alibaba Cloud

This article introduces Qwen3, the latest family of dense and MoE large language models with dual‑mode reasoning, enhanced inference, multilingual support, and strong agent capabilities, and explains how Alibaba Cloud's PAI‑Model Gallery enables zero‑code, one‑click deployment and enterprise‑grade usage.

Alibaba Cloud · Large Language Model · Qwen3

0 likes · 6 min read

Programmer DD

Apr 29, 2025 · Artificial Intelligence

Why Qwen3 Is Redefining Open‑Source LLMs: Mixed‑Inference Power and Unmatched Performance

Qwen3, Alibaba’s latest open‑source large language model, introduces a pioneering mixed‑inference architecture that blends top‑tier reasoning and non‑reasoning capabilities, delivering record‑breaking benchmark scores, multilingual support for 119 languages, cost‑effective deployment, and a 128K context window, now accessible via Ollama and OpenRouter.

AI benchmark · Large Language Model · Qwen3

0 likes · 5 min read

DataFunTalk

Apr 29, 2025 · Artificial Intelligence

ChatGPT Adds Shopping Feature and Alibaba Unveils Qwen3 Model Series

OpenAI announced new shopping capabilities for ChatGPT, improving product recommendation, visual presentation, and direct purchase links, while Alibaba released the Qwen3 series of dense and MoE language models with detailed parameter counts and benchmark performance, highlighting rapid advancements in consumer‑focused AI applications.

AI · Artificial Intelligence · ChatGPT

0 likes · 4 min read

Java Architecture Diary

Apr 29, 2025 · Artificial Intelligence

Why Qwen3 Is the New Powerhouse in Open‑Source AI Models

Qwen3 introduces a suite of open‑source models—from a 235B expert model to compact 0.6B versions—offering competitive performance against top proprietary models, multilingual support, flexible thinking modes, and low deployment requirements, with detailed usage instructions via Ollama and OpenRouter.

Large Language Model · Ollama · Open-source AI

0 likes · 8 min read

Baidu Tech Salon

Apr 28, 2025 · Artificial Intelligence

Inside Baidu’s Wenxin 4.5 Turbo & X1 Turbo: Architecture, Training Tricks, and Real-World Impact

At the Create2025 AI Developer Conference, Baidu unveiled the multimodal Wenxin 4.5 Turbo and X1 Turbo models, detailing their innovative architecture, self‑feedback post‑training, composite reasoning chains, data pipelines, and the new Wenxin KuaiMa 3.5 code assistant, while also showcasing ecosystem growth and cultural AI applications.

AI Conference · Baidu · Large Language Model

0 likes · 9 min read

21CTO

Apr 26, 2025 · Artificial Intelligence

Baidu Launches Low-Cost ERNIE 4.5 Turbo & X1 Turbo Multimodal AI Models

Baidu unveiled upgraded ERNIE 4.5 Turbo and ERNIE X1 Turbo models with enhanced multimodal abilities, lower costs and free access, while analysts debated the performance of its new P800 chip cluster and its strategic impact in the global AI race.

AI competition · Baidu · ERNIE

0 likes · 5 min read

Tencent Technical Engineering

Apr 22, 2025 · Artificial Intelligence

Conan-Embedding-V2: A 1.4B LLM‑Based Multilingual Embedding Model Achieving SOTA on MTEB

Conan‑Embedding‑V2, a newly trained 1.4 B‑parameter LLM with a custom tokenizer, 32 k token context, SoftMask, cross‑lingual retrieval data and dynamic hard‑negative mining, delivers state‑of‑the‑art multilingual embeddings that surpass larger models on both English and Chinese MTEB benchmarks while remaining compact and fast.

Embedding · Large Language Model · MTEB

0 likes · 14 min read

dbaplus Community

Apr 21, 2025 · Operations

Turn Zabbix Alerts into AI‑Powered Insights with DeepSeek

This guide shows how to integrate Zabbix with a locally deployed DeepSeek large language model via Webhook, enabling automatic analysis of alerts, generation of root‑cause explanations and remediation suggestions, and delivering results through WeChat bots, dashboards, or email to reduce MTTR and manual effort.

AI Ops · Alert Automation · DeepSeek

0 likes · 4 min read

AI2ML AI to Machine Learning

Apr 17, 2025 · Artificial Intelligence

Inside Qwen: A Deep Dive into the Large Model’s Source Code

The article provides a comprehensive technical walkthrough of Qwen’s large‑model series, covering data preparation, tokenization, model tweaks, training settings, RLHF pipeline, Code‑Qwen specifics, Qwen2 and Qwen3 architectural changes, scaling‑law experiments, and detailed source‑code analysis with illustrative diagrams.

Large Language Model · MoE · Qwen

0 likes · 7 min read

21CTO

Apr 17, 2025 · Artificial Intelligence

What’s New in OpenAI’s GPT‑4.1? Bigger Context, Faster, Cheaper AI

OpenAI has launched GPT‑4.1, a multimodal AI model that expands context windows to one million tokens, improves coding and instruction following, offers cheaper Mini and Nano variants, and signals a shift in its release roadmap, including plans to retire GPT‑4 and delay GPT‑5.

AI research · GPT-4.1 · Large Language Model

0 likes · 5 min read

AIWalker

Apr 13, 2025 · Artificial Intelligence

Huawei Pangu Ultra: 135B Ascend‑Native Dense LLM Without Nvidia GPUs

Huawei's Pangu Ultra introduces a 135‑billion‑parameter dense language model trained entirely on Ascend NPUs, detailing novel stability architectures, a domain‑aware tokenizer, multi‑stage pre‑training, extensive system optimizations, and benchmark results that surpass leading models such as Llama 405B and DeepSeek‑R1.

Ascend NPU · Dense Model · Large Language Model

0 likes · 15 min read

AntTech

Apr 11, 2025 · Artificial Intelligence

Understanding MCP and Function Call: A Comprehensive Guide to LLM Tool Integration

This article explains the MCP protocol and Function Call mechanism for large language models, detailing how tools are described, invoked, and processed, and provides practical code examples ranging from OpenAI JSON specifications to fast‑MCP Python and Spring MVC implementations.

AI tool integration · Large Language Model · MCP

0 likes · 14 min read

JD Tech Talk

Apr 11, 2025 · Artificial Intelligence

A Billion-Scale Pure Time Series Large Model: PCTLM with SFT and TPO for Forecasting

This article presents a pioneering billion‑parameter pure time‑series large model (PCTLM) trained on a 1.5‑billion‑sample dataset, introduces a novel RLHF framework (TPO) for time‑series forecasting, and demonstrates state‑of‑the‑art performance across multiple public benchmarks, surpassing existing models such as GPT4TS.

Large Language Model · PCTLM · RLHF

0 likes · 11 min read

Volcano Engine Developer Services

Apr 8, 2025 · Artificial Intelligence

Which Cloud Platform Delivers the Fastest DeepSeek‑R1 API? A Comprehensive Benchmark

This article aggregates multiple independent evaluations of DeepSeek‑R1 across major cloud providers, comparing accuracy on AIME math problems, token‑per‑second throughput, first‑token latency, stability under high concurrency, and overall service reliability, ultimately highlighting Volcano Engine as the top performer.

AI inference · API performance · Benchmark

0 likes · 12 min read

DevOps

Apr 7, 2025 · Artificial Intelligence

Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances

The article introduces Meta's newly open‑sourced Llama 4 series (Scout with a 10‑million‑token context window, Maverick with 400 billion parameters, and the upcoming Behemoth teacher model), detailing their Mixture‑of‑Experts architecture, the NoPE approach of removing positional encoding, training pipelines, performance benchmarks, and infrastructure improvements for large‑scale AI research.

AI research · Large Language Model · Llama 4

0 likes · 8 min read

21CTO

Apr 7, 2025 · Artificial Intelligence

Llama 4 Unveiled: Breakthrough Multimodal Models Redefine AI Capabilities

Meta's Llama 4 series introduces the Scout, Maverick, and Behemoth models—featuring Mixture‑of‑Experts architectures, unprecedented 10‑million‑token context windows, and state‑of‑the‑art performance across vision, language, and multimodal benchmarks—while emphasizing efficient training, open‑source availability, and robust safety safeguards.

AI safety · Large Language Model · Llama 4

0 likes · 14 min read

AI Algorithm Path

Apr 2, 2025 · Artificial Intelligence

Vision‑Reasoning Model: Enabling LLMs to See and Think

The article analyzes the limitations of current visual language models and large reasoning models, proposes a combined Vision‑Reasoning Model (VRM), details its architecture using LLaVA, describes end‑to‑end fine‑tuning and reinforcement‑learning reward design, and argues that such models will become the next breakthrough in AI.

DeepSeek · LLaVA · Large Language Model

0 likes · 9 min read

Java Architect Essentials

Apr 2, 2025 · Backend Development

Integrating DeepSeek Large Language Model with Spring Boot to Build an AI Chat Application

This guide demonstrates how to create a Spring Boot backend that integrates DeepSeek's large language model via the Spring AI OpenAI starter, covering project setup, dependency configuration, API key management, and a sample controller that provides AI-powered chat responses such as weather forecasts.

AI integration · Chatbot · DeepSeek

0 likes · 8 min read

Nightwalker Tech

Apr 1, 2025 · Artificial Intelligence

Evaluation of AutoGLM: Features, Architecture, and Practical Test Results

This article reviews AutoGLM, the first "think‑while‑doing" AI agent released by Zhipu AI, detailing its core capabilities, full‑stack architecture, user experience, identified limitations, and the outcomes of three hands‑on tests using both the client application and a Chrome extension.

AI Agent · AutoGLM · Evaluation

0 likes · 4 min read

DaTaobao Tech

Mar 31, 2025 · Artificial Intelligence

AI Audio Generation and Voice Synthesis Practices at Taobao

The article surveys Taobao’s AI‑generated audio pipeline, detailing eight technical papers on image‑to‑video, OpenAI o1, multimodal video, and large‑model voice synthesis, while highlighting advances like VALL‑E, CosyVoice, F5‑TTS, data‑cleaning methods, and e‑commerce applications such as voice‑cloned live streams, multilingual TTS, AI video‑audio integration, and audiobook production.

AI audio · Large Language Model · TTS

0 likes · 11 min read

AI Frontier Lectures

Mar 31, 2025 · Artificial Intelligence

How Anthropic’s Path Tracing Reveals the Inner Workings of Claude 3.5 Haiku

Anthropic’s recent paper introduces a path‑tracing technique that uses cross‑layer transcoders and attribution graphs to sparsely visualize and analyze the decision‑making process of the Claude 3.5 Haiku large language model, demonstrating Pareto‑optimal improvements and a four‑stage reverse‑engineering framework while acknowledging current limitations.

Anthropic · Attribution Graph · Claude 3.5

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Mar 31, 2025 · Artificial Intelligence

Unlock AI-Powered Data Processing with MaxFrame’s AI Function

This article introduces MaxFrame’s AI Function, a new feature built on MaxCompute that integrates large language models like Qwen 2.5 and DeepSeek‑R1‑Distill‑Qwen to simplify model deployment and enable scalable text classification, information extraction, summarization, translation, and other AI-driven data processing tasks on massive datasets.

AI Function · Large Language Model · MaxCompute

0 likes · 19 min read

Architects' Tech Alliance

Mar 28, 2025 · Artificial Intelligence

How DeepSeek Leverages Huawei Ascend to Boost AI Inference Efficiency

The report analyzes DeepSeek's latest V3 and R1 models, highlights their scaling‑law‑driven cost reductions, explains how Huawei Ascend optimizes inference by cutting KV‑Cache storage and improving compute efficiency, and surveys the model’s deployments across finance, government, manufacturing, and healthcare sectors.

AI efficiency · AI inference · DeepSeek

0 likes · 4 min read

21CTO

Mar 27, 2025 · Artificial Intelligence

Google Unveils Gemini 2.5: The Most Advanced Reasoning AI Yet

Google's Gemini 2.5, billed as its most intelligent AI model, introduces advanced reasoning capabilities that outperform rivals on benchmarks like LMArena and Humanity's Last Exam, excels at web and agent code generation, and is now available to premium users via AI Studio with a 1‑million token context window.

AI reasoning · Google Gemini · Large Language Model

0 likes · 4 min read

Sohu Tech Products

Mar 26, 2025 · Artificial Intelligence

How SpatialLM Turns 3D Point Clouds into Structured Scene Understanding

SpatialLM is a large language model designed for 3D spatial understanding that converts point‑cloud data from videos, RGB‑D images or LiDAR into structured scene descriptions, and this guide explains its architecture, model versions, repository links, and step‑by‑step deployment on Ubuntu with PyTorch.

3D point cloud · Large Language Model · PyTorch

0 likes · 7 min read

MaGe Linux Operations

Mar 26, 2025 · Artificial Intelligence

Why Qwen2.5‑VL‑32B Is the New AI Breakthrough for Vision and Math

Alibaba's newly released Qwen2.5‑VL‑32B multimodal model delivers state‑of‑the‑art visual and textual performance, offering human‑aligned responses, superior mathematical reasoning, fine‑grained image understanding, and efficient deployment features that make it a compelling tool for developers and AI researchers alike.

AI research · Large Language Model · Qwen2.5-VL-32B

0 likes · 9 min read

21CTO

Mar 25, 2025 · Artificial Intelligence

Which LLM Is Best for Coding? Speed, Hallucination, and Context Compared

This article breaks down major large language models, defining key comparison metrics such as speed, hallucination rate, and context window, then evaluates each model with benchmarks like HumanEval+, ChatBot Arena, and Aider to help you choose the most suitable LLM for your coding tasks.

AI · Benchmark · Coding performance

0 likes · 10 min read

Cognitive Technology Team

Mar 22, 2025 · Artificial Intelligence

Three Stages of Developing Large Language Models and Practical Guidance

The article outlines the three development phases of large language models—building, pre‑training, and fine‑tuning—describes usage options, highlights key factors such as data scale, architecture, training processes, and evaluation, and offers practical advice for cost‑effective development.

LLM · Large Language Model · Model Development

0 likes · 3 min read

Architect's Alchemy Furnace

Mar 19, 2025 · Artificial Intelligence

Choosing the Right Deployment Strategy for Large Language Models: QwQ‑32B vs DeepSeek‑R1

This article compares QwQ‑32B and DeepSeek‑R1 large language models across performance, technical breakthroughs, deployment costs, and open‑source ecosystems, then evaluates pure‑local, hybrid, and pure‑cloud deployment options, and finally provides practical guidelines for preparing knowledge‑base documents and indexing methods.

AI · Deployment · Hybrid Cloud

0 likes · 10 min read

JD Tech

Mar 19, 2025 · Artificial Intelligence

JD Retail's End‑to‑End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Real‑World Applications

This article details JD Retail's AI engine that seamlessly supports both GPU and domestic NPU hardware, describing its heterogeneous cluster architecture, unified training and inference APIs, performance optimizations, extensive model coverage, and multiple production use cases across e‑commerce, logistics, and intelligent assistance.

AI Engine · GPU · JD Retail

0 likes · 20 min read

Baidu Geek Talk

Mar 19, 2025 · Artificial Intelligence

Inside Baidu’s New Wenxin 4.5 & X1: Multimodal Breakthroughs and Tool‑Enabled AI

Baidu officially launched the Wenxin 4.5 and X1 large language models, showcasing native multimodal foundations, advanced attention masks, heterogeneous expert extensions, and tool‑calling capabilities, while offering low‑cost API access on the Qianfan platform and outlining the underlying technical innovations that drive their performance gains.

AI platform · Baidu · Large Language Model

0 likes · 8 min read

Java Architecture Diary

Mar 19, 2025 · Artificial Intelligence

Unlocking Google’s Gemma 3: Multimodal Power, 128k Context & Local Deployment Guide

This article introduces Google’s open‑source Gemma 3 model, highlights its multimodal capabilities, 128k‑token context window, and multilingual support, and provides step‑by‑step instructions for installing Ollama, pulling the model, and running local tests with code examples.

AI Model · Gemma 3 · Large Language Model

0 likes · 7 min read

Code Mala Tang

Mar 15, 2025 · Artificial Intelligence

What Makes Google’s New Gemma 3 Model a Game‑Changer for AI Developers?

Google’s Gemma 3, a lightweight open‑source model with up to 27 billion parameters, offers multimodal input, 128K token context, and broad language support, outperforming leading rivals on single‑GPU benchmarks and providing flexible deployment options for developers and researchers alike.

AI Model · Gemma 3 · Google AI

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Mar 12, 2025 · Artificial Intelligence

Deploy, Fine‑Tune, and Compress DistilQwen2.5 on Alibaba Cloud PAI – A Complete Guide

This article walks through the full workflow for using Alibaba Cloud's open‑source DistilQwen2.5 models on the PAI platform, covering environment setup, model deployment, fine‑tuning with SFT and DPO, evaluation, and model compression for resource‑constrained scenarios.

DistilQwen2.5 · Large Language Model · PAI

0 likes · 13 min read

Architects' Tech Alliance

Mar 10, 2025 · Industry Insights

How AI Agents Are Redefining the Future of Intelligent Computing

This article provides a comprehensive analysis of AI agents, covering their historical origins, three‑layer technology stack, market size forecasts, evolution from training to inference, interaction modes, core modules, and the full industry chain from infrastructure providers to downstream applications.

AI Agent · AI market · Industry Analysis

0 likes · 13 min read

CSS Magic

Mar 10, 2025 · Artificial Intelligence

Three Advanced Ways to Harness DeepSeek for Everyone

The article outlines three practical approaches to get the most out of DeepSeek—using it as a conversational assistant, integrating its API to power AI tools such as the Chrome immersive‑translation plugin, and leveraging it for AI‑assisted programming—while comparing the V3 and R1 models and offering concrete configuration steps.

AI programming · AI translation · API integration

0 likes · 8 min read

Top Architect

Mar 9, 2025 · Artificial Intelligence

Alibaba Unveils Qwen QwQ-32B: A Compact Open‑Source LLM Rivaling DeepSeek

Alibaba has released the open‑source Qwen QwQ‑32B model, a 32‑billion‑parameter LLM that matches DeepSeek‑R1's performance while being deployable on consumer‑grade GPUs, and the announcement is accompanied by extensive promotional offers for AI‑related products and services.

AI benchmark · Alibaba · Large Language Model

0 likes · 7 min read

ZhongAn Tech Team

Mar 8, 2025 · Artificial Intelligence

Weekly AI Rumors Issue 15: Manus AI Agent Launch, GPT‑4.5 Evaluation, and LightThinker Technique

This issue reviews the hype around China’s Manus AI Agent and its invitation‑code controversy, critiques OpenAI’s GPT‑4.5 performance versus DeepSeek, showcases industry solutions using AI agents, and introduces the LightThinker method for dynamically compressing LLM inference chains to boost efficiency.

AI Agent · AI market · GPT-4.5

0 likes · 15 min read

Java Tech Enthusiast

Mar 8, 2025 · Artificial Intelligence

QwQ-32B Large Language Model Overview and Performance

Alibaba’s new QwQ‑32B large‑language model, with 32 billion parameters, delivers performance comparable to or surpassing the 671‑billion‑parameter DeepSeek‑R1 across math, coding, and general benchmarks, and is available via HuggingFace, ModelScope, and a DashScope API demo with example Python code.

AI benchmark · Large Language Model · Python API

0 likes · 5 min read

AI Product Manager Community

Mar 7, 2025 · Artificial Intelligence

Function Calls vs ReAct: Core Concepts, Implementation, and Real‑World Use Cases

This article explains the technical principles behind Function Call and ReAct in large language models, provides code samples, compares their strengths and limitations, and illustrates each approach with practical scenarios such as smart customer service and financial analysis assistants.

AI Tool Use · Case Study · Large Language Model

0 likes · 9 min read

ByteDance Cloud Native

Mar 7, 2025 · Artificial Intelligence

How to Deploy the QwQ-32B Large Language Model on Volcengine Cloud in Minutes

This guide walks you through the end‑to‑end process of deploying the open‑source QwQ‑32B inference model on Volcengine's cloud platform, covering GPU ECS selection, VKE cluster creation, continuous delivery CP setup, vLLM service launch, and API gateway exposure.

GPU ECS · Large Language Model · QwQ-32B

0 likes · 8 min read

Java Architecture Diary

Mar 7, 2025 · Artificial Intelligence

Boost Inference Efficiency with QwQ-32B: Benchmarks, Resource Savings, and Java Integration

QwQ-32B, Alibaba’s new inference‑optimized large language model built on the Qwen2.5 architecture, outperforms DeepSeek‑R1 across math reasoning, code generation, and safety benchmarks while requiring only 24 GB of VRAM, and the article provides detailed performance data, resource‑efficiency analysis, and step‑by‑step Java and Ollama integration instructions.

Function Calling · Inference Optimization · Java integration

0 likes · 7 min read

AI Product Manager Community

Mar 6, 2025 · Artificial Intelligence

Why Alibaba’s QwQ‑32B Rivals 670B Models with Just 32B Parameters

Alibaba’s newly released 32‑billion‑parameter QwQ‑32B model matches the performance of 670‑billion‑parameter rivals like DeepSeek‑R1, integrates agent‑based reasoning, runs on consumer hardware, and has sparked strong open‑source community adoption, as shown by benchmark results and download statistics.

Agent · Alibaba · Large Language Model

0 likes · 6 min read

Programmer DD

Mar 6, 2025 · Artificial Intelligence

Discover QwQ-32B: A 32B LLM Matching 671B DeepSeek‑R1 Performance

The QwQ-32B model, released by Alibaba Cloud, delivers DeepSeek‑R1‑level results with only 32 billion parameters, offers integrated agent capabilities, is open‑source under Apache 2.0, and can be quickly deployed locally via Ollama or integrated into Java applications using Spring AI.

AI inference · Large Language Model · Model Deployment

0 likes · 4 min read

Baobao Algorithm Notes

Mar 6, 2025 · Artificial Intelligence

Alibaba Unveils QwQ-32B: A 32‑Billion‑Parameter Inference Model with Agent Capabilities

Alibaba has open‑sourced its new QwQ‑32B inference model, a 32.5‑billion‑parameter transformer that rivals top models like DeepSeek‑R1 and o1‑mini, features integrated agent abilities for tool use and critical thinking, and offers a low inference barrier with extensive technical specifications and RL‑based training details.

Alibaba · Large Language Model · Transformer

0 likes · 4 min read

Baobao Algorithm Notes

Mar 5, 2025 · Artificial Intelligence

Why My 0.5B LLM’s Reasoning Collapsed During RLHF on Logic Puzzles

The author experiments with reinforcement‑learning‑from‑human‑feedback on a 0.5B Qwen instruct model using Logic‑RL and Open‑R1, discovers that reward mis‑design and curriculum learning cause the model to produce overly short or incorrect reasoning chains on knights‑and‑knaves puzzles, and analyses the underlying causes.

Artificial Intelligence · Large Language Model · Logic Reasoning

0 likes · 11 min read

Open Source Linux

Mar 5, 2025 · Artificial Intelligence

How DeepSeek‑R1 Redefines Prompt Engineering and Real‑World AI Deployment

The article analyzes DeepSeek‑R1’s low‑cost inference architecture, Chinese language optimizations, novel prompt‑engineering techniques, and the practical challenges of deploying large domestic models, offering insights into vertical AI applications and the evolving open‑source ecosystem in China.

AI Deployment · DeepSeek · Large Language Model

0 likes · 8 min read

Smart Era Software Development

Mar 4, 2025 · Artificial Intelligence

How DeepSeek‑R1 Is Redefining AI Applications and the AIGC Landscape

The article analyses DeepSeek‑R1’s low‑cost open‑source strategy, superior inference performance (including GPQA benchmark gains over GPT‑4o), its focus on complex reasoning, math and programming, and how these traits reshape AIGC across industries while highlighting remaining privacy and ethical challenges.

AI Applications · AIGC · Benchmark

0 likes · 6 min read

Architects' Tech Alliance

Feb 28, 2025 · Artificial Intelligence

DeepSeek V3 & R1: How Their Training Costs Compare to Llama 3.1

The article analyzes DeepSeek’s latest V3 chat model and R1 reasoning model, detailing their MoE architecture and V3's training run on H800 GPUs at a reported cost of about $5.58 million, comparing compute expenses to Meta’s Llama 3.1, and showing that their API pricing is roughly one‑tenth of GPT‑4o for dialogue and one‑twentieth of OpenAI o1 for reasoning.

AI model analysis · DeepSeek · Large Language Model

0 likes · 4 min read

IT Architects Alliance

Feb 26, 2025 · Artificial Intelligence

DeepSeek Large Model: Core Architecture, Key Technologies, and Training Strategies

The article provides an in‑depth overview of DeepSeek’s large language model, detailing its mixture‑of‑experts and Transformer foundations, novel attention mechanisms, load‑balancing, multi‑token prediction, FP8 mixed‑precision training, and various training regimes such as knowledge distillation and reinforcement learning.

DeepSeek · FP8 · Knowledge Distillation

0 likes · 18 min read

Tencent Technical Engineering

Feb 26, 2025 · Artificial Intelligence

Engineers' Perspectives on DeepSeek: Technical Innovations and Implications

Thirteen engineers praise DeepSeek’s open‑source, reinforcement‑learning‑driven architecture—using FP8 storage and SFT‑free training—to deliver GPT‑4‑level reasoning at one‑twentieth the cost, enabling single‑GPU deployment, lowering barriers for academia and startups, and prompting notable market reactions that could democratize advanced AI.

AI cost reduction · DeepSeek · FP8

0 likes · 9 min read

Architecture & Thinking

Feb 26, 2025 · Artificial Intelligence

Unlocking DeepSeek: A Comprehensive Guide to China’s Cutting-Edge AI Chat Model

This article provides an in‑depth overview of DeepSeek, covering its core multimodal and multilingual features, long‑context capabilities, domain optimizations, security, main functions, diverse application scenarios, and practical usage via web interface or API integration.

AI Chatbot · Artificial Intelligence · DeepSeek

0 likes · 6 min read

Architects' Tech Alliance

Feb 25, 2025 · Artificial Intelligence

What Makes DeepSeek‑R1 a Game‑Changer in AIGC? Insights from Peking University

This article summarizes a Peking University lecture on DeepSeek‑R1, detailing its core concepts, advantages, and historical significance, then explains the underlying mechanisms of large‑model AI and AIGC tools, and finally offers practical guidance for selecting and efficiently applying AI solutions.

AI model analysis · AIGC · DeepSeek

0 likes · 5 min read

Ma Wei Says

Feb 25, 2025 · Artificial Intelligence

What Is GraphRAG? A Deep Dive into Next‑Gen Retrieval‑Augmented Generation and Open‑Source Implementations

GraphRAG, the next generation of Retrieval‑Augmented Generation, combines large language models, knowledge graphs, and graph databases to overcome traditional RAG’s knowledge gaps, hallucinations, and context limitations, and the article reviews its architecture, core modules, a recent 2025 paper, and six notable open‑source implementations.

Artificial Intelligence · GraphRAG · Large Language Model

0 likes · 9 min read

AI Algorithm Path

Feb 22, 2025 · Artificial Intelligence

Elon Musk Unveils Grok 3, Claiming the World’s Most Powerful AI Model

The article details the launch of Grok 3 by Elon Musk’s xAI, highlighting its massive GPU infrastructure, benchmark dominance over GPT‑4o, multiple model variants, pricing for Premium+ users, upcoming API and voice features, and the team’s plan to open‑source Grok 2 once the new model stabilises.

AI benchmark · AI pricing · Elon Musk

0 likes · 6 min read

Selected Java Interview Questions

Feb 21, 2025 · Artificial Intelligence

Integrating DeepSeek Large Model with Spring AI: A Step-by-Step Guide

This article explains how to integrate DeepSeek's large language models into a Spring AI application, covering model selection, API key configuration, URL setup, dependency inclusion, and providing complete Java code examples for both synchronous and streaming chat interactions.

Backend Integration · DeepSeek · Java

0 likes · 5 min read

Top Architect

Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to download, quantize, and run the full‑size 671‑billion‑parameter DeepSeek R1 model on local hardware using Ollama, covering model selection, hardware requirements, step‑by‑step deployment commands, optional web UI setup, performance observations, and practical recommendations.

AI · DeepSeek · Dynamic Quantization

0 likes · 16 min read

Practical DevOps Architecture

Feb 20, 2025 · Artificial Intelligence

Training MiniDeepSeek V3+R1 from Scratch: Full-Scale Large Model Technical Practice for 2025

This tutorial series provides a step‑by‑step technical guide to training, deploying, and fine‑tuning the MiniDeepSeek V3+R1 large language model, covering model performance, open‑source details, API usage, parameter explanation, multi‑turn chatbot construction, function calling, integration with Open WebUI, GraphRAG, Swarm, and various deployment and optimization techniques.

AI · Large Language Model · MiniDeepSeek

0 likes · 4 min read

Tencent Technical Engineering

Feb 19, 2025 · Artificial Intelligence

Reproduction and Analysis of DeepSeek R1/R1‑zero Reinforcement Learning Experiments

This note surveys four open‑source reproductions of DeepSeek R1/R1‑zero reinforcement‑learning pipelines, re‑implements their training on math and logic datasets using Qwen‑based models, shows that format‑plus‑accuracy rewards improve long‑chain reasoning though stability and scaling remain challenges, and outlines future directions for large‑scale RL and business deployment.

DeepSeek-R1 · Large Language Model · long chain of thought

0 likes · 39 min read

Java Tech Enthusiast

Feb 19, 2025 · Artificial Intelligence

xAI's Grok 3 Model: Benchmarks, Reasoning, and Industry Reactions

Elon Musk’s xAI introduced the Grok 3 family, trained on roughly 200,000 GPUs and offered in standard, mini, and Reasoning versions. xAI claims top performance on math, science, and coding benchmarks, ahead of Google Gemini, DeepSeek V3, Claude, and OpenAI GPT‑4o. Pricing starts at $30 per month, and the release has drawn both praise for its speed and criticism for lingering hallucinations and ethical sensitivities.

AI · Benchmark · DeepSearch

0 likes · 16 min read

Alibaba Cloud Big Data AI Platform

Feb 19, 2025 · Artificial Intelligence

Build a DeepSeek AI Assistant with PAI‑RAG: Internet Search & Enterprise Knowledge Base

This guide walks you through using Alibaba Cloud's PAI‑RAG platform to deploy a DeepSeek large‑language‑model assistant that combines real‑time web search with an enterprise knowledge‑base, covering deployment, network‑search configuration, testing, and advanced enterprise features.

AI assistant · DeepSeek · Enterprise Knowledge Base

0 likes · 10 min read

Architecture Digest

Feb 18, 2025 · Artificial Intelligence

Integrating DeepSeek Large Model with Spring AI: A Step‑by‑Step Guide

This article explains how to obtain a DeepSeek API key, configure Spring AI with the appropriate base URL and model, and provides Java code examples for both synchronous and streaming chat interactions using the DeepSeek large‑language model.

API integration · Chatbot · DeepSeek

0 likes · 5 min read

Full-Stack DevOps & Kubernetes

Feb 18, 2025 · Cloud Native

Deploy Massive LLMs on Kubernetes: Step‑by‑Step Guide for Ollama and DeepSeek‑R1

This guide explains how to deploy large‑scale AI models such as Ollama and DeepSeek‑R1 on a Kubernetes 1.30 cluster, covering hardware requirements, PVC and deployment manifests, service exposure, image pulling, verification steps, API access, and monitoring with Prometheus and Grafana.

AI · DeepSeek · Kubernetes

0 likes · 12 min read

JD Retail Technology

Feb 18, 2025 · Artificial Intelligence

Engineering Practices of JD Advertising Agent: JDZunTong Intelligent Assistant

JD’s advertising R&D team created the JDZunTong Intelligent Assistant by engineering a modular Agent platform that combines advanced Retrieval‑Augmented Generation (RAG 1.0 → 2.0) and Function‑Call capabilities, a visual designer, custom tool registration, and a native Python workflow engine to deliver intelligent customer service, data queries, and ad creation for merchants.

AI · Agent · JD Advertising

0 likes · 18 min read

Goodme Frontend Team

Feb 17, 2025 · Backend Development

How Plug Revolutionizes API Capture and Mocking with AI‑Powered Automation

This article introduces Plug, a unified front‑end tool that combines non‑intrusive interface capture, flexible mocking, and large‑model assistance to streamline API development for both mini‑programs and PC, while addressing HTTPS proxy challenges and performance considerations.

API mocking · Backend Development · Interface Capture

0 likes · 15 min read

AIWalker

Feb 16, 2025 · Artificial Intelligence

VARGPT: A Unified Autoregressive Architecture for Multimodal Understanding and Generation

VARGPT is a novel multimodal large language model that unifies visual understanding and autoregressive image generation within a single architecture, extending LLaVA with next‑token and next‑scale prediction, trained through three staged data‑curated phases and achieving superior performance on numerous vision‑language benchmarks.

AI research · Large Language Model · VARGPT

0 likes · 20 min read

Ops Development & AI Practice

Feb 16, 2025 · Artificial Intelligence

Why FlashAttention Supercharges Qwen Models: A Technical Deep Dive

This article explains the FlashAttention algorithm and its memory‑efficient tiling and recomputation techniques, shows how enabling the flash_attn flag dramatically speeds up Qwen‑series large models, and outlines the hardware and software requirements and potential trade‑offs.

FlashAttention · GPU Optimization · Large Language Model

0 likes · 8 min read

Code Ape Tech Column

Feb 14, 2025 · Artificial Intelligence

Integrating DeepSeek Large Model with Spring AI: A Step‑by‑Step Guide

This article explains how to integrate DeepSeek's large language models—both the chat‑oriented deepseek‑chat and the reasoning‑focused deepseek‑reasoner—into a Spring AI application, covering API key setup, base‑URL configuration, model selection, and providing full code examples for dependency, configuration, and a simple chat controller.

AI · Chatbot · DeepSeek

0 likes · 6 min read

JD Cloud Developers

Feb 13, 2025 · Artificial Intelligence

Unlocking DeepSeek R1: Concepts, Training Secrets, and Real-World Experiments

This article demystifies DeepSeek R1 by explaining key concepts such as online search integration and the R1 model, detailing its two‑phase training pipeline, core techniques like iterative data enhancement, and showcases practical reproductions, benchmark tests, and deployment examples for AI developers.

DeepSeek · Knowledge Distillation · Large Language Model

0 likes · 12 min read

Tencent Cloud Developer

Feb 13, 2025 · Artificial Intelligence

Build an AI Super App with DeepSeek and Tencent Cloud Code Assistant in Minutes

This guide walks you through configuring Tencent Cloud AI Code Assistant to use DeepSeek models—either via the DeepSeek public API or a locally‑deployed Ollama instance—covering prerequisites, step‑by‑step setup, required hardware, and command‑line examples.

AI code assistant · DeepSeek · Large Language Model

0 likes · 7 min read

AI Algorithm Path

Feb 12, 2025 · Artificial Intelligence

Essential DeepSeek‑R1 Reading List: Papers Behind the 2025 Hottest LLM

This article compiles a curated reading list of foundational and recent research papers—from the original Transformer to chain‑of‑thought, mixture‑of‑experts, and reinforcement‑learning studies—that together explain the breakthroughs behind DeepSeek‑R1 and guide readers through the technical evolution of modern large language models.

DeepSeek · Large Language Model · Mixture of Experts

0 likes · 15 min read

Architects' Tech Alliance

Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3

0 likes · 7 min read

Bilibili Tech

Feb 11, 2025 · Artificial Intelligence

Building a Scalable AI Agent for Code Review: Practices, Architecture, and Challenges

The article outlines how to build a scalable, modular AI code‑review agent using LangChain, detailing stages from naive prompting to advanced prompt engineering, architecture with six core modules, strategies to curb hallucinations, improve reliability, performance, and human‑AI collaboration, and future RAG integration.

AI Agent · LangChain · Large Language Model

0 likes · 22 min read

AI2ML AI to Machine Learning

Feb 10, 2025 · Artificial Intelligence

Eight Ways Enterprises Can Leverage DeepSeek

The article outlines eight distinct enterprise strategies for adopting DeepSeek, categorizing them by model maturity, available data types, and specific business challenges, and maps these approaches onto four capability tiers—from basic compliance requirements to advanced multimodal, low‑cost solutions.

AI Agents · DeepSeek · Enterprise AI

0 likes · 3 min read

DataFunSummit

Feb 10, 2025 · Artificial Intelligence

Intelligent Decision-Making Large Model ORLM: Research, Training Challenges, Commercialization, and Future Directions

This article presents the ORLM intelligent decision‑making large model, detailing how real‑world decision problems are formalized and solved, the training difficulties and data synthesis methods, the transition from academic research to commercial platforms, and future technical improvement plans.

AI · Data Synthesis · Decision Modeling

0 likes · 10 min read
