Tagged articles

Large Language Models

May 2, 2025 · Artificial Intelligence

Debunking Common Misconceptions About the Model Context Protocol (MCP)

This article clarifies three major misunderstandings about the Model Context Protocol (MCP), explaining that it does not require large‑model support, works even without function‑calling capabilities, and is not natively built into models, while outlining how MCP standardizes context augmentation through a black‑box server architecture.

AI · Function Calling · Large Language Models

0 likes · 5 min read

JD Tech

Apr 30, 2025 · Artificial Intelligence

TimeHF: A Billion‑Scale Time Series Forecasting Model Guided by Human Feedback

The JD Supply Chain algorithm team introduces TimeHF, a billion‑parameter time‑series large model that leverages RLHF to boost demand‑forecast accuracy by over 10%, detailing dataset construction, the PCTLM architecture, a custom RLHF framework (TPO), and extensive SOTA experimental results.

Big Data · Large Language Models · RLHF

0 likes · 10 min read

Cognitive Technology Team

Apr 30, 2025 · Artificial Intelligence

AI Claims of Human-Level Intelligence Unveiled: Reliance on Massive Rules Over True Reasoning

The article critiques AI giants’ claims of nearing human-level intelligence, highlighting research that shows current models rely on massive rule memorization rather than genuine reasoning, leading to brittleness in navigation, mathematics, and adaptability, and emphasizing the need to understand these limitations for future progress.

AI limitations · Artificial Intelligence · Large Language Models

0 likes · 8 min read

Data Thinking Notes

Apr 29, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: How LLMs Evolved to 2025

This article chronicles the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, multimodal models, and recent cost‑efficient innovations like DeepSeek‑R1, highlighting key architectures, training methods, alignment techniques, and their transformative impact on AI applications.

AI alignment · Large Language Models · Transformer

0 likes · 29 min read

DevOps

Apr 28, 2025 · Artificial Intelligence

Vibe Coding: An Introduction to AI‑Driven Natural‑Language Programming

This article introduces Vibe Coding, an AI‑driven programming approach proposed by Andrej Karpathy, explains its core concepts, workflow, advantages, tools, use cases, best practices, and future outlook, and provides a complete example of generating a simple weather app using natural‑language prompts.

AI programming · Best Practices · Large Language Models

0 likes · 16 min read

ITPUB

Apr 28, 2025 · Artificial Intelligence

How Large Language Models are Transforming Automotive Operations and Optimization

In this interview, an automotive industry expert explains how large language models and advanced operations‑optimization techniques are reshaping vehicle design, production planning, logistics, and customer services, while also discussing implementation challenges, team requirements, and future AI‑driven opportunities.

AI adoption · Automotive AI · Large Language Models

0 likes · 15 min read

ZhongAn Tech Team

Apr 28, 2025 · Artificial Intelligence

Weekly Tech Overview: Major AI Model Updates, Industry Funding, and Expert Perspectives on AI Agents and Consciousness

This weekly technology digest highlights significant advancements in artificial intelligence, including OpenAI's GPT-4o upgrades, Tencent's Hunyuan 3D v2.5 release, and major funding rounds for xAI and Manus, alongside expert discussions on the future evolution of AI agent networks and the theoretical possibility of machine consciousness.

AI agents · AI funding · Artificial Intelligence

0 likes · 7 min read

Volcano Engine Developer Services

Apr 28, 2025 · Artificial Intelligence

How ByteBrain’s AI‑Powered Infra is Redefining Cloud and Database Performance

ByteDance’s ByteBrain team showcases how large‑model AI, operations research, and system‑level innovations have produced award‑winning papers and billions of yuan in cost savings while improving on‑call efficiency, database estimation, and cloud infrastructure reliability.

AI · Cloud Computing · Databases

0 likes · 9 min read

AsiaInfo Technology: New Tech Exploration

Apr 25, 2025 · Artificial Intelligence

How Evidence Generation Boosts Document-Grounded Dialogue with LLMs

This study introduces DGDE, a document‑grounded dialogue framework that leverages large language model‑generated evidence, combining retrieval, reranking, fine‑tuning, and iterative question correction to markedly improve accuracy, comprehensiveness, coherence, and completeness on the Doc2dial benchmark.

Large Language Models · document-grounded dialogue · evidence generation

0 likes · 21 min read

DataFunTalk

Apr 25, 2025 · Artificial Intelligence

Does Reinforcement Learning Really Expand Reasoning Capacity in Large Language Models? Insights from Recent Empirical Study

Recent empirical research by Tsinghua’s LeapLab and Shanghai Jiao Tong University reveals that reinforcement‑learning‑based fine‑tuning (RLVR) improves sampling efficiency but does not extend the fundamental reasoning abilities of large language models beyond their base capabilities, as demonstrated across mathematics, code, and visual reasoning benchmarks.

AI research · Large Language Models · RLVR

0 likes · 12 min read

Didi Tech

Apr 24, 2025 · Artificial Intelligence

Algorithmic Foundations and Evolution of Natural Language Processing

The article surveys the Algorithmic Foundations of Engineering R&D series, tracing NLP’s evolution from rule‑based systems to today’s multimodal large‑model era, reviewing core machine‑learning and deep‑learning techniques, transformer breakthroughs, representation learning, optimization methods, and emerging research such as retrieval‑augmented generation and AI agents.

AI · Large Language Models · NLP

0 likes · 43 min read

Alimama Tech

Apr 23, 2025 · Artificial Intelligence

How AI Agents Outsmart Humans in the “Who Is Spy” Campus Challenge

The campus AI Agent competition showcased how large‑language‑model‑powered agents can reason, deceive, and collaborate in a social deduction game, revealing model performance trends, participant insights, and future directions for multi‑agent AI research.

AI · Agent Competition · Large Language Models

0 likes · 6 min read

DevOps

Apr 22, 2025 · Artificial Intelligence

How to Think About Agent Frameworks: A Critical Review of Design Patterns, Challenges, and LangGraph

This article critically examines popular agent frameworks, compares OpenAI and Anthropic definitions, highlights the core difficulty of maintaining proper context for reliable agents, and presents LangGraph’s declarative and imperative features along with practical guidance for building production‑grade agent systems.

AI engineering · Agent systems · LangGraph

0 likes · 24 min read

Architects' Tech Alliance

Apr 22, 2025 · Artificial Intelligence

What Are AI Agents? Definitions, Types, and Cutting‑Edge Technologies Explained

This article provides a comprehensive overview of AI agents, covering their definition, classification into language‑based, vision‑based, and multimodal types, core capabilities such as understanding, perception, planning, and action, and recent breakthroughs like OpenAI ComputerUse, SpiritSight, and MobileFlow.

AI agents · ComputerUse · Large Language Models

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Apr 22, 2025 · Artificial Intelligence

How DistilQwen2.5-DS3-0324 Achieves Fast, Accurate Reasoning via Quick‑Think Distillation

This article introduces DistilQwen2.5-DS3-0324, a distilled language model series that balances rapid inference with strong reasoning by applying a fast‑thinking chain‑of‑thought strategy, details its two‑stage distillation framework, evaluation on diverse benchmarks, and provides code for downloading and using the models.

Chain-of-Thought · Large Language Models · deep learning

0 likes · 17 min read

Architect

Apr 21, 2025 · Artificial Intelligence

Microsoft Research Releases BitNet b1.58 2B4T: A 1‑Bit Native Large Language Model with Ultra‑Low Memory and Energy Consumption

Microsoft Research introduced BitNet b1.58 2B4T, a native 1‑bit large language model with 2 billion parameters trained on 4 trillion tokens, achieving only 0.4 GB non‑embedding memory, 0.028 J decoding energy, and 29 ms CPU latency while matching full‑precision performance.

1-bit LLM · AI research · BitNet

0 likes · 7 min read

AI Frontier Lectures

Apr 19, 2025 · Artificial Intelligence

Why Recent AI Model Gains May Be Illusory: Benchmark Gaps and Real‑World Limits

The author argues that improvements in large AI models have stalled in practical applications since August 2023 and that benchmark scores have diverged from user experience, citing security‑scanning experiments, possible benchmark gaming, and alignment bottlenecks that undermine confidence in claimed progress.

AI · Benchmarking · Industry insight

0 likes · 13 min read

Baidu Tech Salon

Apr 16, 2025 · Artificial Intelligence

Release of the 'Fangsheng' Large Model Benchmark Results (Q1 2025) and Overview of Baidu's Wenxin 4.5 and X1 Models

The China AI Industry Alliance unveiled its Q1 2025 Fangsheng benchmark, showing Baidu’s new multimodal models—Wenxin 4.5 leading basic abilities and Wenxin X1 excelling in reasoning—available for free on the Wenxin Yiyan platform, while Baidu pledges major 2025 investments in AI, data‑center and cloud infrastructure.

AI · FactTesting · Large Language Models

0 likes · 4 min read

Baidu Geek Talk

Apr 16, 2025 · Industry Insights

What Do the Latest AIIA FactTesting Benchmarks Reveal About China’s Large Language Models?

At the AIIA’s 14th plenary meeting in Nanjing, the FactTesting benchmark released its Q1 2025 results, evaluating over 200 large models and highlighting Baidu’s Wenxin 4.5 and Wenxin X1 as leaders in basic and reasoning capabilities, while outlining the expanded multimodal and agent testing roadmap for the year.

AI benchmark · China AI · FactTesting

0 likes · 5 min read

DaTaobao Tech

Apr 16, 2025 · Artificial Intelligence

Comparative Analysis of AI Development Tools (2024‑2025)

This 2024‑2025 comparative review evaluates cloud‑based AI development platforms, AI‑native code editors, IDE plugins, and their underlying large language models, detailing features, user experience, pricing, open‑source status, strengths, and weaknesses. It offers recommendations for UI prototyping and full‑stack projects and forecasts future trends in multimodal, collaborative AI‑assisted development.

AI Development Tools · Large Language Models · code generation

0 likes · 24 min read

Data Thinking Notes

Apr 15, 2025 · Artificial Intelligence

Understanding AI Agents: From Reinforcement Learning to LLM-Powered Planning

Professor Li Hongyi’s lecture provides a comprehensive, step‑by‑step exploration of AI agents, covering their definitions, reinforcement‑learning roots, LLM integration, memory mechanisms, tool usage, planning strategies, benchmarks, and practical examples, offering a valuable resource for anyone studying modern artificial intelligence.

AI agents · Large Language Models · Planning

0 likes · 67 min read

Baidu Geek Talk

Apr 14, 2025 · Artificial Intelligence

PaddlePaddle Framework 3.0: Five Core Breakthroughs Reshaping Large Model Development

PaddlePaddle Framework 3.0 delivers five breakthroughs—dynamic‑static unified automatic parallelism, integrated training‑inference pipelines, high‑order scientific differentiation, a neural‑network compiler with automatic operator fusion, and streamlined heterogeneous chip adaptation—drastically reducing development effort, boosting training speed, and expanding compatibility for large‑scale AI models.

AI Infrastructure · Large Language Models · Model Inference Optimization

0 likes · 23 min read

AI Algorithm Path

Apr 13, 2025 · Artificial Intelligence

Understanding GRPO: Group Relative Policy Optimization for LLM Training

The article explains GRPO, a reinforcement‑learning algorithm that extends PPO with group sampling, no value network, dual penalties and KL regularisation, showing how it improves efficiency and stability when fine‑tuning large language models such as DeepSeek‑Math and DeepSeek‑R1.

DeepSeek · GRPO · Large Language Models

0 likes · 6 min read

AntTech

Apr 10, 2025 · Artificial Intelligence

Ant Group Presents Four AI Research Papers at ICLR 2025 Live Showcase

At the ICLR 2025 live session in Singapore, Ant Group showcased four cutting‑edge papers—CodePlan, Animate‑X, Group Position Embedding, and OmniKV—demonstrating advances in large‑language‑model reasoning, universal character animation, layout‑aware document understanding, and efficient long‑context inference.

AI research · Large Language Models · Long Context

0 likes · 6 min read

JD Retail Technology

Apr 10, 2025 · Artificial Intelligence

How JD’s TimeHF Billion‑Scale Time‑Series Model Boosts Forecast Accuracy with RLHF

JD’s supply‑chain algorithm team introduces TimeHF, a billion‑scale time‑series large model that leverages RLHF to improve demand forecasting accuracy by over 10%, detailing dataset creation, the PCTLM architecture, a custom RL framework (TPO), and superior benchmark results.

AI · Large Language Models · PCTLM

0 likes · 12 min read

Alibaba Cloud Developer

Apr 9, 2025 · Artificial Intelligence

Unlocking LLM Reasoning: A Deep Dive into Prompt Engineering Techniques

This article surveys classic prompt‑engineering methods such as Chain‑of‑Thought, Self‑Consistency, Least‑to‑Most, Boosting of Thoughts, Tree of Thoughts, and AutoGPT, summarizing their core ideas, advantages, limitations, and experimental results to help readers understand how to enhance large language model reasoning without model fine‑tuning.

AI reasoning · Chain-of-Thought · Large Language Models

0 likes · 22 min read

Big Data Technology & Architecture

Apr 9, 2025 · Artificial Intelligence

Overview of Data Agents: Definitions, Applications, and Recent Developments by Google, Alibaba Cloud, and ByteDance

This article introduces the concept of AI-powered Data Agents, outlines their key features and use cases across enterprise analytics, data governance, and intelligent customer service, and reviews recent implementations from Google, Alibaba Cloud, and ByteDance, highlighting their impact on modern data-driven workflows.

Artificial Intelligence · Data Agent · Enterprise AI

0 likes · 8 min read

AIWalker

Apr 8, 2025 · Artificial Intelligence

AgenticIR: An Agentic System for Restoring Images with Complex Degradations

AgenticIR combines visual language models and large language models in a multi‑stage reasoning workflow—perception, planning, execution, reflection, and adjustment—to evaluate, plan, and iteratively apply specialized restoration tools, achieving superior results on complexly degraded images compared to baseline methods.

Agentic Systems · ICLR 2025 · Large Language Models

0 likes · 15 min read

Model Perspective

Apr 8, 2025 · Artificial Intelligence

Why Learning Machine Learning Still Matters in the Age of Giant AI Models

The article argues that despite the rapid rise of powerful large language models, mastering machine learning remains essential because it underpins these models, offers customized solutions for specialized tasks, and cultivates the mathematical, programming, and analytical skills needed to effectively use and extend AI technologies.

AI · Large Language Models · Machine Learning

0 likes · 10 min read

macrozheng

Apr 8, 2025 · Artificial Intelligence

Boost AI Prompt Quality with Prompt Optimizer: Features, Docker Setup & Real‑World Demo

This guide introduces Prompt Optimizer, a client‑side AI prompt‑enhancement tool with over 2k GitHub stars, outlines its key features, provides step‑by‑step Docker installation commands, showcases a real‑world SpringBoot‑Vue e‑commerce project, and demonstrates how to generate and compare optimized prompts for better LLM responses.

AI Prompt Optimization · Docker · Large Language Models

0 likes · 6 min read

Alibaba Cloud Developer

Apr 8, 2025 · Artificial Intelligence

Unlocking LLM Secrets: From Prompt Basics to RAG and Tool Integration

This article introduces the fundamental paradigms of large language models, explaining how simple prompts, messages, and tools like RAG and ReAct enable powerful applications, while providing practical code examples, translation strategies, and insights on prompt engineering, tool usage, and model fine‑tuning.

AI · LLM applications · Large Language Models

0 likes · 23 min read

DataFunSummit

Apr 7, 2025 · Artificial Intelligence

Bridging the Gap Between Large Models and Real‑World Applications with RAG and Agents

This article examines how Retrieval‑Augmented Generation (RAG) and multi‑agent technologies narrow the gap between large language models and practical deployment, highlighting their roles in operations automation, financial risk control, intelligent data governance, database localization, edge inference, and future AI‑driven solutions.

Data Governance · Large Language Models · Operations Automation

0 likes · 8 min read

Architecture and Beyond

Apr 5, 2025 · Artificial Intelligence

Why Defining Problem Boundaries Is Crucial for Effective AI Agents

The article discusses how defining clear problem boundaries is essential for AI agents, explains the challenges of vague tasks for large language models, and proposes multi‑stage decomposition, self‑reflection, and human‑in‑the‑loop strategies to improve AI performance on complex, dynamic tasks.

AI · Human-AI Collaboration · Large Language Models

0 likes · 13 min read

Ops Development & AI Practice

Apr 4, 2025 · Industry Insights

Are Open‑Source LLMs Closing the Gap with Closed‑Source Giants?

A recent leaderboard analysis of top LLMs reveals that while closed‑source models like Gemini‑2.5‑Pro and ChatGPT‑4o still lead overall, open‑source models such as DeepSeek‑V3 and Llama are rapidly narrowing the performance gap, especially in specialized tasks like coding, driven by faster tech diffusion, public datasets, community collaboration, and reduced compute costs.

AI competition · Industry Trends · Large Language Models

0 likes · 8 min read

Code Mala Tang

Apr 3, 2025 · Artificial Intelligence

Intel Core Ultra 5 vs Apple M1: Which Wins for Large Language Model Inference?

This article compares the inference performance of a high‑end Intel Core Ultra 5 AI workstation with an Apple M1 MacBook Air using the IPEX‑LLM library, detailing installation steps, minimal code changes, resource usage, and benchmark results for small and large language models.

AI inference · Apple M1 · Hardware Comparison

0 likes · 9 min read

JD Retail Technology

Apr 2, 2025 · Artificial Intelligence

One4All: A Scalable Multi‑Task Generative Recommendation Framework for CPS Advertising

The paper introduces One4All, a scalable multi‑task generative recommendation framework for CPS advertising that combines few‑shot intent prompting, a Rewards‑in‑Context multi‑objective optimization, and an online model‑selection strategy, delivering 2‑3× offline HitRate/NDCG gains and notable online CTR, CVR, and commission improvements.

Advertising · LLM · Large Language Models

0 likes · 14 min read

Architects' Tech Alliance

Apr 1, 2025 · Artificial Intelligence

What’s New in Large Language Models? DeepSeek V3, Qwen2.5‑Omni, Gemini 2.5 Pro, and GPT‑4o Unpacked

This article reviews the latest updates from major LLM providers—DeepSeek V3’s parameter boost and longer context, Qwen2.5‑Omni’s open‑source multimodal 7B model, Google Gemini 2.5 Pro’s 1 M‑token window and multimodal prowess, and OpenAI GPT‑4o’s image generation and reduced pricing—highlighting technical specs, capabilities, and availability.

DeepSeek · GPT-4o · Gemini

0 likes · 9 min read

Architect

Apr 1, 2025 · Artificial Intelligence

When to Fine‑Tune Large Language Models vs. Relying on Prompting and RAG

The article explains why most projects should start with prompt engineering or simple agent workflows, outlines the scenarios where model fine‑tuning adds real value, compares fine‑tuning with Retrieval‑Augmented Generation, and offers practical criteria for deciding which approach to adopt.

AI Deployment · Large Language Models · LoRA

0 likes · 9 min read

AI Frontier Lectures

Mar 31, 2025 · Industry Insights

Why GPT‑4o’s Image Generation Is Overwhelming Users—and What It Means for AI

OpenAI’s GPT‑4o image generation, launched only for paid users, quickly hit performance bottlenecks and sparked a flood of viral content, prompting technical analysis of its multimodal capabilities, speed issues, copyright concerns, and the broader impact on the AI industry.

AI industry · AI multimodal · GPT-4o

0 likes · 5 min read

AntTech

Mar 31, 2025 · Artificial Intelligence

Ant Group Papers Accepted at ICLR 2025: Summaries and Links

The article presents the abstracts, publication types, links, and research areas of seventeen Ant Group papers accepted at ICLR 2025, covering topics such as embodied robot co‑design, efficient distributed training for large language models, optimization via LLMs, character animation, interactive frame interpolation, KV‑cache management, and privacy‑preserving Transformers.

AI research · Ant Group · ICLR2025

0 likes · 23 min read

Architects' Tech Alliance

Mar 31, 2025 · Artificial Intelligence

A Comprehensive History of Large Language Models from the Transformer Era (2017) to DeepSeek‑R1 (2025)

This article reviews the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, alignment techniques, multimodal extensions, open‑weight releases, and the cost‑efficient DeepSeek‑R1 in 2025, highlighting key technical advances, scaling trends, and their societal impact.

AI alignment · LLM evolution · Large Language Models

0 likes · 26 min read

Data Thinking Notes

Mar 30, 2025 · Artificial Intelligence

How DeepSeek‑R1 and Kimi‑K1.5 Push the Boundaries of Strong Reasoning Models

This comprehensive analysis by the Peking University AI Alignment team dissects the technical innovations behind DeepSeek‑R1, DeepSeek‑R1 Zero, and Kimi‑K1.5, covering reinforcement‑learning‑based post‑training, rule‑based rewards, GRPO optimization, scaling laws, multimodal extensions, safety challenges, and future research directions.

AI alignment · DeepSeek · Kimi

0 likes · 57 min read

Architect

Mar 30, 2025 · Artificial Intelligence

What Is Retrieval-Augmented Generation? A Deep Dive into RAG Techniques

This article provides a comprehensive survey of Retrieval‑Augmented Generation (RAG), covering its basic principles, key components, seven technical variants, challenges, evaluation methods, and future research directions across multimodal, graph‑based, and agentic extensions.

AI Survey · Large Language Models · Multimodal AI

0 likes · 9 min read

AI Frontier Lectures

Mar 30, 2025 · Artificial Intelligence

Do Large Language Models Mirror Human Brain Language Processing? Google’s Groundbreaking Findings

Google researchers discovered a linear relationship between brain activity recorded during natural conversation and the internal embeddings of a speech‑to‑text large language model, revealing that acoustic and lexical representations from the model can accurately predict neural responses in both language comprehension and production.

AI research · Google · Large Language Models

0 likes · 8 min read

Cognitive Technology Team

Mar 30, 2025 · Artificial Intelligence

Why Prompt Engineering Is the “Mind‑Reading” Technique of AI: The Crucial Role of In‑Context Learning

Prompt engineering uses in‑context learning to turn large language models into precise, task‑aware assistants by providing well‑crafted prompts that guide the model’s probability distribution, reduce hallucinations, and unlock hidden knowledge without any parameter tuning.

Artificial Intelligence · In-Context Learning · Large Language Models

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

Mar 29, 2025 · Artificial Intelligence

How DistilQwen2.5‑R1 Boosts Small‑Model Reasoning with Innovative Knowledge Distillation

The article introduces the DistilQwen2.5‑R1 series, which leverages a novel knowledge‑distillation pipeline—including CoT data evaluation, improvement, and validation—to transfer deep reasoning abilities from large models like DeepSeek‑R1 to compact models, achieving superior performance across math, code, and scientific benchmarks and providing open‑source checkpoints and deployment guides for practical use.

AI inference · Knowledge Distillation · Large Language Models

0 likes · 17 min read

Alimama Tech

Mar 28, 2025 · Artificial Intelligence

How Alibaba’s Taobao AI Models Revolutionize E‑Commerce Recommendations and Bidding

Alibaba’s Taobao Group unveiled its AIGX technology suite, including the RecGPT recommendation model, the AIGB generative bidding system, and a new AI‑generated video engine, detailing open‑source benchmarks, NeurIPS workshop participation, and measurable ROI improvements for e‑commerce advertising.

AI · E‑Commerce · Generative Bidding

0 likes · 5 min read

Qborfy AI

Mar 28, 2025 · Artificial Intelligence

Master Prompt Engineering: From Basics to Advanced SQL Generation

This article walks readers through the fundamentals of prompt engineering—covering role, context, instruction, examples, and output formatting—then demonstrates a step‑by‑step construction of a sophisticated SQL‑generation prompt, complete with concrete code snippets, best‑practice tips, and reference resources.

AI Prompt Design · Instruction Tuning · Large Language Models

0 likes · 21 min read

Alibaba Cloud Developer

Mar 26, 2025 · Artificial Intelligence

Why DeepSeek Is Shaking Up the LLM Landscape: Architecture, Performance, and Cost

DeepSeek, a Chinese AI startup, offers open‑source large language models—DeepSeek‑V3 for general tasks and DeepSeek‑R1 for intensive reasoning—featuring MoE, MLA, low‑cost training, and competitive performance against OpenAI’s GPT‑4o, while providing detailed usage guidance and cost analysis.

AI inference · DeepSeek · Large Language Models

0 likes · 21 min read

Architects' Tech Alliance

Mar 25, 2025 · Industry Insights

How Near‑Memory Computing Can Power Edge LLMs: A 2025 Storage Framework

The article analyzes the challenges of deploying large language models on cloud servers—such as latency, security, and constant connectivity—and explains how near‑memory computing architectures (PNM, PIM, CIM) can integrate storage and processing to enable efficient, high‑performance edge AI deployments, outlining the trade‑offs of each approach.

Artificial Intelligence · Large Language Models · Near-Memory Computing

0 likes · 5 min read

Alibaba Cloud Developer

Mar 25, 2025 · Artificial Intelligence

Boost Your AI Search Skills: Advanced Prompt & Query Tricks

This guide explains how to leverage AI tools with deep web‑search capabilities, covering site‑specific queries, wildcard operators, date ranges, Boolean logic, and effective prompt‑engineering techniques (including Socratic questioning and the CRISPE framework) to improve information retrieval accuracy and efficiency across various domains.

AI · Information Retrieval · Large Language Models

0 likes · 8 min read

AI Frontier Lectures

Mar 24, 2025 · Artificial Intelligence

What Can AI Agents Learn from the Latest AIR 2025 Research?

The article compiles insights from the AIR 2025 conference and related talks, covering the evolution of agents from reinforcement‑learning to LLM‑driven systems, novel agent architectures like AIDE, GUI agents, natural‑language reinforcement learning, and scaling advances in large language models such as Qwen, while highlighting key algorithms, benchmarks, and open research questions.

AI agents · GUI agents · Large Language Models

0 likes · 27 min read

Architects' Tech Alliance

Mar 22, 2025 · Industry Insights

What Does DeepSeek’s 2025 AI Report Reveal About the Future of Large Models?

The 2025 DeepSeek Insight report analyzes DeepSeek’s new large‑model releases, compares US and Chinese AI ecosystems, outlines diverse application scenarios such as government, healthcare and aerospace, and provides practical guidance for safely leveraging these models despite their current limitations.

AI industry · AI safety · DeepSeek

0 likes · 5 min read

Model Perspective

Mar 21, 2025 · Artificial Intelligence

How DeepSeek’s Tree‑Based Reasoning Transforms AI Interaction

DeepSeek’s R1 inference mode replaces linear chain‑of‑thought with a transparent, multi‑path tree reasoning system, offering layered analysis, intent understanding, memory management, emotion detection, and hallucination mitigation, illustrated through a practical example of buying authentic cigarettes and detailed technical breakdowns.

Artificial Intelligence · Hallucination · Large Language Models

0 likes · 16 min read

Continuous Delivery 2.0

Mar 21, 2025 · Artificial Intelligence

AI-Driven Automated Unit Test Generation Framework: Architecture, Workflow, and Evaluation

This article presents an AI‑powered framework that automatically scans codebases, generates comprehensive unit tests using large language models, and includes self‑repair agents, detailing its workflow, core components, strategies for accuracy, practical benefits, and current limitations.

AI testing · Large Language Models · Software Automation

0 likes · 9 min read

AI Algorithm Path

Mar 20, 2025 · Artificial Intelligence

Understanding Multimodal Large Language Models: Recent Advances and Comparative Analysis

This article surveys the latest multimodal large language model research, dissecting the design, training strategies, and performance trade‑offs of models such as Llama 3.2, Molmo, NVLM, Qwen2‑VL, Pixtral, MM1.5, Emu3, and Janus, and highlights the challenges of fair cross‑model evaluation.

AI research · Cross-Attention · Large Language Models

0 likes · 16 min read

AI Frontier Lectures

Mar 20, 2025 · Artificial Intelligence

Why Multimodal LLMs Still Struggle with Multi-Image Math Reasoning: Insights from MV‑MATH

This article introduces the MV‑MATH dataset, a large‑scale multi‑image math benchmark, and evaluates 24 open‑source and closed‑source multimodal large language models, revealing significant performance gaps, especially on complex visual dependencies and higher difficulty levels.

Large Language Models · Multimodal AI · dataset

0 likes · 8 min read

JavaScript

Mar 20, 2025 · Artificial Intelligence

How MiniMax’s Linear‑Attention Architecture Is Redefining Long‑Context AI Models

MiniMax’s rapid 2025 releases—including a video model, open‑source LLM, and high‑fidelity voice model—showcase its multimodal linear‑attention architecture that handles up to 4 million tokens, earns a16z recognition, and signals China’s growing influence in open‑source AI innovation.

Artificial Intelligence · Large Language Models · Linear Attention

0 likes · 8 min read

AI Frontier Lectures

Mar 17, 2025 · Artificial Intelligence

Can Diffusion Models Outrun Traditional LLMs? Mercury Coder’s Speed & Architecture

The article analyzes Mercury Coder, a diffusion‑based language model that generates text and code in parallel, compares its speed and quality against traditional autoregressive LLMs like GPT‑4o‑mini using a ball‑collision benchmark, and discusses the underlying score‑entropy training, current limitations, and future multimodal potential.

AI performance · Large Language Models · Mercury

0 likes · 8 min read

ZhongAn Tech Team

Mar 17, 2025 · Artificial Intelligence

Weekly Tech Digest: AI Model Advancements, Strategic Infrastructure Deals, and Industry Insights on AI Agents

This weekly technology digest highlights significant advancements in artificial intelligence, including OpenAI's Python-enabled o1 model, Google's open-source Gemma 3, and Alibaba's AI-driven Quark application, alongside major industry partnerships, expert forecasts on AI agent proliferation, and emerging developments in robotics and wearable technology.

AI agents · Artificial Intelligence · Large Language Models

0 likes · 7 min read

Alibaba Cloud Developer

Mar 17, 2025 · Artificial Intelligence

23 Proven Prompt Engineering Techniques to Make AI Understand You Instantly

As large language models become increasingly adept at natural language, mastering prompt engineering remains essential; this article compiles 23 practical strategies—from concise commands and role‑playing to structured formatting and output guidance—that empower users to communicate clearly with AI and obtain high‑quality, targeted results.

AI interaction · AI productivity · Large Language Models

0 likes · 18 min read

Fighter's World

Mar 14, 2025 · Industry Insights

Will the 10× Growth Promise of Vertical AI Crumble as Generalist LLMs Like Manus Dominate the Market?

The article examines whether the surge of general‑purpose large language models such as Manus, Claude Sonnet, and Qwen undermines the Bessemer Venture Partners claim that Vertical AI will grow tenfold, by analysing market size, use‑case demand, technical challenges, emerging business models, and competitive moats.

AI agents · AI market · Business Models

0 likes · 19 min read

Zhihu Tech Column

Mar 14, 2025 · Artificial Intelligence

Insights from Zhihu’s ZhiLight Large Model Inference Framework: Architecture, Parallelism, and Performance Optimizations

The article summarizes Zhihu’s technical talk on the ZhiLight large‑model inference framework, detailing model execution mechanisms, GPU load analysis, multi‑GPU parallel strategies, open‑source engine comparisons, compute‑communication overlap, quantization techniques, benchmark results, and future directions for scalable LLM deployment.

GPU parallelism · Large Language Models · SGLang

0 likes · 11 min read

Alimama Tech

Mar 14, 2025 · Artificial Intelligence

Advances in Search Advertising Models with Large Language Models (2024)

In 2024, Alimama outlines how large language models transform search advertising through a three‑line scaling roadmap (explicit inductive‑bias design, implicit compute growth, and auxiliary CV/NLP advances), implemented via a pre‑train/post‑train/CTR paradigm and the LUM user‑behavior model, promising gains in relevance, recall, and real‑time serving while highlighting inference efficiency challenges.

CTR Prediction · Large Language Models · Scaling Law

0 likes · 25 min read

Baidu Tech Salon

Mar 13, 2025 · Artificial Intelligence

How PaddlePaddle 3.0 Boosts Large‑Model Inference with 4‑Bit Quantization and MLA Optimizations

PaddlePaddle 3.0 introduces a full‑stack inference engine that supports FP8, INT8, and 4‑bit quantization for popular LLMs such as DeepSeek V3/R1, delivers up to 2× token throughput on a single H800 GPU, and provides detailed deployment scripts for single‑node and multi‑node setups, including MTP speculative decoding and SageAttention for long‑sequence acceleration.

Docker · Inference Optimization · Large Language Models

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Mar 13, 2025 · Artificial Intelligence

From Chain‑of‑Thought to Self‑Evolving Agents: Lessons from AI Agent Engineering

This article traces the evolution of large‑model agents from a simple chain‑of‑thought design through tool and agent instantiation, structured PEER patterns, and self‑evolving architectures, highlighting practical challenges, middleware solutions, and open‑source resources for building robust AI agents.

AI agents · Large Language Models · Middleware

0 likes · 16 min read

Architects' Tech Alliance

Mar 11, 2025 · Artificial Intelligence

How DeepSeek’s Breakthrough AI Models Thrive on Huawei Ascend: A Deep Dive

An in‑depth analysis reveals how DeepSeek’s V3 and R1 large‑language models achieve superior inference performance and cost efficiency on Huawei’s Ascend AI platform, detailing architectural optimizations, KV‑Cache reductions, multimodal support, real‑world deployments across finance, government, manufacturing, and the projected impact on the AI industry.

DeepSeek · Huawei Ascend · Large Language Models

0 likes · 4 min read

AI Algorithm Path

Mar 11, 2025 · Artificial Intelligence

AI Agents Overview: Foundations, Core Components, and When to Use Them

This article provides a comprehensive overview of AI Agents, tracing their evolution from traditional chatbots to LLM‑driven agents, explaining core components such as perception, reasoning, action, knowledge bases, learning and communication interfaces, and discussing practical use cases, interaction cycles, and future prospects.

AI agents · Autonomous Systems · Large Language Models

0 likes · 15 min read

58 Tech

Mar 11, 2025 · Artificial Intelligence

Applying Large Language Models to Real Estate Recommendation: Case Studies and Optimization Techniques

This article presents a comprehensive case study on how large language models are integrated into 58.com’s real‑estate recommendation platform, detailing challenges, data adaptation, prompt and parameter optimizations, embedding generation, conversational recommendation, and future directions for multimodal and generative recommendation systems.

Embedding · Large Language Models · Prompt Engineering

0 likes · 14 min read

Efficient Ops

Mar 9, 2025 · Artificial Intelligence

Essential LLMOps Tools: Build, Deploy, Monitor, and Manage Large Language Models

LLMOps, the end-to-end methodology for managing large language models, encompasses a curated set of development, deployment, monitoring, and local management tools—such as LangChain, vLLM, LangSmith, and Ollama—enabling practitioners to efficiently build, scale, and maintain AI applications.

LLMOps · Large Language Models · Model Deployment

0 likes · 6 min read

Architects' Tech Alliance

Mar 9, 2025 · Industry Insights

DeepSeek’s AI Ecosystem: From Core Tech to Market Impact

This article provides a comprehensive analysis of DeepSeek, covering its foundational AI research, technology stack, product offerings, and the broader upstream, midstream, and downstream AI industry landscape, including hardware, server, cloud, and market trends.

AI Infrastructure · Artificial Intelligence · DeepSeek

0 likes · 13 min read

Fun with Large Models

Mar 8, 2025 · Artificial Intelligence

Make AI Obey: A Detailed Prompt Engineering Guide to Boost Large‑Model Logic

This tutorial explains how to enhance large language models' logical reasoning by using DeepSeek‑R1's deep‑thinking mode, few‑shot prompting, chain‑of‑thought, and zero‑shot chain‑of‑thought techniques, providing concrete examples, comparisons, and a step‑by‑step template for effective prompt design.

AI reasoning · Chain-of-Thought · DeepSeek

0 likes · 10 min read

Code Mala Tang

Mar 8, 2025 · Artificial Intelligence

14 Powerful Prompt Engineering Techniques to Unlock AI’s Full Potential

This article introduces the fundamentals of prompt engineering and presents fourteen practical techniques—ranging from role‑playing and step‑by‑step reasoning to chain‑of‑thought and ReAct—that help users craft precise, high‑quality prompts for any large language model, dramatically improving AI output.

AI · AI productivity · LLM techniques

0 likes · 16 min read

Cognitive Technology Team

Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AI · Embedding · Large Language Models

0 likes · 22 min read

Alibaba Cloud Big Data AI Platform

Mar 7, 2025 · Artificial Intelligence

How Pai‑Megatron‑Patch Boosts Qwen2‑VL Multimodal Training Efficiency

This article explains how the Pai‑Megatron‑Patch toolkit enhances the usability and training performance of the Qwen2‑VL multimodal large model by introducing model‑parallel weight conversion, user‑friendly data loading, visual feature processing optimizations, optimizer offloading, and pipeline parallelism techniques, supported by extensive experimental analysis.

Large Language Models · Megatron · Qwen2-VL

0 likes · 25 min read

dbaplus Community

Mar 7, 2025 · Artificial Intelligence

Master Prompt Engineering: Frameworks, Strategies, and Real‑World Examples for Large Language Models

This comprehensive guide explains what prompts are, outlines essential prompt components and multiple engineering frameworks, presents practical strategies for crafting clear and structured prompts, addresses model limitations such as hallucinations, and showcases a wide range of advanced prompting techniques with code examples.

AI · LLM · Large Language Models

0 likes · 29 min read

Data Thinking Notes

Mar 6, 2025 · Artificial Intelligence

How China’s State‑Owned Giants Are Accelerating AI with DeepSeek

Amid a global digital surge, 45% of China’s central state‑owned enterprises have deployed the DeepSeek large‑model platform, rapidly integrating AI across energy, power, telecom, construction and other sectors to boost intelligent transformation and operational efficiency.

AI adoption · China · DeepSeek

0 likes · 7 min read

MaGe Linux Operations

Mar 6, 2025 · Operations

How Large Language Models Are Revolutionizing SRE from Firefighting to Proactive Ops

This article explores how open‑source large language models like DeepSeek empower SRE teams to shift from reactive firefighting to proactive, predictive operations, detailing technical principles, real‑world case studies, essential skill sets, and future trends that reshape the operations landscape.

AI Ops · Large Language Models · Observability

0 likes · 8 min read

JD Retail Technology

Mar 6, 2025 · Artificial Intelligence

Dynamic Margin Selection for Efficient Deep Learning and Low-Resource Large Model Training

Jia Xing’s research introduces Dynamic Margin Selection, a technique that repeatedly refreshes a core set of boundary‑close samples to train large language models efficiently on limited resources, achieving comparable loss to full‑data training, enabling six‑fold model compression, faster inference, and a proposed exponential scaling law for data‑efficient AI.

ICLR · Large Language Models · Low-Resource Training

0 likes · 10 min read

Tencent Technical Engineering

Mar 5, 2025 · Information Security

Detecting Critical AI Infrastructure Vulnerabilities with AI-Infra-Guard

As open‑source large language model tools like Ollama, OpenWebUI and ComfyUI gain popularity, numerous security flaws such as unauthenticated APIs, CVE‑exploits, model theft and remote code execution emerge, prompting the development of AI‑Infra‑Guard—a lightweight, cross‑platform scanner that identifies over 30 component vulnerabilities and offers both web UI and CLI modes for rapid risk assessment.

AI security · AI-Infra-Guard · CVE

0 likes · 13 min read

Architects' Tech Alliance

Mar 5, 2025 · Industry Insights

DeepSeek R1 & Kimi 1.5: Inside the Development of Near‑Strong Reasoning Models

The article analyzes DeepSeek's recent releases, the V3 dialogue model and the R1 reasoning model, detailing their launch dates, rapid surge in popularity, and R1's reinforcement‑learning‑based design for code and math tasks, and provides links to related Peking University technical reports.

AI · DeepSeek · Industry Analysis

0 likes · 3 min read

AntTech

Mar 4, 2025 · Artificial Intelligence

GraphCLIP and 2D‑TPE: Enhancing Transferability of Graph Models and Table Understanding for Large Language Models

This article introduces GraphCLIP, a self‑supervised graph‑summary pre‑training framework that boosts zero‑ and few‑shot transferability of graph foundation models for text‑attributed graphs, and 2D‑TPE, a two‑dimensional positional encoding method that preserves table structure to markedly improve large language model performance on table‑understanding tasks, while also announcing a live paper session at WWW 2025 featuring the authors.

Graph Neural Networks · Large Language Models · Positional Encoding

0 likes · 6 min read

JD Retail Technology

Feb 28, 2025 · Artificial Intelligence

Generative Recommendation with DPO Alignment for JD Alliance Advertising: Multi‑Objective Optimization and Online Results

The paper presents a generative recommendation framework for JD Alliance advertising that combines semantic‑ID modeling, large‑model pre‑training and fine‑tuning, and Direct Preference Optimization (including Softmax‑DPO and β‑DPO) to jointly boost click‑through and conversion rates, achieving +0.6% UCTR and +8% UCVR in online tests while outlining future multi‑objective extensions.

Advertising · DPO · Large Language Models

0 likes · 12 min read

Architect

Feb 27, 2025 · Artificial Intelligence

Understanding Inference Large Language Models: DeepSeek‑R1 and the Rise of Test‑Time Computation

This article explains how inference‑oriented large language models such as DeepSeek‑R1 and OpenAI o1‑mini shift AI research from training‑time scaling to test‑time computation, detailing the underlying principles, new scaling laws, verification techniques, reinforcement‑learning pipelines, and practical methods for distilling reasoning capabilities into smaller models.

DeepSeek-R1 · Large Language Models · inference

0 likes · 18 min read

Code Mala Tang

Feb 27, 2025 · Artificial Intelligence

Do New AI Reasoning Models Really Think? Unpacking the Debate

The article examines whether the latest AI models that claim to perform true reasoning—by breaking problems into steps and using chain‑of‑thought—actually reason like humans, presenting skeptical and supportive expert viewpoints, and offering practical guidance on how to use such models responsibly.

AI reasoning · AI safety · Chain-of-Thought

0 likes · 14 min read

DataFunSummit

Feb 26, 2025 · Artificial Intelligence

Applying Multimodal Large Models to Music Recommendation at NetEase Cloud Music

This article details how NetEase Cloud Music leverages multimodal large language models to improve music recommendation across daily, personalized, and playlist scenarios by extracting rich audio, text, and visual features, addressing data skew, cold‑start challenges, and achieving measurable gains in user engagement and distribution efficiency.

Large Language Models · Multimodal AI · NetEase Cloud Music

0 likes · 12 min read

AntTech

Feb 26, 2025 · Artificial Intelligence

Ant Group’s 18 Accepted Papers at AAAI 2025: Summaries and Highlights

This article presents concise English summaries of the 18 Ant Group papers accepted at AAAI 2025, covering topics such as privacy‑preserving large‑model tuning, knowledge‑graph integration, AI‑generated image detection, multi‑task learning, generative retrieval, role‑playing evaluation, and video hallucination mitigation.

AAAI 2025 · AI evaluation · Generative Retrieval

0 likes · 29 min read

Ops Development & AI Practice

Feb 25, 2025 · Artificial Intelligence

What Is Hybrid Reasoning in Claude 3.7 Sonnet and Why It Matters

Hybrid reasoning lets Claude 3.7 Sonnet dynamically switch between fast, intuition‑like answers and step‑by‑step, deep analysis, improving both speed and accuracy for tasks ranging from simple code snippets to complex algorithm design, and signals a broader shift in large language model capabilities.

AI reasoning · Claude 3.7 · Hybrid Reasoning

0 likes · 9 min read

21CTO

Feb 25, 2025 · Artificial Intelligence

How Alibaba’s Qwen 2.5‑Max Challenges GPT‑4o and Redefines China’s AI Race

Chinese tech giants Huawei and Alibaba respond to President Xi’s call for stronger innovation, with Huawei showcasing its HarmonyOS and server‑grade Arm processor while Alibaba unveils the Qwen 2.5‑Max large language model that outperforms leading Western AI systems on multiple benchmarks, highlighting China’s accelerating AI ambitions.

AI · Alibaba · China

0 likes · 5 min read

Architecture Digest

Feb 25, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

DeepSeek’s distillation technology combines data and model distillation to transfer knowledge from large teacher models to compact student models, detailing its definitions, principles, key innovations, architecture, training methods, performance gains, and challenges, especially in multimodal contexts.

AI research · DeepSeek · Knowledge Distillation

0 likes · 16 min read

21CTO

Feb 24, 2025 · Artificial Intelligence

From Transformers to DeepSeek-R1: Evolution of Large Language Models

Since the 2017 introduction of the Transformer architecture, this article chronicles the rapid development of large language models—including BERT, GPT series, multimodal systems, and the cost‑effective DeepSeek‑R1—highlighting key innovations, scaling trends, alignment techniques, and their transformative impact across AI research and industry.

AI evolution · DeepSeek · LLM History

0 likes · 23 min read

Architects' Tech Alliance

Feb 24, 2025 · Artificial Intelligence

NSA: Hardware‑Optimized Sparse Attention Mechanism from DeepSeek, Peking University and University of Washington

The NSA mechanism introduces a three‑branch hardware‑optimized sparse attention architecture—token compression, token selection, and sliding window—combined with learnable gating to balance global and local context, dramatically improving inference speed and efficiency for long‑context large language models.

AI Architecture · DeepSeek · Large Language Models

0 likes · 5 min read

Software Engineering 3.0 Era

Feb 23, 2025 · Artificial Intelligence

2024 AI Programming: Key Advances, Tools, and Trends

The article reviews 2024 AI programming progress, covering the rise of AI code editors like Cursor, the debut of the AI programmer Devin, rapid improvements in SWE‑bench success rates, enhancements in model architecture, multimodal agents, tool‑integration frameworks, adoption statistics in China and abroad, and future directions for collaborative AI‑driven software development.

AI agents · AI programming · Large Language Models

0 likes · 10 min read

Su San Talks Tech

Feb 23, 2025 · Artificial Intelligence

How DeepSeek’s Distillation Breaks AI Model Limits: Core Principles & Performance

This article explores DeepSeek’s cutting‑edge distillation technology, detailing its definition, underlying principles, innovative data‑model fusion, architecture choices, training strategies, performance gains over large language models, and the remaining challenges in knowledge transfer and multimodal data processing.

DeepSeek · Knowledge Distillation · Large Language Models

0 likes · 16 min read

Model Perspective

Feb 22, 2025 · Artificial Intelligence

Why DeepSeek Is Gaining Traction Beyond ChatGPT: Insights from the Global Developers Conference

The article examines DeepSeek’s surge in popularity by analyzing its timely release, cost‑effective performance, open‑source approach, and broader AI ecosystem trends, while also sharing expert predictions and practical coding tool recommendations for developers.

AI predictions · AI trends · DeepSeek

0 likes · 5 min read

Software Engineering 3.0 Era

Feb 19, 2025 · Artificial Intelligence

Three Breakthroughs in AI Inference Models: 1% Data for 99% Performance and More

The article reviews three recent AI inference model advances—open‑source models surpassing OpenAI, the LIMO approach that gains 99% performance with just 1% of the data, and the CoAT framework that combines Monte‑Carlo tree search with associative memory to enable iterative, self‑correcting reasoning.

AI inference · Benchmarking · CoAT

0 likes · 7 min read

Architect

Feb 19, 2025 · Artificial Intelligence

Does Scaling Law Still Hold for Grok 3? A Deep Dive into LLM Training Economics

The article critically examines whether the pre‑training Scaling Law still applies to Grok 3, compares its compute usage and model size with DeepSeek and OpenAI models, evaluates the cost‑effectiveness of pre‑training, RL and test‑time scaling, and explores how these insights shape future large‑language‑model development strategies.

Grok 3 · Large Language Models · Model Efficiency

0 likes · 11 min read

AI Algorithm Path

Feb 19, 2025 · Artificial Intelligence

How Temperature Shapes Output in Large Language Models

The article explains the Temperature hyper‑parameter in large language models, shows how it modifies the softmax distribution, provides a Python visualisation script, and demonstrates through experiments that higher values increase creativity while lower values make outputs more deterministic.

Large Language Models · Python · Temperature

0 likes · 5 min read

Alibaba Cloud Developer

Feb 19, 2025 · Artificial Intelligence

How to Replicate DeepSeek‑R1’s Thought Process on Claude 3.5 Sonnet with Prompt Engineering

The article explains how to use prompt‑engineering techniques on Claude 3.5 Sonnet to mimic DeepSeek‑R1’s transparent reasoning, detailing background, prompt design, iterative optimization, and the broader impact on AI communication and user expression.

AI reasoning · Claude · DeepSeek

0 likes · 25 min read

Code Mala Tang

Feb 19, 2025 · Artificial Intelligence

Compute Power’s Role in the AI Race: Insights from Grok 3, DeepSeek & the Post‑Training Era

The article analyzes how massive compute resources drive AI breakthroughs, highlighting Grok 3's top‑tier performance, DeepSeek's efficient engineering under constraints, and the emerging post‑training paradigm that reshapes competition among major AI players.

AI scaling · DeepSeek · Grok 3

0 likes · 7 min read
