Tag: model training


DataFunTalk
Jun 3, 2025 · Artificial Intelligence

Meta‑Capability Alignment: Psychologically Inspired Training to Endow Large Language Models with Stable Reasoning

Researchers from NUS, Tsinghua and Salesforce AI Research introduce a meta‑capability alignment framework that integrates deductive, inductive and abductive reasoning as a psychologically grounded triad, automatically generates and validates training data, and demonstrates over 10% accuracy gains on math, coding and scientific benchmarks for 7B and 32B models.

Artificial Intelligence · Meta‑Capability Alignment · large language models
0 likes · 8 min read
Youzan Coder
May 12, 2025 · Artificial Intelligence

How Large Language Models Empower Business Development Engineers: Data Analysis, Model Training, and Rapid Prototyping

This article demonstrates how large language models can augment business development engineers by providing data insight, automating algorithm training, and enabling low‑cost rapid product prototyping, thereby transforming traditional backend‑focused roles into full‑stack, AI‑enhanced innovators.

AI · Python · Rapid Prototyping
0 likes · 10 min read
DataFunTalk
Apr 6, 2025 · Artificial Intelligence

Meta Unveils Llama 4: New Multimodal AI Models with Mixture‑of‑Experts Architecture and 10 Million‑Token Context

Meta announced the Llama 4 series—Scout, Maverick and Behemoth—featuring multimodal capabilities, Mixture‑of‑Experts design, up to 10 million‑token context windows, and state‑of‑the‑art performance on STEM, multilingual and image benchmarks, with models now downloadable from llama.com and Hugging Face.
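The Mixture‑of‑Experts design mentioned here can be illustrated with a toy routing function: a gate scores every expert for an input, only the top‑k experts actually run, and their outputs are combined with renormalized gate weights, so per‑token compute stays near‑constant as the expert count grows. The experts, scores, and sizes below are illustrative sketches, not Llama 4's actual components.

```python
def moe_forward(x, experts, gate_scores, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    top = sorted(range(len(experts)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)  # renormalize over the top-k
    return sum(gate_scores[i] / total * experts[i](x) for i in top)

# Three toy "experts"; only two run per input.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]
y = moe_forward(3.0, experts, gate_scores=[0.1, 0.6, 0.3], k=2)
# experts 1 and 2 are selected: (0.6/0.9)*6.0 + (0.3/0.9)*(-3.0) = 3.0
```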

Llama 4 · Mixture of Experts · large language model
0 likes · 14 min read
Cognitive Technology Team
Mar 6, 2025 · Artificial Intelligence

From Traditional Machine Learning to Deep Learning: A Comprehensive Guide to Algorithms, Feature Engineering, and Model Training

This article provides a step‑by‑step tutorial that walks readers through the fundamentals of traditional machine‑learning algorithms, feature‑engineering techniques, model training pipelines, evaluation metrics, and then advances to deep‑learning concepts such as MLPs, activation functions, transformers, and modern recommendation‑system models.
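The MLP and activation‑function material the guide covers can be seen in miniature: one hidden ReLU layer plus a linear output, in plain Python. The hand‑picked weights below implement XOR, the classic function a single linear layer cannot represent; they are illustrative, not from the article.

```python
def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in v]

def linear(W, b, x):
    """Affine layer: W @ x + b for small list-based matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def mlp(x):
    # Hidden layer: 2 ReLU units; output layer: 1 linear unit.
    h = relu(linear([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], x))
    return linear([[1.0, -2.0]], [0.0], h)[0]
```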

Python · Recommendation systems · Transformer
0 likes · 63 min read
Tencent Technical Engineering
Feb 26, 2025 · Artificial Intelligence

Engineers' Perspectives on DeepSeek: Technical Innovations and Implications

Thirteen engineers praise DeepSeek’s open‑source, reinforcement‑learning‑driven architecture—using FP8 storage and SFT‑free training—to deliver GPT‑4‑level reasoning at one‑twentieth the cost, enabling single‑GPU deployment, lowering barriers for academia and startups, and prompting notable market reactions that could democratize advanced AI.

AI cost reduction · DeepSeek · FP8
0 likes · 9 min read
Cognitive Technology Team
Feb 24, 2025 · Artificial Intelligence

Fine-Tuning Large Language Models with LoRA: A Step-by-Step Guide and Code Example

This article demonstrates the before-and-after effects of fine‑tuning a large language model, explains the concept with analogies, details hardware setup, dataset preparation, LoRA configuration, training arguments, and provides complete Python code for a pure‑framework fine‑tuning workflow.
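The core LoRA idea behind the workflow described above fits in a few lines: the base weight W (d_out × d_in) stays frozen while two small matrices B (d_out × r) and A (r × d_in), with rank r far below the full dimensions, are trained; at merge time the effective weight is W + (alpha / r) · BA. The shapes and values below are a minimal numeric sketch, not the article's actual configuration.

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def merge_lora(W, A, B, alpha):
    """Fold the low-rank update (alpha / r) * B @ A into the frozen W."""
    r = len(A)            # rank = number of rows of A
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (d_out x d_in)
B = [[1.0], [2.0]]             # d_out x r, with r = 1
A = [[0.5, 0.5]]               # r x d_in
W_eff = merge_lora(W, A, B, alpha=1.0)
```

Only B and A are trained (d_out·r + r·d_in parameters instead of d_out·d_in), which is why LoRA fine‑tuning fits on modest hardware.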

HuggingFace · LLM fine-tuning · LoRA
0 likes · 24 min read
Architect
Feb 18, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Innovations and Architecture for High‑Performance Reasoning LLMs

The article explains how DeepSeek‑R1 advances large language model reasoning by releasing a lightweight distilled version, sharing a complete training pipeline—including pre‑training, supervised fine‑tuning, and reinforcement learning—introducing long‑chain reasoning data, a transitional inference model, and a comprehensive RL optimization that together yield strong mathematical and logical capabilities.

AI · DeepSeek · large language model
0 likes · 10 min read
JD Tech Talk
Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek · LLM · R1
0 likes · 10 min read
DataFunSummit
Feb 10, 2025 · Artificial Intelligence

Intelligent Decision-Making Large Model ORLM: Research, Training Challenges, Commercialization, and Future Directions

This article presents the ORLM intelligent decision‑making large model, detailing how real‑world decision problems are formalized and solved, the training difficulties and data synthesis methods, the transition from academic research to commercial platforms, and future technical improvement plans.

AI · Decision Modeling · data synthesis
0 likes · 10 min read
Top Architect
Feb 9, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Pipeline, Reinforcement‑Learning Techniques, and Experimental Results

The article reviews DeepSeek‑R1’s training methodology—including cold‑start data collection, multi‑stage RL fine‑tuning, SFT data generation, and model distillation—highlights its performance comparable to OpenAI‑o1‑1217, and discusses key contributions, reward design, successful experiments, and failed attempts.

AI research · DeepSeek · LLM
0 likes · 12 min read
Architect
Feb 6, 2025 · Artificial Intelligence

DeepSeek‑R1: Reinforcement‑Learning‑Driven Long‑Chain Reasoning for Large Language Models

The article reviews DeepSeek‑R1, detailing its reinforcement‑learning‑based training pipeline that uses minimal supervised data, cold‑start fine‑tuning, multi‑stage RL, rejection‑sampling SFT, and distillation to achieve reasoning performance comparable to OpenAI‑o1‑1217, while also discussing successful contributions and failed experiments.
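The rejection‑sampling SFT step summarized here can be sketched in a few lines: sample several candidate answers per prompt, keep only those an automatic verifier accepts, and use the survivors as fine‑tuning targets. The toy policy and verifier below are illustrative stand‑ins, not DeepSeek's actual components.

```python
import random

def toy_policy(prompt, rng):
    """Guess the answer to an 'a+b' prompt, sometimes off by one."""
    a, b = map(int, prompt.split("+"))
    return a + b + rng.choice([-1, 0, 0, 1])

def verifier(prompt, answer):
    """Check a candidate answer exactly (a rule-based reward)."""
    a, b = map(int, prompt.split("+"))
    return answer == a + b

def rejection_sample_sft(prompts, n_candidates=8, seed=0):
    """Keep only verifier-approved samples as SFT training pairs."""
    rng = random.Random(seed)
    kept = []
    for p in prompts:
        for _ in range(n_candidates):
            ans = toy_policy(p, rng)
            if verifier(p, ans):      # reject wrong candidates
                kept.append((p, ans))
    return kept

dataset = rejection_sample_sft(["2+3", "10+7"])
```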

AI research · DeepSeek-R1 · LLM reasoning
0 likes · 11 min read
DataFunSummit
Jan 25, 2025 · Artificial Intelligence

AI-Driven Next-Generation Sales: Project Overview, Core Technologies, System Deployment, and Future Outlook

This article explores how AI transforms next‑generation sales by detailing project background and goals, core technologies such as efficient sample generation, model training and evaluation, system deployment impact, practical case studies, challenges, solutions, and future directions across multiple industries.

AI · Sales Automation · Sample Generation
0 likes · 25 min read
Kuaishou Tech
Jan 24, 2025 · Artificial Intelligence

KwaiCoder-23BA4-v1: An Efficient Large Code Generation Model via Pruning, Knowledge Distillation, and Granular Upcycling

KwaiCoder-23BA4-v1 is a 23B wide MoE code‑completion model that achieves state‑of‑the‑art performance on HumanEval, BigCodeBench and Fill‑in‑Middle benchmarks by using high‑quality data, a cost‑effective training pipeline that combines model pruning, knowledge distillation and fine‑grained merging, and extensive ablation studies.
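The knowledge‑distillation component mentioned here usually means training the student to match the teacher's temperature‑softened output distribution with a KL‑divergence loss scaled by T². The logits and temperature below are an illustrative sketch; KwaiCoder's exact recipe may differ.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a logit vector."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)   # soft teacher targets
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

same = distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])  # identical: loss 0
off = distill_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])   # mismatched: loss > 0
```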

AI · Code Generation · benchmark
0 likes · 10 min read
DataFunSummit
Jan 24, 2025 · Artificial Intelligence

Challenges and Debugging Strategies for FP8 Training of Large Models

The article explains the performance benefits of using FP8 for large‑model training, outlines three main categories of FP8‑related issues such as loss spikes, divergence, and downstream metric gaps, and introduces a dedicated FP8 debug tool with metrics like MSE, cosine similarity, underflow, and overflow to help diagnose and resolve these problems.
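The debug metrics the article lists can be sketched as a comparison between a reference higher‑precision tensor and its low‑precision counterpart. The range limits below are those of the common FP8 E4M3 format (max finite value 448, smallest normal 2⁻⁶); the function name and the exact underflow/overflow definitions are our own assumptions, not the tool's API.

```python
import math

FP8_E4M3_MAX = 448.0          # largest finite E4M3 value
FP8_E4M3_MIN_NORMAL = 2.0 ** -6

def fp8_debug_metrics(ref, low):
    """MSE, cosine similarity, and under/overflow rates vs. a reference."""
    n = len(ref)
    mse = sum((r - q) ** 2 for r, q in zip(ref, low)) / n
    dot = sum(r * q for r, q in zip(ref, low))
    norm = (math.sqrt(sum(r * r for r in ref))
            * math.sqrt(sum(q * q for q in low)))
    cosine = dot / norm if norm else 1.0
    under = sum(1 for r in ref
                if r != 0 and abs(r) < FP8_E4M3_MIN_NORMAL) / n
    over = sum(1 for r in ref if abs(r) > FP8_E4M3_MAX) / n
    return {"mse": mse, "cosine": cosine,
            "underflow_rate": under, "overflow_rate": over}

ref = [1.0, -2.0, 600.0, 1e-4]   # 600 overflows, 1e-4 underflows
low = [1.0, -2.0, 448.0, 0.0]    # saturated / flushed-to-zero values
m = fp8_debug_metrics(ref, low)
```

A high underflow or overflow rate on a layer's activations is exactly the kind of signal that precedes the loss spikes and divergence the article describes.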

AI · FP8 · Nvidia
0 likes · 9 min read
Test Development Learning Exchange
Nov 26, 2024 · Artificial Intelligence

Comprehensive Python Tutorial for Data Preprocessing, Feature Engineering, Model Training, Evaluation, and Deployment

This tutorial walks through consolidating the first ten days of learning by covering data preprocessing, feature engineering, model training with linear regression, decision tree, and random forest, model evaluation using cross‑validation, and finally saving and loading the best model, all illustrated with complete Python code examples.
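The cross‑validation step covered above works like this: split the data into k folds, let each fold serve once as the validation set while the rest trains the model. A plain‑Python sketch (the helper name is our own, not scikit‑learn's):

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for n samples and k folds."""
    # Distribute any remainder across the first n % k folds.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

splits = list(kfold_indices(10, 3))  # folds of size 4, 3, 3
```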

Python · data preprocessing · feature engineering
0 likes · 9 min read
DataFunTalk
Nov 25, 2024 · Artificial Intelligence

2024 AI Development Report Summary by Fei‑Fei Li’s Team

The 2024 AI Development Report by Fei‑Fei Li’s team highlights rapid progress in model capabilities, rising training costs, dominant contributions from the US, China and Europe, emerging reliability challenges, and the broad economic, medical, and educational impacts of artificial intelligence.

2024 · AI · economic impact
0 likes · 12 min read
Architecture and Beyond
Nov 2, 2024 · Artificial Intelligence

Step-by-Step Guide to Training a LoRA Model with Flux1_dev on ComfyUI

This tutorial walks programmers through preparing a GPU cloud environment, installing ComfyUI, downloading Flux1_dev models, integrating a custom LoRA, labeling generated images, and finally training the LoRA using ai‑toolkit, providing detailed commands, configuration tips, and practical cost estimates.

AI image generation · ComfyUI · Flux
0 likes · 12 min read
Test Development Learning Exchange
Oct 29, 2024 · Artificial Intelligence

Data Preprocessing and Modeling with Pandas and Scikit‑learn

This guide walks through using Pandas for data cleaning, feature engineering, and preparation, then demonstrates building, evaluating, and persisting a machine‑learning model with Scikit‑learn's pipeline and RandomForestClassifier in Python.
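The model‑persistence step described above can be sketched with the standard‑library pickle module on a stand‑in model object; a real scikit‑learn estimator is saved and restored the same way (often via joblib instead). The TinyModel class below is illustrative only.

```python
import os
import pickle
import tempfile

class TinyModel:
    """Stand-in for a fitted estimator: thresholds a numeric feature."""
    def __init__(self, threshold):
        self.threshold = threshold
    def predict(self, xs):
        return [1 if x >= self.threshold else 0 for x in xs]

model = TinyModel(threshold=0.5)
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)          # persist the fitted model
with open(path, "rb") as f:
    restored = pickle.load(f)      # reload it in a later session
```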

Python · data preprocessing · machine learning
0 likes · 5 min read
DataFunSummit
Oct 23, 2024 · Artificial Intelligence

Data Compliance Risks and Mitigation Measures Across the Generative AI Model Lifecycle

The article examines data compliance challenges and legal risks during the training, application, and optimization stages of generative AI models, and offers concrete mitigation strategies such as respecting robots.txt, obtaining user consent, handling cross‑border data, and implementing robust security and governance measures.
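The robots.txt step mentioned above is directly supported by Python's standard‑library parser. The rules are supplied inline so this sketch needs no network access; the crawler name and URLs are made up.

```python
from urllib import robotparser

# Rules a site might publish at https://example.com/robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = robotparser.RobotFileParser()
rp.parse(rules)

# Check each URL before fetching it for training data.
ok_public = rp.can_fetch("my-crawler", "https://example.com/articles/1")
ok_private = rp.can_fetch("my-crawler", "https://example.com/private/x")
```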

AI compliance · data security · generative AI
0 likes · 17 min read
DataFunSummit
Aug 12, 2024 · Artificial Intelligence

Design and Application of Xiaohongshu Heterogeneous Training and Inference Engine

This article presents a comprehensive overview of Xiaohongshu's heterogeneous training and inference engine, covering the challenges of model engineering, the design of elastic heterogeneous engines, future HPC training frameworks, AI compilation techniques, and a forward‑looking outlook on scalability and performance.

AI · AI Compilation · HPC
0 likes · 19 min read