Tagged articles
130 articles
Geek Labs
May 11, 2026 · Artificial Intelligence

Train a 64M LLM from Scratch in 2 Hours for $3 and Master LLM Systems

This article introduces two open‑source projects—MiniMind, which lets you train a 64M‑parameter LLM in about two hours for under $3, and Happy‑LLM, a systematic tutorial that explains LLM theory and practice—detailing their features, training pipelines, benchmarks, data, and how they complement each other for comprehensive LLM learning.

AI · Benchmark · Happy-LLM
0 likes · 7 min read
Machine Heart
Apr 30, 2026 · Artificial Intelligence

Beyond DeepSeek V4: A Trillion‑Parameter LLM Trained End‑to‑End on Domestic Chips

The article analyzes how both DeepSeek V4 and Meituan's LongCat‑2.0‑P preview, each with trillion‑scale parameters and 1 M‑token context, were trained and inferred entirely on Chinese‑made accelerators, detailing memory optimizations, deterministic operators, MoE redesigns, and massive multi‑card clusters that prove domestic compute can meet top‑tier AI workloads.

Deterministic Ops · Domestic AI Chip · LongCat
0 likes · 13 min read
PMTalk Product Manager Community
Apr 30, 2026 · Artificial Intelligence

How a Large AI Model Is Trained: Insights from a High‑Earning AI Product Manager

The article walks through model training, validation, ensemble learning, and deployment from an AI product manager’s viewpoint, using a churn‑prediction case to illustrate decision boundaries, metric choices, industry‑specific algorithm trade‑offs, cost considerations, and practical serving options.

AI product management · Large Model · Model Deployment
0 likes · 6 min read
Machine Heart
Apr 18, 2026 · Artificial Intelligence

Why Embodied Data Is the Biggest Gold Mine: Inside the World’s First Hundred‑Billion‑Scale Multimodal Data Cloud Mall

Paxini, together with JD Cloud, Tencent Cloud, and Baidu Intelligent Cloud, launches the world’s first hundred‑billion‑scale, full‑modal, high‑degree‑of‑freedom embodied AI data cloud mall, offering instant online data procurement, end‑to‑end model training pipelines, and validated performance gains in both lab and real‑world robot tasks.

Embodied AI · Model Training · Multimodal Data
0 likes · 13 min read
AI Large-Model Wave and Transformation Guide
Apr 11, 2026 · Artificial Intelligence

How to Build a Full‑Cycle Model Engineering System for Scalable AI

This article outlines a comprehensive, six‑part model engineering framework that transforms AI capabilities into reusable business functions, defines a stable technical stack, establishes model selection and architecture guidelines, implements rigorous control, data, and training processes, and explains how these layers synergize for reliable, scalable deployment.

AI deployment · Model Training · Operations
0 likes · 27 min read
AI Tech Publishing
Apr 8, 2026 · Artificial Intelligence

How Model, Harness, and Memory Enable Continual Learning for AI Agents

The article breaks down AI agent continual learning into three layers—model, harness, and context—explains their distinct challenges, shows how traces link them, and argues that focusing on harness and context yields faster, more practical improvements than merely retraining models.

AI agents · Model Training · context memory
0 likes · 9 min read
PaperAgent
Mar 29, 2026 · Industry Insights

From Reasoning to Agentic Thinking: How Harnesses Are Redefining AI Development

The article examines the shift from traditional reasoning‑based large‑language‑model pipelines to agentic, harness‑driven AI systems, outlining the definition of a harness, its engineering challenges, architectural components, and the broader implications for training, reinforcement learning, and future research directions.

AI Harness · Infrastructure · Intelligent agents
0 likes · 16 min read
AI Explorer
Mar 23, 2026 · Artificial Intelligence

How Unsloth Studio Turns Local AI Training into a Simple, High‑Performance Experience

Unsloth Studio, an open‑source local AI studio, combines a sleek web UI with a custom Triton kernel that claims up to 2× faster training, 70% VRAM savings (80% for RL), supports over 500 models, visual data‑recipe workflows, and both desktop and Python library usage for developers, researchers, and hobbyists.

AI Studio · Local AI · Model Training
0 likes · 7 min read
Old Zhang's AI Learning
Mar 22, 2026 · Artificial Intelligence

Hands‑On Review: Unsloth Studio’s One‑Stop Local LLM Console (Windows‑Ready)

The author tests Unsloth Studio, a local web UI that unifies model download, execution, dataset handling, training, fine‑tuning and export, supporting GGUF and safetensors formats across Windows, macOS and Linux, and highlights its integrated tool‑calling, data‑recipe workflow, observability features, installation quirks, and target user scenarios.

GGUF · Local-LLM · Model Training
0 likes · 9 min read
AI Info Trend
Mar 19, 2026 · Industry Insights

How China’s New AI Training Data Standard Bridges Data Delivery and Model Performance

In February 2026, China introduced a pioneering group standard that defines executable acceptance rules for AI training datasets, linking data delivery, quality assessment, and model training through a three‑layer framework, quantitative metrics, and a pre‑negotiated quality baseline to reduce disputes and costs.

AI · Data Acceptance · Data Quality
0 likes · 7 min read
Code Mala Tang
Mar 5, 2026 · Artificial Intelligence

Master YOLOv12: A Step‑by‑Step Guide to Build, Train, and Deploy Custom Models

This tutorial walks readers through the fundamentals of YOLOv12, covering model variants, dataset preparation with Roboflow, optional FlashAttention acceleration, installation, model selection, training commands, post‑training tasks such as tracking, validation, inference, exporting to ONNX, and benchmarking, all with concrete code snippets and practical tips.

Computer Vision · FlashAttention · Model Training
0 likes · 8 min read
Wuming AI
Mar 2, 2026 · Industry Insights

How China’s New AI Training Data Standard Bridges Data Delivery and Model Performance

The article explains how the newly released "AI Training Data Set Delivery and Quality Acceptance Specification" addresses gaps in existing data‑quality standards by defining a three‑layer acceptance framework, quantitative metrics, and a pre‑negotiated quality‑baseline mechanism to make dataset delivery verifiable and directly supportive of model training goals.

AI data standards · Data Governance · Data Quality
0 likes · 7 min read
Machine Learning Algorithms & Natural Language Processing
Feb 23, 2026 · Artificial Intelligence

System Engineering Behind Billions of Parameters: Insider Training Details from Seven Top AI Labs

This article systematically dissects the engineering decisions behind frontier large‑language‑model training—covering architecture choices, attention variants, optimizer evolution, data‑curation strategies, scaling‑law insights, and post‑training SFT/RL pipelines—based on open‑source reports from seven leading AI laboratories.

Mixture of Experts · Model Training · large language models
0 likes · 26 min read
AI Cyberspace
Jan 18, 2026 · Artificial Intelligence

Understanding Supervised, Unsupervised, Self‑Supervised, Semi‑Supervised, and Reinforcement Learning for Large Language Model Training

The article explains various learning paradigms (supervised, unsupervised, self‑supervised, semi‑supervised, and reinforcement), describes dataset types and quality considerations, outlines preprocessing steps like filtering, deduplication, and tokenization, and discusses scaling laws linking model size, data volume, and compute resources, with concrete examples and code.

Model Training · data preprocessing · machine learning
0 likes · 26 min read
Data Party THU
Dec 20, 2025 · Artificial Intelligence

Master 20 Essential PyTorch Concepts: From Tensors to Model Deployment

This guide walks you through 20 fundamental PyTorch concepts—including tensor creation, operations, autograd, model building, data loading, GPU acceleration, and best‑practice tricks—providing clear code snippets and step‑by‑step explanations so you can quickly prototype, train, and deploy neural networks.

Deep Learning · GPU Acceleration · Model Training
0 likes · 16 min read
Alimama Tech
Oct 22, 2025 · Artificial Intelligence

How Alibaba’s AIGC Model Revolutionizes Virtual Fashion Try‑On

This article details Alibaba’s Taobao Star fashion AIGC model, explaining its data pipeline, captioning strategy, multi‑stage training, and impressive virtual try‑on results for users and merchants, while showcasing model‑based and model‑free generation and pose‑transfer capabilities.

AI · AIGC · Computer Vision
0 likes · 11 min read
DataFunSummit
Oct 5, 2025 · Artificial Intelligence

How Baidu’s AI‑Powered Code Assistant Is Revolutionizing Software Development

In this detailed presentation, Baidu’s engineering manager Yang Jingwei explains the current landscape, emerging trends, key challenges, data pipelines, model training, prompt engineering, multi‑platform support, and future outlook of Baidu’s intelligent code assistant and AI IDE, illustrating practical solutions and real‑world impact.

AI code assistant · Model Training · Prompt engineering
0 likes · 26 min read
Volcano Engine Developer Services
Sep 28, 2025 · Artificial Intelligence

Demystifying AI Jargon: A Beginner’s Guide to Large Language Models

This guide breaks down the complex terminology of large language models—explaining tokens, transformers, self‑attention, RAG, scaling laws, dense vs. sparse architectures, and training stages—using clear analogies and step‑by‑step explanations so readers can confidently understand and work with modern AI systems.

AI fundamentals · Model Training · RAG
0 likes · 35 min read
Architecture & Thinking
Sep 17, 2025 · Artificial Intelligence

How the 32B ‘Zhiyu’ Model is Revolutionizing Intelligent Operations

The Zhiyu model, a 32‑billion‑parameter SRE‑focused LLM, combines extensive domain knowledge, enhanced professional skills, and deterministic RAG to deliver precise, actionable insights for intelligent operations, backed by a robust multi‑source training pipeline, staged training, and flexible deployment options.

AI Operations · Model Training · RAG
0 likes · 7 min read
AI Algorithm Path
Sep 8, 2025 · Artificial Intelligence

Understanding MolmoAct: The Next‑Generation Large Action Model for Robotics

This article analyzes the MolmoAct large action model, detailing its three‑stage perception‑planning‑control architecture, novel depth‑aware tokenization, extensive pre‑training and fine‑tuning pipelines, and benchmark results that demonstrate superior efficiency and generalization over prior vision‑language‑action systems.

Model Training · MolmoAct · Robotics
0 likes · 12 min read
DataFunSummit
Aug 28, 2025 · Artificial Intelligence

Why Finance Needs Its Own Large Language Model: Insights from Du Xiaoman

This article explains how the unique data‑driven, knowledge‑intensive, and complex nature of the financial industry makes large language models especially valuable, outlines the limitations of generic models, and shows how domain‑specific, cost‑effective models can deliver superior performance for finance.

AI · Model Training · cost efficiency
0 likes · 5 min read
Zhuanzhuan Tech
Aug 15, 2025 · Artificial Intelligence

How AI-Powered Minos Transforms Customer Service Quality Inspection

Facing massive daily customer service data, Zhuanzhuan built the AI-driven Minos quality inspection system, combining inspection items, plans, and tasks with large models, regex and programmatic checks, achieving a 26‑fold detection boost and processing over 20,000 interactions per day.

AI Quality Inspection · Customer Service Automation · Model Training
0 likes · 10 min read
Bilibili Tech
Aug 12, 2025 · Artificial Intelligence

How Bilibili Scaled AI Model Training with Alluxio Cache Acceleration

This article details Bilibili's multi-layer storage architecture and Alluxio‑based cache acceleration for large‑scale AI model training, covering challenges of high‑throughput, low‑latency file access, metadata scalability, fault tolerance, and the engineering solutions that boosted I/O performance up to ten‑fold.

AI · Alluxio · Model Training
0 likes · 24 min read
AntTech
Aug 1, 2025 · Artificial Intelligence

How Ant Group Dominated the 2025 DCASE Audio Question Answering Challenge

The article details the 2025 DCASE Audio Question Answering (AQA) track, outlines its technical challenges, describes Ant Group's three‑stage data, model, and training pipeline, presents performance gains of their Qwen2‑Audio‑R1‑8B and Kimi‑Audio‑SFT‑12B models, and outlines future research directions.

Audio Question Answering · DCASE · Model Training
0 likes · 8 min read
Alibaba Cloud Developer
Jul 24, 2025 · Artificial Intelligence

Optimizing Small Perception Models on Different Compute Cards for Autonomous Driving

This article shares practical experience training perception‑detection mini‑models on two different compute cards, covering environment setup, technical architecture, common dependency issues, performance‑boosting tricks such as CPU process pools, torch dataloader tuning, NCCL P2P handling, and CPFS storage optimization.

Distributed Training · Model Training · Performance Optimization
0 likes · 17 min read
Tencent Technical Engineering
Jul 18, 2025 · Artificial Intelligence

From CPUs to GPUs: How Traditional Backend Skills Power Modern AI Infrastructure

This article explores the evolution of AI infrastructure, comparing it with traditional backend systems, and details how hardware shifts to GPU-centric designs, software adaptations like deep learning frameworks, and engineering challenges in model training and inference can be addressed using established backend methodologies.

AI Infrastructure · Deep Learning · GPU computing
0 likes · 19 min read
Tencent Cloud Developer
Jul 17, 2025 · Artificial Intelligence

Why GPUs Are the New CPUs: Unpacking AI Infrastructure Challenges

This article explores how AI infrastructure has shifted from CPU‑centric designs to GPU‑driven architectures, detailing hardware evolution, software changes, and the engineering challenges of large‑model training and inference, while offering practical insights for traditional backend engineers transitioning to AI systems.

AI Infrastructure · Deep Learning · GPU computing
0 likes · 16 min read
ELab Team
Jul 9, 2025 · Artificial Intelligence

How Fast‑Apply AI Models Revolutionize Code Editing with Speculative Decoding

This article explains the design of the edit_file tool, the fast‑apply model that rewrites whole files instead of diffs, its training and evaluation methodology, speculative decoding speed gains, and future research directions for large‑scale code‑editing AI systems.

AI · Model Training · code editing
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Jun 30, 2025 · Artificial Intelligence

Unlocking Small LLM Power: Variable‑Length Chain Distillation with DistillQwen‑ThoughtY

This article introduces a variable‑length chain‑of‑thought distillation technique built on Alibaba Cloud PAI’s EasyDistill toolkit, presents the high‑quality OmniThought‑0528 dataset, details the training of the DistillQwen‑ThoughtY 4B/8B/32B models, and provides code and usage examples for researchers and practitioners.

Dataset · Distillation · LLM
0 likes · 15 min read
DataFunTalk
Jun 3, 2025 · Artificial Intelligence

Meta‑Capability Alignment: Psychologically Inspired Training to Endow Large Language Models with Stable Reasoning

Researchers from NUS, Tsinghua and Salesforce AI Research introduce a meta‑capability alignment framework that integrates deductive, inductive and abductive reasoning via a psychology‑based triple, automatically generates and validates training data, and demonstrates over 10% accuracy gains on math, coding and scientific benchmarks for 7B and 32B models.

Meta‑Capability Alignment · Model Training · large language models
0 likes · 8 min read
Alibaba Cloud Developer
May 26, 2025 · Artificial Intelligence

How Multi‑Agent Planning Boosts Copilot 3.0 with DeepSeek R1 GRPO Training

This article examines Copilot 3.0’s planning module, explains how DeepSeek R1’s GRPO reinforcement‑learning pipeline enables flexible multi‑agent orchestration, addresses the limitations of Copilot 2.0, and presents experimental results that show a 61% reduction in reasoning length and a 9% relative gain in accuracy.

AI · Model Training · Multi-Agent
0 likes · 14 min read
Baobao Algorithm Notes
May 2, 2025 · Artificial Intelligence

Do Reinforcement Learning Techniques Really Boost LLM Reasoning? A Deep Dive into Recent Models

This article analyzes whether reinforcement learning enhances large language model reasoning, compares findings from DeepSeek-Math, a Tsinghua‑Shanghai Jiao‑Tong paper, and Qwen3, and outlines practical training pipelines—including Seed‑Thinking‑v1.5, DeepSeek‑R1, Kimi‑K1.5, and Qwen3—that aim to endow LLMs with robust reasoning capabilities.

LLM · Model Training · artificial intelligence
0 likes · 12 min read
21CTO
Apr 7, 2025 · Artificial Intelligence

Llama 4 Unveiled: Breakthrough Multimodal Models Redefine AI Capabilities

Meta's Llama 4 series introduces the Scout, Maverick, and Behemoth models—featuring Mixture‑of‑Experts architectures, unprecedented 10‑million‑token context windows, and state‑of‑the‑art performance across vision, language, and multimodal benchmarks—while emphasizing efficient training, open‑source availability, and robust safety safeguards.

AI Safety · Llama 4 · Mixture of Experts
0 likes · 14 min read
DataFunTalk
Apr 6, 2025 · Artificial Intelligence

Meta Unveils Llama 4: New Multimodal AI Models with Mixture‑of‑Experts Architecture and 10 Million‑Token Context

Meta announced the Llama 4 series—Scout, Maverick and Behemoth—featuring multimodal capabilities, Mixture‑of‑Experts design, up to 10 million‑token context windows, and state‑of‑the‑art performance on STEM, multilingual and image benchmarks, with models now downloadable from llama.com and Hugging Face.

Llama 4 · Mixture of Experts · Model Training
0 likes · 14 min read
Baidu MEUX
Mar 27, 2025 · Artificial Intelligence

How LoRA Supercharges AI‑Generated Seasonal Poetry Posters

This article details how the LoRA model was employed to enhance AI-generated seasonal poetry posters, covering project background, innovative gameplay, training methodology, dataset preparation, and the resulting benefits of fully automated visual creation that boosts user engagement and product AI capabilities.

AI creativity · AI image generation · LoRA
0 likes · 8 min read
Cognitive Technology Team
Mar 6, 2025 · Artificial Intelligence

From Traditional Machine Learning to Deep Learning: A Comprehensive Guide to Algorithms, Feature Engineering, and Model Training

This article provides a step‑by‑step tutorial that walks readers through the fundamentals of traditional machine‑learning algorithms, feature‑engineering techniques, model training pipelines, evaluation metrics, and then advances to deep‑learning concepts such as MLPs, activation functions, transformers, and modern recommendation‑system models.

Deep Learning · Model Training · Python
0 likes · 63 min read
Tencent Technical Engineering
Feb 26, 2025 · Artificial Intelligence

Engineers' Perspectives on DeepSeek: Technical Innovations and Implications

Thirteen engineers praise DeepSeek’s open‑source, reinforcement‑learning‑driven architecture—using FP8 storage and SFT‑free training—to deliver GPT‑4‑level reasoning at one‑twentieth the cost, enabling single‑GPU deployment, lowering barriers for academia and startups, and prompting notable market reactions that could democratize advanced AI.

AI cost reduction · DeepSeek · FP8
0 likes · 9 min read
Architect
Feb 20, 2025 · Artificial Intelligence

Why Long CoT and In‑Context RL Are the Next Frontier for LLMs

The article analyses recent breakthroughs such as OpenAI's o1, Long CoT, and test‑time search, arguing that enabling LLMs to perform self‑critique and reinforcement learning with long output sequences is essential for future AI performance, while warning against overly structured workflows.

AI research · In‑Context RL · LLM
0 likes · 12 min read
Architect
Feb 18, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Innovations and Architecture for High‑Performance Reasoning LLMs

The article explains how DeepSeek‑R1 advances large language model reasoning by releasing a lightweight distilled version, sharing a complete training pipeline—including pre‑training, supervised fine‑tuning, and reinforcement learning—introducing long‑chain reasoning data, a transitional inference model, and a comprehensive RL optimization that together yield strong mathematical and logical capabilities.

AI · DeepSeek · Model Training
0 likes · 10 min read
Big Data Tech Team
Feb 18, 2025 · Artificial Intelligence

How DeepSeek Trains and Optimizes Its LLMs: From Pre‑training to Reasoning Models

This article breaks down DeepSeek's LLM training pipeline, explaining the massive pre‑training phase, instruction fine‑tuning, reinforcement‑learning‑from‑human‑feedback, and the distinct roles of its V3 instruction model and R1 reasoning model, while also highlighting performance metrics and current limitations.

DeepSeek · LLM · Model Training
0 likes · 8 min read
Huolala Tech
Feb 14, 2025 · Artificial Intelligence

How AI‑Driven Loss Prevention Transforms Risk Management Across the Software Lifecycle

This article explains a comprehensive AI‑powered loss‑prevention framework that automatically identifies financial‑risk scenarios in both existing and new code, integrates model‑based detection into product, development, testing, and release stages, and continuously refines coverage through intelligent monitoring and rule enforcement.

AI · Model Training · Software Engineering
0 likes · 11 min read
JD Tech Talk
Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek · LLM · Model Training
0 likes · 10 min read
JD Cloud Developers
Feb 13, 2025 · Artificial Intelligence

Unlocking DeepSeek R1: Concepts, Training Secrets, and Real-World Experiments

This article demystifies DeepSeek R1 by explaining key concepts such as online search integration and the R1 model, detailing its two‑phase training pipeline, core techniques like iterative data enhancement, and showcases practical reproductions, benchmark tests, and deployment examples for AI developers.

DeepSeek · Model Training · knowledge distillation
0 likes · 12 min read
Architect
Feb 12, 2025 · Artificial Intelligence

Can S‑Curve Theory Explain the Limits of Large‑Model Scaling Laws?

The article analyses how S‑shaped growth curves can model the apparent scaling laws of large language models, discusses the three phases of model development, proposes an ability‑density hypothesis, and explores future scenarios where scaling laws may plateau or shift.

AI growth · Ability Density · Model Training
0 likes · 16 min read
DataFunSummit
Feb 10, 2025 · Artificial Intelligence

Intelligent Decision-Making Large Model ORLM: Research, Training Challenges, Commercialization, and Future Directions

This article presents the ORLM intelligent decision‑making large model, detailing how real‑world decision problems are formalized and solved, the training difficulties and data synthesis methods, the transition from academic research to commercial platforms, and future technical improvement plans.

AI · Decision Modeling · Model Training
0 likes · 10 min read
Top Architect
Feb 9, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Pipeline, Reinforcement‑Learning Techniques, and Experimental Results

The article reviews DeepSeek‑R1’s training methodology—including cold‑start data collection, multi‑stage RL fine‑tuning, SFT data generation, and model distillation—highlights its performance comparable to OpenAI‑o1‑1217, and discusses key contributions, reward design, successful experiments, and failed attempts.

AI research · DeepSeek · LLM
0 likes · 12 min read
JavaEdge
Feb 6, 2025 · Artificial Intelligence

Why Training Transformers Faces an Impossible Triangle of Speed, Performance, and Cost

The article explains the “impossible triangle” in Transformer training, showing how speed, model performance, and computational cost cannot all be optimized simultaneously, and uses analogies and real‑world examples like GPT‑4 to illustrate the necessary trade‑offs.

Deep Learning · Model Training · Performance Tradeoff
0 likes · 7 min read
Architect
Feb 5, 2025 · Industry Insights

What Makes DeepSeek R1 a Game-Changer? Inside the AI Industry’s Latest Power Shift

An in‑depth recap of a five‑hour Lex Fridman podcast reveals DeepSeek’s breakthrough R1 model, its cost‑saving MoE and MLA techniques, the geopolitical chip export battle, market reactions, and broader AI industry trends, offering a comprehensive analysis of technology, economics, and future implications.

AI industry · DeepSeek · Geopolitics
0 likes · 14 min read
DataFunSummit
Jan 25, 2025 · Artificial Intelligence

AI-Driven Next-Generation Sales: Project Overview, Core Technologies, System Deployment, and Future Outlook

This article explores how AI transforms next‑generation sales by detailing project background and goals, core technologies such as efficient sample generation, model training and evaluation, system deployment impact, practical case studies, challenges, solutions, and future directions across multiple industries.

AI · Model Training · Sales Automation
0 likes · 25 min read
Kuaishou Tech
Jan 24, 2025 · Artificial Intelligence

KwaiCoder-23BA4-v1: An Efficient Large Code Generation Model via Pruning, Knowledge Distillation, and Granular Upcycling

KwaiCoder-23BA4-v1 is a 23B wide MoE code‑completion model that achieves state‑of‑the‑art performance on HumanEval, BigCodeBench and Fill‑in‑Middle benchmarks by using high‑quality data, a cost‑effective training pipeline that combines model pruning, knowledge distillation and fine‑grained merging, and extensive ablation studies.

AI · Benchmark · Code Generation
0 likes · 10 min read
AI Large Model Application Practice
Jan 9, 2025 · Artificial Intelligence

How Does Gradient Descent Train a Neural Network? A Step‑by‑Step Guide

This article walks through the complete training cycle of a simple neural network—from random weight initialization and forward propagation with labeled data, through loss calculation and gradient‑based weight updates, to iterative epochs, average loss, and practical issues like gradient explosion and vanishing.

AI · Model Training · Neural Networks
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Nov 27, 2024 · Artificial Intelligence

How to Train, Evaluate, and Deploy Qwen2.5-Coder on Alibaba Cloud PAI‑QuickStart

This guide walks developers through the entire lifecycle of Qwen2.5‑Coder—covering model sizes, training token expansion, resource requirements, fine‑tuning with SFT/DPO, evaluation on custom and public datasets, and one‑click deployment and compression on Alibaba Cloud's PAI‑QuickStart platform.

Code Generation · Deployment · LLM
0 likes · 15 min read
Test Development Learning Exchange
Nov 26, 2024 · Artificial Intelligence

Comprehensive Python Tutorial for Data Preprocessing, Feature Engineering, Model Training, Evaluation, and Deployment

This tutorial walks through consolidating the first ten days of learning by covering data preprocessing, feature engineering, model training with linear regression, decision tree, and random forest, model evaluation using cross‑validation, and finally saving and loading the best model, all illustrated with complete Python code examples.

Model Training · Python · data preprocessing
0 likes · 9 min read
DataFunTalk
Nov 25, 2024 · Artificial Intelligence

2024 AI Development Report Summary by Fei‑Fei Li’s Team

The 2024 AI Development Report by Fei‑Fei Li’s team highlights rapid progress in model capabilities, rising training costs, dominant contributions from the US, China and Europe, emerging reliability challenges, and the broad economic, medical, and educational impacts of artificial intelligence.

2024 · AI · Economic Impact
0 likes · 12 min read
Baobao Algorithm Notes
Nov 13, 2024 · Artificial Intelligence

Why Cleaning SFT Data Is a Nightmare: Hidden JSON Formatting Pitfalls

Cleaning SFT data for LLMs is surprisingly complex, as subtle JSON formatting variations, inconsistent markdown wrappers, intent settings, and unit handling can cause model inconsistencies, requiring unified standards, careful prompt design, and extensive manual review to ensure reliable training outputs.

JSON formatting · LLM data cleaning · Model Training
0 likes · 8 min read
Architecture and Beyond
Nov 2, 2024 · Artificial Intelligence

Step-by-Step Guide to Training a LoRA Model with Flux1_dev on ComfyUI

This tutorial walks programmers through preparing a GPU cloud environment, installing ComfyUI, downloading Flux1_dev models, integrating a custom LoRA, labeling generated images, and finally training the LoRA using ai‑toolkit, providing detailed commands, configuration tips, and practical cost estimates.

AI image generation · ComfyUI · Flux
0 likes · 12 min read
DataFunSummit
Aug 12, 2024 · Artificial Intelligence

Design and Application of Xiaohongshu Heterogeneous Training and Inference Engine

This article presents a comprehensive overview of Xiaohongshu's heterogeneous training and inference engine, covering the challenges of model engineering, the design of elastic heterogeneous engines, future HPC training frameworks, AI compilation techniques, and an outlook on scalability and performance.

AI · AI Compilation · HPC
0 likes · 19 min read
Python Programming Learning Circle
Aug 8, 2024 · Operations

Automating Python Notifications for Model Training, Data Transfer, and Financial Modeling via Email

This article explains how to use Python scripts, the email and smtplib libraries, and MIME components to automatically send progress and completion notifications for long‑running tasks such as model training, data uploads, and financial simulations, including code examples and configuration details.

Model Training · Notification · Python
0 likes · 13 min read
Java Tech Enthusiast
Aug 1, 2024 · Artificial Intelligence

Apple Intelligence: Inside the New Apple Foundation Model

Apple Intelligence, an on‑device AI suite debuting with iOS 18.1 beta, centers on the Apple Foundation Model—a 3‑billion‑parameter on‑device LLM (and a larger undisclosed cloud version) trained on TPUs with novel RL algorithms and mixed‑precision quantization, delivering Siri, writing assistance, photo search, and benchmark performance that surpasses GPT‑4, though currently limited to paid developers.

AI · Apple Intelligence · Model Training
0 likes · 11 min read
Alibaba Cloud Developer
Jun 25, 2024 · Artificial Intelligence

Demystifying Large Language Models: From ChatGPT Basics to Future Impact

This article walks readers through the fundamentals of large language models—explaining ChatGPT's architecture, training pipelines, required GPU hardware, industry deployment models, societal implications, and future industry trends—offering a cohesive framework for both newcomers and professionals.

AI Impact · AI fundamentals · GPU computing
0 likes · 22 min read
Practical DevOps Architecture
May 30, 2024 · Artificial Intelligence

Eight‑Week LLM and Large Model Training Course Outline

This article outlines an eight‑week curriculum covering LLM evolution, PyTorch fundamentals, CUDA training, large‑model fine‑tuning, LangChain application development, cloud‑based quantization, industry case studies, and a recruitment session, providing video resources for each topic.

AI · Fine-tuning · LLM
0 likes · 5 min read
DataFunTalk
May 21, 2024 · Big Data

Applying Alluxio to Autonomous Driving Model Training: Deployment, Performance, and Operational Insights

This article details how Alluxio was adopted to replace NAS in autonomous driving model training, describing the data closed‑loop workflow, the challenges of the previous system, Alluxio's architectural benefits, deployment strategies across single and multiple data centers, functional and performance testing, operational tuning, and the resulting cost and efficiency gains.

Alluxio · Model Training · Performance Optimization
0 likes · 15 min read
DataFunTalk
May 18, 2024 · Artificial Intelligence

Tencent FinTech AI Development Platform: Architecture, Challenges, and Solutions

This article details the background, goals, and evolution of Tencent's FinTech AI development platform, outlines the technical challenges faced in feature engineering, model training, and inference services, and presents the comprehensive solutions and future plans implemented to improve efficiency, stability, and scalability.

Cloud Native · FinTech · Inference
0 likes · 13 min read
Baidu Tech Salon
May 15, 2024 · Artificial Intelligence

Accelerating Large Model Training and Inference with Baidu Baige AIAK‑LLM

Baidu Baige’s AIAK‑LLM suite accelerates large‑model training and inference by raising Model FLOPs Utilization (MFU) through techniques such as tensor‑parallel (TP) communication overlap, hybrid recompute, zero‑offload, automatic parallel‑strategy search, multi‑chip support, and inference‑specific optimizations, achieving a speedup of over 60 % along with seamless Hugging Face integration.

AI Infrastructure · AIAK-LLM · Baidu Baige
0 likes · 26 min read
DataFunTalk
May 10, 2024 · Artificial Intelligence

GPU Performance Optimization Practices for Tencent PCG Recommendation Model Training Framework

This article presents a comprehensive overview of Tencent PCG's GPU‑based recommendation model training framework, detailing why GPU adoption is essential, the hardware and software challenges faced, the multi‑level data architecture, pipeline design, and a series of network, storage, and compute optimizations, followed by future directions.

Distributed Training · GPU · Model Training
0 likes · 13 min read
Architects' Tech Alliance
Apr 25, 2024 · Industry Insights

What China’s AI Labs Learned from Scaling Domestic Large‑Model Training

The article analyzes the computational characteristics and system challenges of training large AI models on domestic platforms, examines framework parallelism and future algorithms, and proposes six strategic measures—including scaling compute, improving data management, building a national R&D team, and boosting AI‑chip investment—to accelerate China’s AI leadership.

AI Infrastructure · Model Training · domestic AI
0 likes · 5 min read
DaTaobao Tech
Apr 19, 2024 · Artificial Intelligence

AI‑Driven Aesthetic Evaluation for E‑commerce Image Generation

The article outlines a systematic method for defining, training, and deploying AI‑driven aesthetic standards to evaluate and improve e‑commerce image generation on Taobao, detailing a four‑step workflow, multimodal model architecture, scoring criteria, validation processes, and future plans for style libraries and an AI‑PaaS offering.

AI · Aesthetic Evaluation · Model Training
0 likes · 10 min read
Top Architect
Apr 18, 2024 · Artificial Intelligence

Understanding Transformers: Architecture, Attention Mechanism, Training and Inference

This article provides a comprehensive overview of Transformer models, covering their attention-based architecture, encoder-decoder structure, training procedures including teacher forcing, inference workflow, advantages over RNNs, and various applications in natural language processing such as translation, summarization, and classification.

Attention Mechanism · Deep Learning · Inference
0 likes · 11 min read
Data Thinking Notes
Apr 11, 2024 · Artificial Intelligence

How Financial Institutions Are Building Their Own Large Language Models

This article explores how the finance sector is creating specialized large language models—covering the shift from generic to domain‑specific models, training innovations, evaluation methods, and real‑world applications such as marketing, customer service, risk control, and operational analytics.

Applications · Model Training · finance AI
0 likes · 16 min read
Sohu Tech Products
Mar 27, 2024 · Artificial Intelligence

NVIDIA NeMo Framework, TensorRT‑LLM, and RAG for Large Language Model Solutions

NVIDIA’s comprehensive LLM ecosystem combines the full‑stack NeMo Framework for data curation, distributed training, fine‑tuning, inference acceleration with TensorRT‑LLM and Triton, plus Retrieval‑Augmented Generation and Guardrails, enabling efficient, low‑latency, knowledge‑grounded model deployment across clusters.

AI acceleration · Model Training · NeMo Framework
0 likes · 16 min read
Architect
Mar 26, 2024 · Artificial Intelligence

Why Transformers Outperform RNNs: A Deep Dive into Architecture and Training

This article explains the Transformer model’s core architecture, self‑attention mechanism, encoder‑decoder workflow, training with teacher forcing, inference steps, and why it surpasses RNNs and CNNs, while also outlining its major NLP applications.

Attention Mechanism · Inference · Model Training
0 likes · 14 min read
DataFunSummit
Mar 13, 2024 · Artificial Intelligence

Overview of Vivo BlueLM: Evolution, Training Challenges, Deployment, and Product Applications

This article presents a comprehensive overview of Vivo's BlueLM large language model, covering its historical evolution, training pipeline and data challenges, algorithmic innovations, safety measures, edge‑device optimization, product deployments such as BlueLM Mini‑V and BlueQianXun, and insights from a detailed Q&A session.

AI product · Edge Computing · Model Training
0 likes · 17 min read
DataFunSummit
Jan 14, 2024 · Artificial Intelligence

Large Language Model Innovations for the Financial Industry: From General to Finance‑Specific Models, Training Techniques, Evaluation Methods, and Real‑World Applications

This article details how the financial sector is adopting large language models, describing the shift from generic to finance‑specific models, the technical challenges and cost considerations, the XuanYuan model releases, novel training and evaluation approaches, and a range of practical applications such as marketing, service, operations, office assistance, and risk control.

AI · Applications · Model Training
0 likes · 17 min read
DataFunSummit
Jan 13, 2024 · Artificial Intelligence

Large Model Applications in Automotive Industrialization: Practices, Architecture, and Case Studies

This presentation covers the development of ChatGPT, the principles behind large language models, their role in enabling new industrialization, the NIO automotive AI platform architecture, data‑model‑agent closed loops, intelligent inspection solutions, and practical case studies such as G8D Agents, giving a comprehensive view of large‑model deployment in the automotive sector.

AI agents · Industrial AI · Model Training
0 likes · 13 min read
Taobao Design
Dec 14, 2023 · Artificial Intelligence

How to Train a Multi‑Stage AI Model for a Brand Mascot – Tmall Case Study

This article explores the challenges of using AI image generators for brand IP, compares Midjourney and Stable Diffusion results, and presents a step‑by‑step multi‑layer model training workflow—including dataset creation, training optimization, and practical tips—to achieve a more expressive and consistent Tmall mascot.

AI · Midjourney · Model Training
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Nov 8, 2023 · Big Data

How Big Data and AI Converge: Insights from Alibaba Cloud’s 2023 Conference

The talk outlines the evolution from model‑centric to data‑centric AI development, explains Alibaba Cloud’s integrated big data‑AI platform, showcases real‑world use cases like knowledge‑base QA and personalized recommendation, and details the underlying cloud‑native services that enable seamless data and AI collaboration.

AI Engineering · Model Training
0 likes · 16 min read
Ximalaya Technology Team
Oct 9, 2023 · Artificial Intelligence

DeepRec-Based High-Dimensional Sparse Feature Support and Real-Time Model Training in Ximalaya AI Cloud

Ximalaya AI Cloud leverages DeepRec’s Embedding Variable to elastically manage high‑dimensional sparse features with low collision, supporting admission/eviction, multi‑level storage and minute‑level incremental model updates, which together boost GPU utilization, halve training time and improve recommendation CTR by 2‑3 % while maintaining latency.

AI cloud · DeepRec · Kubernetes
0 likes · 13 min read
Tencent Tech
Sep 20, 2023 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How to Reduce It?

The article explains why large language models generate hallucinations—due to data errors, training conflicts, and inference uncertainty—and outlines data‑cleaning, model‑level feedback, knowledge augmentation, constraint techniques, and post‑processing methods such as the “Truth‑seeking” algorithm to mitigate the issue.

AI Safety · Data Quality · Knowledge Retrieval
0 likes · 8 min read
Alibaba Cloud Infrastructure
Sep 13, 2023 · Artificial Intelligence

Pai‑Megatron‑Patch: Design Principles, Key Features, and End‑to‑End Usage for Large Language Model Training

This article introduces the open‑source Pai‑Megatron‑Patch tool from Alibaba Cloud, explains its non‑intrusive patch architecture, enumerates supported models and features such as weight conversion, Flash‑Attention 2.0, FP8 training with Transformer Engine, and provides detailed command‑line examples for model conversion, pre‑training, supervised fine‑tuning, inference, and RLHF reinforcement learning pipelines.

Deep Learning · FP8 · LLM
0 likes · 19 min read
37 Interactive Technology Team
Aug 23, 2023 · Artificial Intelligence

LoRA Model Training Guide for Stable Diffusion: Comparison, Workflow, and Tips

This guide compares Stable Diffusion fine‑tuning methods, shows why LoRA offers the best size‑and‑speed trade‑off, and provides a step‑by‑step workflow—from dataset collection and preprocessing to parameter tuning, 20‑minute training on a single GPU, and practical tips for successful custom model generation.

AI art · DreamBooth · LoRA
0 likes · 9 min read
Rare Earth Juejin Tech Community
Aug 17, 2023 · Artificial Intelligence

Getting Started with YOLOv8 on the Ultralytics Platform: Installation, Command‑Line Usage, and Model Training

This article introduces the YOLOv8 object‑detection framework on the Ultralytics platform, covering environment setup, command‑line and Python APIs for inference, model‑file options, result interpretation, data annotation, training procedures, and exporting models to various deployment formats.

Computer Vision · Model Training · Python
0 likes · 14 min read
Tencent Cloud Developer
Jul 19, 2023 · Artificial Intelligence

Build a Full‑Scale LLM from Scratch in 61 Lines of Python

This step‑by‑step tutorial shows how to set up a GPU environment, prepare custom text data, train a tokenizer, configure and train a GPT‑2‑based large language model, test its generation, and run the entire pipeline using only 61 lines of Python code.

AI · Docker · GPT-2
0 likes · 10 min read
IT Services Circle
Jun 19, 2023 · Artificial Intelligence

AI Pollution: How Generated Content Threatens the Internet and Model Training

The article examines how AI-generated misinformation spreads across platforms—from misleading answers on Bing and Stack Overflow to fabricated news stories—highlighting the resulting contamination of online information, the risks to model training, and emerging efforts to detect and curb such low‑quality AI output.

AI · ChatGPT · Content Pollution
0 likes · 8 min read
Tencent Cloud Developer
Apr 20, 2023 · Artificial Intelligence

Master Stable Diffusion: From Hardware Setup to Advanced Prompt Engineering

This comprehensive guide walks you through the hardware requirements, environment deployment, key parameters, prompt techniques, ControlNet integration, model download and installation, as well as style and character training for Stable Diffusion, providing practical code snippets and visual examples for each step.

AI image generation · ControlNet · GPU deployment
0 likes · 38 min read
Python Programming Learning Circle
Mar 21, 2023 · Artificial Intelligence

Why Replicating ChatGPT in China Demands Massive AI Infrastructure and Cloud Computing

The article explains that reproducing ChatGPT in China is not just a matter of funding but requires extensive expertise in large‑scale language model training, massive compute resources, optimized cloud infrastructure, and deep AI research, as demonstrated by Alibaba's DAMO Academy efforts.

AI Infrastructure · ChatGPT · Model Training
0 likes · 10 min read
360 Tech Engineering
Mar 17, 2023 · Artificial Intelligence

Understanding ChatGPT: OpenAI’s Development, Model Evolution, and Training Techniques

This article provides an overview of ChatGPT’s rapid rise, OpenAI’s founding, the evolution of GPT models up to GPT‑3, the data‑driven training process, model capabilities and limitations, and practical guidance for users, highlighting the interplay between open‑source research and commercial deployment.

ChatGPT · GPT-3 · Model Training
0 likes · 14 min read
DataFunTalk
Feb 15, 2023 · Big Data

Alluxio Deployment at Ant Group: Stability Building, Performance Optimization, and Scale‑up for Large‑Scale Model Training

This article summarizes how Ant Group introduced Alluxio to address storage I/O, capacity, and latency challenges in large‑scale model training, detailing stability improvements through worker‑register follower and master migration, performance gains via follower‑only reads, and horizontal scaling using metadata sharding and multi‑cluster deployment.

Alluxio · Big Data · Model Training
0 likes · 15 min read
Baidu Intelligent Cloud Tech Hub
Jan 11, 2023 · Artificial Intelligence

How Baidu Cloud Powers End-to-End Autonomous Driving Data Ops and AI

This article outlines Baidu Intelligent Cloud's comprehensive, low‑cost solution for autonomous‑driving data pipelines—from road data collection and compliance, through annotation, management, and model training, to simulation—highlighting the platform's tools, services, and security measures that accelerate development.

AI · Data Management · Model Training
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Jan 10, 2023 · Artificial Intelligence

How GPT‑MoE Cuts Training Costs: Sparse Transformer Techniques and Performance Insights

This article examines the use of Mixture‑of‑Experts (MoE) sparse training for GPT models, detailing the architecture, training and inference efficiency gains, experimental comparisons with dense models, custom routing algorithms, and step‑by‑step deployment on Alibaba Cloud AI platforms.

AI efficiency · GPT-MoE · Model Training
0 likes · 26 min read
Baidu Intelligent Cloud Tech Hub
Jan 5, 2023 · Artificial Intelligence

How Baidu’s AI IaaS Supercharges Autonomous Driving: 5× Data Speed & 391% Model Gains

The talk outlines Baidu’s Baige AI IaaS solution for autonomous driving, detailing a low‑cost, high‑efficiency cloud stack that accelerates data access fivefold, boosts model training speed up to 391 %, cuts inference latency by 90 %, reduces simulation costs by 60 %, and explains the underlying storage, compute, container and GPU virtualization technologies.

AI IaaS · Model Training · autonomous driving
0 likes · 17 min read
Laiye Technology Team
Dec 16, 2022 · Artificial Intelligence

Efficient Production of Scene-specific OCR Models Using an AI Platform

This article explains how a unified AI platform enables rapid, data‑driven creation, training, deployment, and evaluation of OCR models for visually distinct text regions such as seals, meter readings, license plates, and VIN codes, while minimizing hardware and annotation costs.

AI Platform · Computer Vision · Kubeflow
0 likes · 7 min read
ByteDance Terminal Technology
Sep 1, 2022 · Artificial Intelligence

Hybrid Computer Vision and Deep Learning for Automated UI Background Color Extraction and Assertion

This article presents a hybrid pipeline that combines traditional computer vision techniques with deep learning models to automatically extract and verify text background colors in UI automation screenshots. It addresses challenges such as limited training data and complex borders, reducing manual inspection costs while achieving high accuracy and robustness in production environments.

Automated Testing · Computer Vision · Deep Learning
0 likes · 10 min read
NetEase Yanxuan Technology Product Team
Aug 29, 2022 · Artificial Intelligence

Building Yanxuan Machine Learning Platform: Architecture and Implementation

Yanxuan built a Kubeflow‑based machine‑learning platform that unifies data preprocessing, feature engineering, model training, validation, and deployment, using Smart‑jobs, Smart‑Infer, Smart‑backend, Airflow pipelines, Jupyter notebooks, and Istio‑enhanced inference services to boost algorithm engineers’ efficiency and integrate with Kubernetes, HDFS, and Hive.

Airflow orchestration · Algorithm Development · Inference Service
0 likes · 14 min read
Python Programming Learning Circle
Mar 31, 2022 · Artificial Intelligence

Comprehensive PyTorch Code Snippets: Configuration, Tensor Operations, Model Definition, Training, and Best Practices

This article provides a thorough collection of commonly used PyTorch code snippets covering environment setup, reproducibility, GPU configuration, tensor manipulation, model building, data preprocessing, training and evaluation loops, custom loss functions, regularization techniques, learning‑rate scheduling, checkpointing, and practical tips for efficient deep‑learning development.

Deep Learning · GPU · Model Training
0 likes · 37 min read