Tag: training

Baidu Geek Talk
Apr 2, 2025 · Artificial Intelligence

DeepSeek-VL2 Multimodal Model: Architecture, Training, and Code Walkthrough

DeepSeek‑VL2 is a state‑of‑the‑art Mixture‑of‑Experts multimodal model that combines a SigLIP‑L vision encoder with dynamic tiling, a two‑layer VL adaptor, and a DeepSeek‑MoE language model using Multi‑head Latent Attention. It is trained in three stages on diverse vision‑language and text data, achieves strong results on benchmarks such as DocVQA and TextVQA, and ships with full implementation and inference code in PaddleMIX.

Code · DeepSeek-VL2 · Inference
36 min read
Code Mala Tang
Mar 1, 2025 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How Can We Fix It?

This article explains why large language models produce plausible‑looking but false information, traces the problem to the supervised fine‑tuning stage, and outlines mitigation techniques such as knowledge interrogation, RLHF, and tool‑augmented search to reduce hallucinations.

LLM · RLHF · Web Search
12 min read
Architect
Feb 21, 2025 · Artificial Intelligence

DeepSeek Model Innovations: Architecture, Training Methods, and Performance Evaluation

This article reviews DeepSeek's recent breakthroughs, including the MLA attention redesign, GRPO alignment algorithm, MoE enhancements, multi‑stage training pipelines (SFT, RL, preference tuning, distillation), and comparative performance against GPT‑4o‑Mini and Llama 3.1, highlighting both strengths and remaining challenges.

DeepSeek · Mixture of Experts · architecture
16 min read
Practical DevOps Architecture
Feb 20, 2025 · Artificial Intelligence

Training MiniDeepSeek V3+R1 from Scratch: Full-Scale Large Model Technical Practice for 2025

This tutorial series provides a step‑by‑step technical guide to training, deploying, and fine‑tuning the MiniDeepSeek V3+R1 large language model, covering model performance, open‑source details, API usage, parameter explanation, multi‑turn chatbot construction, function calling, integration with Open WebUI, GraphRAG, Swarm, and various deployment and optimization techniques.

AI · MiniDeepSeek · Tutorial
4 min read
JD Retail Technology
Aug 30, 2024 · Artificial Intelligence

GPU Optimization Practices for Training and Inference in JD Advertising Recommendation Systems

The article details JD Advertising's technical challenges and solutions for large‑scale sparse recommendation models, describing GPU‑focused storage, compute and I/O optimizations for both training and low‑latency inference, including distributed pipelines, heterogeneous deployment, batch aggregation, multi‑stream execution, and compiler extensions.

Distributed Systems · GPU optimization · Inference
13 min read
DataFunTalk
Jul 26, 2024 · Artificial Intelligence

Llama 3: Open‑source Large Language Model Technical Report and Evaluation

This comprehensive technical report details the development, architecture, training methodology, extensive benchmark evaluations, safety measures, and inference optimizations of Meta's open‑source Llama 3 large language model series, covering models up to 405 billion parameters and supporting multilingual, multimodal, and tool‑use capabilities.

AI · Llama · benchmark
115 min read
Architects' Tech Alliance
Jun 22, 2024 · Artificial Intelligence

Rising Compute Demand of Generative AI Models and GPU Accelerator Trends in 2024

The article analyzes how generative AI models from GPT‑1 to the upcoming GPT‑5 are driving exponential growth in compute requirements, prompting massive cloud capital expenditures and intense competition among GPU vendors such as NVIDIA, AMD, Google, and emerging domestic chip makers, while also highlighting interconnect innovations and cost‑effective solutions.

AI · Accelerators · Compute
12 min read
IT Services Circle
May 2, 2024 · Artificial Intelligence

LLM.c: A 1000‑Line C Implementation for Training GPT‑2

Andrej Karpathy’s LLM.c project demonstrates how a compact, pure‑C (and CUDA) codebase of roughly 1000 lines can train a GPT‑2 model, covering data preparation, memory management, layer implementations, compilation, and practical tips for running and testing the model on CPUs and GPUs.

AI · C++ · CUDA
10 min read
DataFunSummit
Feb 11, 2024 · Artificial Intelligence

GPU-Accelerated Model Service and Optimization Practices at Xiaohongshu

This article details Xiaohongshu's end‑to‑end GPU‑based transformation of its recommendation and search models, covering background, model characteristics, training and inference frameworks, system‑level and GPU‑level optimizations, compilation tricks, hardware upgrades, and future directions for large‑scale machine‑learning infrastructure.

GPU · Inference · large-scale systems
18 min read
DataFunTalk
Dec 1, 2023 · Artificial Intelligence

GPU‑Driven Model Service and Optimization Practices in Xiaohongshu's Search Scenario

This article details Xiaohongshu's end‑to‑end GPU‑centric transformation for search‑related machine‑learning models, covering model characteristics, training and inference frameworks, system‑level GPU and CPU optimizations, multi‑card and compilation techniques, and future directions for scaling large sparse and dense models.

GPU optimization · Inference · Xiaohongshu
16 min read
Architects' Tech Alliance
Nov 19, 2023 · Artificial Intelligence

NVIDIA H100 vs L40S: AI‑Focused GPU Comparison and Practical Alternatives

This article compares NVIDIA's high‑end AI GPUs—H100, A100, and the newer L40S—detailing their specifications, performance trade‑offs, pricing, availability, and suitability for training and inference workloads, while highlighting why L40S can be a cost‑effective alternative for many enterprises.

AI · GPU · H100
10 min read
Architects Research Society
Jun 8, 2023 · Information Security

From Flight Training to Industrial Control Systems Cybersecurity: Lessons from SANS ICS612

The article uses a CEO’s one‑hour flight lesson for ten staff as a metaphor to illustrate why hands‑on, relevant experience is essential for effective industrial control systems (ICS) cybersecurity training, and describes the structure and objectives of the SANS ICS612 course.

Hands‑on Experience · ICS · SANS
13 min read
DataFunTalk
Mar 31, 2023 · Artificial Intelligence

Estimating the Resource and Cost Requirements for Large Language Model Training and Inference

The article analyses the computational resources, hardware costs, and human investment needed to train and serve large language models such as GPT‑3, discusses practical cost calculations, highlights the challenges faced by Chinese AI teams, and argues for sustained, long‑term funding to achieve meaningful breakthroughs.

AI infrastructure · China AI · Inference
14 min read
DataFunTalk
Nov 2, 2021 · Artificial Intelligence

Optimizing AI Platform Resource Efficiency: Scheduling Strategies for Deep Learning Inference and Training

The article outlines a technical exchange hosted by 58.com AI Lab and Tianjin University that discusses high‑efficiency AI computing, resource‑aware scheduling for both online inference and offline training, and methods to mitigate GPU under‑utilization and gray‑interference in distributed deep‑learning platforms.

AI · GPU utilization · Inference
4 min read
Aikesheng Open Source Community
Jan 18, 2021 · Databases

How to Build a Professional DBA Operations Team: Infrastructure, Standards, Training, Knowledge Base, and Culture

The article explains how to construct an effective DBA operations team by focusing on reusable infrastructure, clear team standards, a structured training system, a comprehensive knowledge base, and a positive team atmosphere, providing practical tools and methods for each aspect.

DBA · Database Operations · infrastructure
4 min read
JD Tech Talk
Nov 16, 2020 · Artificial Intelligence

Practical Guide to Deploying Federated Learning: Architecture, Deployment, Training, and Inference

This article provides a comprehensive overview of federated learning engineering, covering deployment via Docker containers, the design of training and inference frameworks, key services such as communication, training, model management, and registration, and practical considerations for scaling and reliability in production environments.

AI · Deployment · Docker
11 min read
Efficient Ops
Nov 7, 2016 · Operations

How to Train New SREs Effectively: Proven Practices and Playbooks

This article outlines a systematic approach to onboarding and training new Site Reliability Engineers, covering trust building, readiness assessment, diverse learning methods, structured curricula, on‑call milestones, project‑focused work, reverse‑engineering skills, statistical thinking, and improvisation techniques to develop high‑performing SRE teams.

Reverse Engineering · SRE · on-call
17 min read
Architects Research Society
Sep 19, 2016 · Information Security

Recommended Books, Training, and Conferences for Industrial Control Systems Cybersecurity

This guide curates essential books, professional training courses, and major conferences for industrial control systems cybersecurity, offering insights into historical context, technical security practices, and community engagement to help practitioners deepen their knowledge and connect with the field.

ICS security · conferences · cybersecurity resources
10 min read
Qunar Tech Salon
Aug 19, 2016 · Artificial Intelligence

Deep Learning Anti‑Scam Guide: A Non‑Technical Overview of Neural Networks, Training, and Practical Tips

This article provides a humorous yet informative, non‑mathematical guide to deep learning, covering neural network basics, layer addition, training methods, back‑propagation, unsupervised pre‑training, regularization, ResNet shortcuts, GPU computation, framework choices, and practical advice for applying deep learning to industrial data.

AI · GPU · Pu-Learning
26 min read