Tagged articles
197 articles
SuanNi
May 6, 2026 · Artificial Intelligence

Deploy RecBole on a GPU Cloud to Learn Recommendation Algorithms

This guide explains how to launch the RecBole recommendation system image on the SumW GPU cloud, covering its key features, required setup steps, dependency installation tips, and a one‑line command to run a baseline model on an MLU accelerator.

GPU cloud · MLU · PyTorch
0 likes · 4 min read
PaperAgent
Apr 21, 2026 · Artificial Intelligence

OpenMythos: Rebuilding Claude Mythos with Recursive Transformers and MoE

OpenMythos is an open‑source PyTorch reimplementation of Anthropic's Claude Mythos that uses a mixture‑of‑experts (MoE) routed recurrent Transformer, introduces Recursive Depth Transformers, Multi‑Latent Attention, and several stability mechanisms, and demonstrates parameter‑efficient scaling backed by empirical studies.

AI Architecture · Claude Mythos · MoE
0 likes · 6 min read
AI Explorer
Apr 1, 2026 · Artificial Intelligence

Google Open‑Sources TimesFM: A Foundation Model for Plug‑and‑Play Time‑Series Forecasting

Google’s open‑source TimesFM is a decoder‑only Transformer foundation model that delivers plug‑and‑play time‑series forecasting with zero‑shot accuracy, larger context windows, quantile predictions, and a simple Hugging Face API, making it suitable for retail, energy, finance, monitoring, and IoT use cases.

Hugging Face · PyTorch · TimesFM
0 likes · 7 min read
Tech Musings
Mar 6, 2026 · Artificial Intelligence

How to Deploy Qwen3-8B on WSL2 with 4‑Bit Quantization and Resource Limits

This article is a step‑by‑step guide to setting up the Qwen3‑8B large language model on a Windows 11 system using WSL2, covering hardware specs, CUDA configuration, 4‑bit quantization with BitsAndBytes, SDPA attention optimization, CPU offload, and resource‑limiting tricks for smooth inference.

4-bit quantization · CUDA optimization · PyTorch
0 likes · 10 min read
DeepHub IMBA
Mar 1, 2026 · Artificial Intelligence

Demystifying VAE: From Probabilistic Encoding to Latent Space Regularization

This article walks through the fundamentals of variational autoencoders, explaining why they are needed, detailing their three core components, loss formulation, PyTorch implementation, training loop, and multiple inference modes such as anomaly detection, data generation, conditional generation, latent space manipulation, and data imputation.

Conditional VAE · Generative Models · Latent Space
0 likes · 15 min read
Data STUDIO
Feb 25, 2026 · Artificial Intelligence

Build a Large Language Model from Scratch with PyTorch—No Libraries, No Shortcuts

This guide walks you through building, training, and fine‑tuning a Transformer‑based large language model entirely from scratch using PyTorch, covering tokenization, self‑attention, multi‑head attention, positional encoding, model architecture, data preparation, training loops, and fine‑tuning on custom lyrics.

Fine-tuning · GPT · LLM
0 likes · 43 min read
AI Cyberspace
Feb 11, 2026 · Artificial Intelligence

From RNNs to LSTMs and GRUs: A Hands‑On Guide to Sequence Modeling in PyTorch

This tutorial explains the nature of sequential data, why traditional feed‑forward networks struggle with it, and how recurrent architectures such as RNN, LSTM, and GRU capture temporal dependencies, complete with mathematical foundations, training algorithms, and full PyTorch implementations for sentiment analysis, text generation, and encoder‑decoder models.

Encoder-Decoder · GRU · LSTM
0 likes · 57 min read
Data Party THU
Feb 1, 2026 · Artificial Intelligence

How Tiny Perturbations Can Fool 95% Accurate Image Classifiers

Despite achieving over 95% accuracy on ImageNet, popular models like ResNet, VGG, and EfficientNet can be easily misled by carefully crafted adversarial examples using FGSM, revealing deep learning’s inherent vulnerability and prompting the need for robust defense strategies.

FGSM · Image Classification · PyTorch
0 likes · 11 min read
JD Cloud Developers
Jan 30, 2026 · Artificial Intelligence

Scaling Generative Recommendation: Inside JD’s 9N-LLM Multi‑Framework Training Engine

This article details JD Retail’s 9N-LLM unified training engine, which integrates TensorFlow and PyTorch across GPU and NPU hardware to tackle the massive data, model size, and reinforcement‑learning complexities of generative recommendation, offering concrete components, performance benchmarks, and future directions.

GPU/NPU · PyTorch · TensorFlow
0 likes · 26 min read
21CTO
Jan 26, 2026 · Artificial Intelligence

What’s New in PyTorch 2.10? Deep Dive into GPU and CUDA Enhancements

PyTorch 2.10 introduces extensive upgrades for AMD ROCm, Intel XPU, and NVIDIA CUDA, adds new Torch XPU APIs, expands Python 3.14 support, and brings performance‑focused improvements such as fused kernels and enhanced quantization, all available via the official GitHub release.

CUDA · Deep Learning · GPU
0 likes · 4 min read
AI Algorithm Path
Jan 21, 2026 · Artificial Intelligence

Understanding Vector Similarity in Machine Learning: A Plain‑Language Guide

The article explains key vector similarity measures (dot product, cosine similarity, and L1/L2 distances), illustrates their geometric meanings, compares their behavior with concrete examples and PyTorch/NumPy code, and discusses when to prefer each metric in machine‑learning tasks.
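
As a rough illustration of the four measures named in this summary, here is a minimal sketch in plain Python rather than the PyTorch/NumPy code the article itself uses. The function names and the sample vectors `u` and `v` are assumptions made for this example, not taken from the article.

```python
import math

def dot(a, b):
    # Dot product: grows with both alignment and vector length.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the length-normalized vectors,
    # so it depends only on direction (assumes neither vector is all zeros).
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l1_distance(a, b):
    # L1 (Manhattan) distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    # L2 (Euclidean) distance: straight-line distance between the points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical sample data: v points in the same direction as u but is twice as long.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]

print(dot(u, v), cosine_similarity(u, v), l1_distance(u, v), l2_distance(u, v))
```

With these sample vectors the cosine similarity is at its maximum while both distances are non-zero, which shows how a direction-based measure and a distance-based measure can disagree about how "similar" two vectors are.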

Cosine Similarity · L1 distance · L2 distance
0 likes · 8 min read
Data Party THU
Jan 18, 2026 · Artificial Intelligence

Unlocking 3D Scene Synthesis: A Deep Dive into Neural Radiance Fields (NeRF)

This article explains the core principles of Neural Radiance Fields, detailing how a fully‑connected network maps 5‑D coordinates to color and density, the role of positional encoding and hierarchical sampling, and provides a complete PyTorch implementation with training and rendering examples.

3D Scene Representation · Hierarchical Sampling · NeRF
0 likes · 18 min read
Fun with Large Models
Jan 12, 2026 · Artificial Intelligence

Why You Should Master Large‑Model Training: A Full‑Process Practical Guide

The article explains why mastering large‑model training is crucial for professionals, researchers, and enterprises, outlines the end‑to‑end pipeline—from data preparation and pre‑training to instruction fine‑tuning and RLHF alignment—compares training with RAG, and presents a structured learning roadmap.

AI agents · PyTorch · RAG
0 likes · 14 min read
Data Party THU
Dec 20, 2025 · Artificial Intelligence

Master 20 Essential PyTorch Concepts: From Tensors to Model Deployment

This guide walks you through 20 fundamental PyTorch concepts—including tensor creation, operations, autograd, model building, data loading, GPU acceleration, and best‑practice tricks—providing clear code snippets and step‑by‑step explanations so you can quickly prototype, train, and deploy neural networks.

Deep Learning · GPU Acceleration · Model Training
0 likes · 16 min read
Data STUDIO
Dec 9, 2025 · Artificial Intelligence

20 Core PyTorch Concepts to Accelerate Your AI Projects

This article walks through twenty essential PyTorch concepts—from basic Tensor creation and manipulation, through autograd and neural‑network construction, to data loading, GPU acceleration, model saving, and practical training tricks—providing concrete code examples and clear explanations for developers eager to build and deploy AI models.

Autograd · DataLoader · Deep Learning
0 likes · 16 min read
Huawei Cloud Developer Alliance
Nov 24, 2025 · Artificial Intelligence

How to Supercharge Transformer AI Agents with Model Compression and Inference Acceleration

This article explains why Transformer models dominate modern AI agents, outlines the challenges of large parameter counts and latency, and presents a comprehensive guide to model compression (parameter sharing, knowledge distillation, quantization, pruning) and inference acceleration (parallel computing, optimized attention, TensorRT deployment), complete with PyTorch code examples and a real‑world case study showing speed‑up and storage savings.

AI Agent · Inference Acceleration · PyTorch
0 likes · 34 min read
Python Programming Learning Circle
Nov 18, 2025 · Artificial Intelligence

Top 10 Python Libraries Every Computer Vision Engineer Should Know

This article compiles the most commonly used Python libraries for computer vision, covering basic image handling with Pillow, high‑performance processing with OpenCV and Mahotas, advanced tools like Scikit‑Image, TensorFlow Image, PyTorch Vision, SimpleCV, Imageio, Albumentations, and the model zoo timm, each with concise descriptions and practical code snippets.

Deep Learning · PyTorch · TensorFlow
0 likes · 11 min read
IT Services Circle
Nov 10, 2025 · Artificial Intelligence

Why PyTorch Co‑Founder Soumith Chintala Is Leaving Meta After 11 Years

Soumith Chintala, one of PyTorch’s original creators, announced his departure from Meta after eleven years, citing a desire to move beyond the framework, reflecting on his pivotal role in building PyTorch, its global impact, and his gratitude to the community while looking ahead to new challenges.

AI · Deep Learning · Meta
0 likes · 12 min read
Instant Consumer Technology Team
Oct 21, 2025 · Artificial Intelligence

Boost LLM Originality: Master Temperature Scaling & Top‑K Sampling

This tutorial revisits a simple text‑generation function, explains how temperature scaling and top‑K sampling reshape token probability distributions, demonstrates their effects with PyTorch code and visualizations, and shows how to integrate both techniques into an improved generation routine for more diverse and human‑like outputs.
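
The two techniques named in this summary can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the tutorial's PyTorch code: the helper names `softmax`, `top_k_filter`, and `sample_token`, and the example logits, are hypothetical.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Temperature scaling: divide logits by T before the softmax.
    # T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(logits, k):
    # Top-K: keep the k largest logits and mask the rest with -inf,
    # so they receive zero probability after the softmax.
    # Note: logits tied with the k-th largest value are all kept.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [z if z >= threshold else float("-inf") for z in logits]

def sample_token(logits, temperature=1.0, k=None, rng=random):
    # Apply top-K masking (if requested), then temperature, then sample an index.
    if k is not None:
        logits = top_k_filter(logits, k)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical logits for a 4-token vocabulary.
example_logits = [2.0, 1.0, 0.0, -1.0]
print(softmax(example_logits, temperature=0.5))
print(softmax(example_logits, temperature=2.0))
print(sample_token(example_logits, temperature=0.8, k=2))
```

Masking before scaling means the surviving logits are renormalized among themselves, so temperature only redistributes probability within the top-K candidates.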

LLM · PyTorch · Text Generation
0 likes · 13 min read
AI2ML AI to Machine Learning
Oct 20, 2025 · Artificial Intelligence

nanochat Source Code Deep Dive: Data Prep, Model Design, Training & Evaluation

This article revisits nanochat's core components, detailing the preparation of diverse training datasets, the scaling calculations for tokens and parameters, the model's MQA and KV‑cache design, the full training pipeline with gradient accumulation and mixed‑precision, cost breakdown, inference optimizations, evaluation tasks, and identified limitations with suggested improvements.

KV cache · LLM · MQA
0 likes · 9 min read
AI Algorithm Path
Oct 20, 2025 · Artificial Intelligence

Building a Flow Matching Model from Scratch: Complete Code Walkthrough

This article walks through the full implementation of a flow‑matching generative model in PyTorch, covering dataset creation, a small MLP that learns a time‑dependent velocity field, the flow‑matching loss, training loop, ODE‑based sampling, visualisation of the learned vector field, and a discussion of the method's limitations and possible extensions.

Generative Models · MLP · PyTorch
0 likes · 13 min read
AI Algorithm Path
Oct 13, 2025 · Artificial Intelligence

Step-by-Step Explanation of Neural ODEs with Code Examples

This article introduces Neural Ordinary Differential Equations, explains their core idea of learning continuous dynamics via a neural derivative function, demonstrates Euler integration, compares naive unfolding with the adjoint method for training, provides a PyTorch implementation, and offers practical tips and extensions such as event handling and physics‑informed models.

Adjoint method · Continuous-time modeling · Euler method
0 likes · 11 min read
Data Party THU
Oct 4, 2025 · Artificial Intelligence

Unveiling Transformer Internals: From Theory to PyTorch Code

This article deeply explores the Transformer architecture by combining original paper principles with PyTorch source code, covering encoder‑decoder design, positional encoding assumptions, core parameters, residual connections, attention mechanisms, and detailed implementation snippets to help readers understand and reproduce the model.

Deep Learning · Neural Networks · Positional Encoding
0 likes · 22 min read
Alimama Tech
Oct 1, 2025 · Artificial Intelligence

How RecIS Revolutionizes Large‑Scale Sparse‑Dense Recommendation Training

RecIS is an open‑source, PyTorch‑based unified framework designed for ultra‑large‑scale sparse‑dense computation in recommendation systems, offering a full solution for training models with massive samples, multimodal inputs, and large embeddings, and demonstrating significant performance gains over TensorFlow and TorchRec in production deployments.

PyTorch · Recommendation Systems · deep learning framework
0 likes · 24 min read
Data Party THU
Sep 25, 2025 · Artificial Intelligence

Mastering Triplet Loss in Sentence‑Transformers: A Step‑by‑Step Guide

This article explains the concept of triplet loss, its mathematical formulation, the different batch‑wise implementations in the sentence_transformers library, their advantages and drawbacks, and provides a complete Python example for training a text‑embedding model with Triplet Loss.
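
For orientation, the basic per-triplet formulation can be written as max(d(a, p) - d(a, n) + margin, 0). The sketch below implements that formula in plain Python with Euclidean distance. It is not the sentence_transformers implementation; the margin of 1.0 and the toy 2-D embeddings are assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Zero once the negative is at least `margin` farther from the anchor
    # than the positive is; otherwise the shortfall is the loss.
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

# Hypothetical 2-D embeddings.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 4.0]   # far from the anchor: constraint already satisfied
hard_negative = [1.0, 0.0]   # as close to the anchor as the positive

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

The easy negative contributes no loss, which is why batch-wise variants that mine harder negatives can produce a more useful training signal.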

Embedding · PyTorch · Python
0 likes · 12 min read
IT Services Circle
Sep 16, 2025 · Artificial Intelligence

Why TensorFlow Is Dying and What the New AI Open‑Source Landscape Looks Like

An in‑depth analysis reveals TensorFlow’s rapid decline, the rise of PyTorch, and how Ant Group’s OpenRank‑driven “Large Model Open‑Source Ecosystem Panorama 2.0” maps shifting trends, from short‑term hype projects to performance‑focused AI infrastructure, highlighting the emerging US‑China dominance in AI open‑source development.

AI ecosystem · AI open-source · Model Serving
0 likes · 15 min read
AI Algorithm Path
Aug 23, 2025 · Artificial Intelligence

Understanding QAT: Quantization‑Aware Training with PyTorch

This article explains the principles of model quantization, compares post‑training quantization (PTQ) and quantization‑aware training (QAT), details the QAT workflow in PyTorch—including fake quantization, gradient handling, and code examples—and offers practical tips for achieving high‑accuracy int8/int4 models.

Fake Quantization · PyTorch · QAT
0 likes · 15 min read
AI Algorithm Path
Jul 15, 2025 · Artificial Intelligence

Day 8: Fine‑Tuning CLIP for Image‑Text Tasks – A Beginner’s Guide

This tutorial walks through fine‑tuning OpenAI's CLIP ViT‑B/32 on a small image‑text dataset in a Kaggle notebook, covering environment setup, model loading, data preprocessing with CLIPProcessor, training a linear head, and observing loss convergence to align visual and textual embeddings.

CLIP · Fine-tuning · Kaggle
0 likes · 5 min read
Network Intelligence Research Center (NIRC)
Jul 13, 2025 · Artificial Intelligence

Getting Started with Hugging Face Transformers Trainer

This guide walks through the Hugging Face Transformers Trainer library, explaining its core features such as configurable training loops, mixed‑precision and gradient‑accumulation support, seamless distributed training via Accelerate and DeepSpeed, and provides a step‑by‑step example of converting a simple PyTorch CNN model to use Trainer.

Accelerate · DeepSpeed · Distributed Training
0 likes · 7 min read
IT Services Circle
Jul 6, 2025 · Artificial Intelligence

Why Transformers Train Like Any Neural Network: Backpropagation Explained

This article demystifies how Transformers are trained by showing that all their linear layers have learnable weights and biases, and that the attention mechanism—including softmax and dot‑product operations—is fully differentiable and updated via standard back‑propagation.

Backpropagation · Deep Learning · PyTorch
0 likes · 7 min read
AI Algorithm Path
Jul 5, 2025 · Artificial Intelligence

Beginner’s Guide to Vision‑Language Models Day 7: How CLIP Achieves Joint Visual‑Language Understanding

This article explains CLIP’s dual‑encoder architecture—using a Vision Transformer for images and a Transformer for text—how both encoders map inputs into a shared embedding space, the role of cosine similarity, and the InfoNCE contrastive loss that drives joint visual‑language learning.

CLIP · InfoNCE · Multi-modal Embedding
0 likes · 8 min read
MaGe Linux Operations
Jun 15, 2025 · Artificial Intelligence

Mastering Transformers: Key Extensions and Optimization Techniques Explained

This comprehensive guide walks you through the Transformer architecture—from its encoder‑decoder structure and self‑attention mechanism to multi‑head attention, positional embeddings, and practical PyTorch implementations—providing clear visualizations and code examples for deep learning practitioners.

Deep Learning · PyTorch · Self-Attention
0 likes · 22 min read
Alibaba Cloud Developer
May 29, 2025 · Artificial Intelligence

Build a Minimal Large Language Model from Scratch with Python and PyTorch

This tutorial walks through creating a simple bigram language model in pure Python, refactoring it into a PyTorch implementation, and explains core concepts such as tokenization, embedding layers, loss functions, gradient descent, training loops, and text generation, preparing you for building a full GPT model.

Bigram · LLM · LanguageModel
0 likes · 31 min read
php Courses
May 15, 2025 · Artificial Intelligence

Why Python Dominates Data Analysis and Machine Learning: Core Tools, Full‑Stack Solutions, and Learning Path

This article explains why Python has become the leading language for data analysis and machine learning, outlines the essential libraries and frameworks, provides practical code examples, describes typical application scenarios, suggests a staged learning roadmap, and forecasts future trends such as AutoML and federated learning.

AutoML · PyTorch · Python
0 likes · 6 min read
AI Algorithm Path
May 11, 2025 · Artificial Intelligence

How to Parallelize Ultra‑Large Model Training with PyTorch

The article explains the core concepts and trade‑offs of five parallelism techniques—data, tensor, context, pipeline, and expert parallelism—plus the ZeRO optimizer, showing when each method is appropriate for training ultra‑large PyTorch models and providing concrete code snippets and performance considerations.

Context Parallelism · Data Parallelism · Expert Parallelism
0 likes · 21 min read
Network Intelligence Research Center (NIRC)
Apr 23, 2025 · Artificial Intelligence

DeepQueueNet in Practice: Quickly Achieve High‑Precision Network Simulation

This article walks through using DeepQueueNet—a deep‑learning‑enhanced network performance estimator—to set up a device model, train the PyTorch version, configure a fattree16 topology, and run multi‑GPU simulations that deliver minute‑level, packet‑accurate results in as little as 1 minute 27 seconds.

Deep Learning · DeepQueueNet · PyTorch
0 likes · 6 min read
Tencent Technical Engineering
Apr 16, 2025 · Artificial Intelligence

Understanding Transformer Architecture for Chinese‑English Translation: A Practical Guide

This practical guide walks through the full Transformer architecture for Chinese‑to‑English translation, detailing encoder‑decoder structure, tokenization and embeddings, batch handling with padding and masks, positional encodings, parallel teacher‑forcing, self‑ and multi‑head attention, and the complete forward and back‑propagation training steps.

Positional Encoding · PyTorch · Self-Attention
0 likes · 26 min read
Alibaba Cloud Developer
Apr 7, 2025 · Artificial Intelligence

Why Does GPU Memory Keep Growing in DeepSeek‑R1 Inference? Uncovering PyTorch’s Cache

After deploying the full‑precision DeepSeek‑R1 model on a 2×8‑GPU ACS cluster, repeated stress tests showed GPU memory usage continuously rising without release; this article details the investigation, reproduces the behavior, examines vLLM logs, Prometheus metrics, and reveals PyTorch’s caching allocator as the root cause, offering mitigation tips.

DeepSeek · GPU Memory · Memory Cache
0 likes · 21 min read
Sohu Tech Products
Mar 26, 2025 · Artificial Intelligence

How SpatialLM Turns 3D Point Clouds into Structured Scene Understanding

SpatialLM is a large language model designed for 3D spatial understanding that converts point‑cloud data from videos, RGB‑D images or LiDAR into structured scene descriptions, and this guide explains its architecture, model versions, repository links, and step‑by‑step deployment on Ubuntu with PyTorch.

3D point cloud · Multimodal AI · PyTorch
0 likes · 7 min read
AI Algorithm Path
Mar 19, 2025 · Artificial Intelligence

Understanding Multimodal Large Language Models: Part 1

This article explains the fundamentals of multimodal large language models, covering their definition, typical applications, two main architectural approaches—unified embedding decoder and cross‑modal attention—along with detailed component breakdowns, a PyTorch implementation of image‑patch projection, and training considerations, ending with a discussion of trade‑offs between the methods.

Cross-Attention · Image Encoder · Linear Projection
0 likes · 14 min read
AI Algorithm Path
Mar 16, 2025 · Artificial Intelligence

Speed Up Your PyTorch Model Training: Practical Tips and Tricks

This article walks through concrete techniques to accelerate PyTorch training, covering mixed‑precision with torch.cuda.amp, profiling with torch.profiler, DataLoader tuning, torch.compile, distributed strategies like DataParallel and DDP, gradient accumulation, and advanced libraries such as Lightning, Apex, and DeepSpeed, plus model‑level optimizations and monitoring tips.

DataLoader · Distributed Training · Profiling
0 likes · 12 min read
AI Algorithm Path
Mar 16, 2025 · Artificial Intelligence

How to Train PyTorch Models Using Far Less GPU Memory

This article walks through a suite of PyTorch techniques—including automatic mixed precision, BF16, gradient checkpointing, gradient accumulation, tensor sharding, efficient data loading, in‑place ops, lightweight optimizers, memory profiling, TorchScript, and kernel fusion—that together can cut peak GPU memory usage by up to twenty‑fold while preserving model accuracy.

GPU Memory · PyTorch · data loading
0 likes · 13 min read
DataFunTalk
Mar 2, 2025 · Artificial Intelligence

Implementing GRPO from Scratch with Distributed Reinforcement Learning on Qwen2.5-1.5B-Instruct

This tutorial explains how to build a distributed reinforcement‑learning pipeline using the GRPO algorithm, covering data preparation, evaluation and reward functions, multi‑GPU DataParallel implementation, and full fine‑tuning of the Qwen2.5‑1.5B‑Instruct model with PyTorch, FlashAttention2 and Weights & Biases.

AI · Distributed Training · GRPO
0 likes · 10 min read
JavaEdge
Feb 24, 2025 · Artificial Intelligence

Build a CIFAR‑10 Image Classifier with PyTorch – A Java Developer’s Guide

This tutorial walks Java developers through building, training, evaluating, and deploying a CIFAR‑10 image classifier using PyTorch, covering data loading, preprocessing, network definition, loss and optimizer setup, GPU acceleration, model saving, and per‑class accuracy analysis.

CIFAR-10 · Deep Learning · GPU
0 likes · 18 min read
JavaEdge
Feb 23, 2025 · Artificial Intelligence

How Java Developers Can Build Neural Networks with PyTorch: A Step‑by‑Step Guide

This tutorial walks Java developers through the complete workflow of building, training, and evaluating a neural network in PyTorch, covering network definition, data iteration, forward and backward passes, loss calculation, and parameter updates with detailed code examples and Java‑centric analogies.

Backpropagation · Deep Learning · Java
0 likes · 12 min read
AI Code to Success
Feb 19, 2025 · Artificial Intelligence

How to Build Traffic‑Sign Recognition and Sentiment Analysis with Keras – A Step‑by‑Step Guide

This article walks through practical Keras tutorials for image‑based traffic‑sign classification and text‑based sentiment analysis, covering data preparation, preprocessing, model construction, training, evaluation, deployment, and a concise comparison of Keras with TensorFlow and PyTorch.

Deep Learning · Image Classification · Keras
0 likes · 19 min read
Python Programming Learning Circle
Feb 18, 2025 · Artificial Intelligence

Getting Started with PyTorch: Installation, Core Operations, and Practical Deep Learning Projects

This article introduces PyTorch, covering installation on CPU/GPU, basic tensor operations, automatic differentiation, building and training neural networks, data loading with DataLoader, image classification on MNIST, model deployment, and useful tips for accelerating deep‑learning workflows.

Deep Learning · GPU · Neural Networks
0 likes · 9 min read
Ops Development & AI Practice
Feb 14, 2025 · Artificial Intelligence

Large Model Format Showdown: Hugging Face, TensorFlow, ONNX, TorchScript, GGUF

This comprehensive guide examines the leading large‑model storage formats—including Hugging Face Transformers, TensorFlow SavedModel, ONNX, TorchScript, and GGUF—detailing their file structures, serialization methods, strengths, weaknesses, and typical use‑cases, helping developers and researchers select the optimal format for their specific AI workloads.

AI deployment · GGUF · Model Formats
0 likes · 21 min read
AI Code to Success
Feb 14, 2025 · Artificial Intelligence

TensorFlow vs PyTorch: Which Deep Learning Framework Wins for Your Projects?

An in‑depth comparison of TensorFlow and PyTorch examines their computation graph models, deployment tools, API ergonomics, community ecosystems, and performance characteristics, helping developers decide which framework best fits industrial production or fast‑paced research scenarios.

AI Development · Deep Learning · PyTorch
0 likes · 8 min read
Architect
Feb 13, 2025 · Artificial Intelligence

How to Build a Mini ChatGPT on a Single GPU with MiniMind

This article provides a comprehensive, step‑by‑step guide to training and fine‑tuning a miniature large‑language model called MiniMind, covering lightweight model design, open‑source training pipelines, required datasets, tokenizer options, and deployment via a web UI, all using PyTorch on modest hardware.

AI · LLM · MiniMind
0 likes · 11 min read
AI Code to Success
Feb 13, 2025 · Artificial Intelligence

Why PyTorch Is the Go-To Framework for Modern AI Development

This article introduces PyTorch, explains its dynamic computation graph, Python‑centric design, and tensor operations, surveys its major applications in computer vision, natural language processing, and reinforcement learning, and provides a step‑by‑step tutorial for building and training a multilayer perceptron on the MNIST dataset.

Deep Learning · Dynamic Computation Graph · MNIST
0 likes · 11 min read
Python Programming Learning Circle
Jan 3, 2025 · Artificial Intelligence

Visualizing Convolutional Neural Network Features with 40 Lines of Python Code

This article demonstrates how to visualize convolutional features of a VGG‑16 network using only about 40 lines of Python code, explains the underlying concepts, walks through generating patterns by maximizing filter activations, and provides a complete implementation with hooks, loss functions, and multi‑scale optimization.

CNN · Deep Learning · Feature Visualization
0 likes · 15 min read
Python Programming Learning Circle
Dec 19, 2024 · Artificial Intelligence

DeepPurpose: An AI Toolkit for Accelerating COVID‑19 Drug Discovery

DeepPurpose, a PyTorch‑based AI toolkit developed by Harvard researchers, provides COVID‑19 bioassay data and 56 cutting‑edge models that enable rapid drug‑target affinity prediction, virtual screening, and drug repurposing with just a few lines of code, dramatically shortening new‑drug development cycles.

AI, COVID-19, DeepPurpose
0 likes · 7 min read
Python Programming Learning Circle
Dec 19, 2024 · Artificial Intelligence

Overview of Microsoft’s Open‑Source Computer Vision Recipes Library

The article introduces Microsoft’s open‑source Computer Vision Recipes library, describing its purpose, target audience, repository links, supported vision scenarios such as image classification, similarity, detection, key‑point, segmentation, action recognition, multi‑object tracking and crowd counting, and provides guidance on using PyTorch, Azure and GPU resources.

Azure, Image Classification, PyTorch
0 likes · 7 min read
Cognitive Technology Team
Nov 20, 2024 · Artificial Intelligence

Fundamentals and Implementation of Neural Networks and Transformers with PyTorch Examples

This article provides a comprehensive overview of neural network fundamentals, loss functions, activation functions, embedding techniques, attention mechanisms, multi‑head attention, residual networks, and the full Transformer encoder‑decoder architecture, illustrated with detailed PyTorch code and a practical MiniRBT fine‑tuning case for Chinese text classification.

AI, PyTorch, Transformer
0 likes · 49 min read
DaTaobao Tech
Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning Hugging Face’s MiniRBT for text classification.

AI, Attention Mechanism, Deep Learning
0 likes · 44 min read
Zhuanzhuan Tech
Oct 16, 2024 · Artificial Intelligence

Optimizing TorchServe Inference Service Architecture for High‑Performance AI Deployment

This article details the engineering practice of optimizing TorchServe‑based AI inference services, covering background challenges, framework selection, GPU‑accelerated Torch‑TRT integration, CPU‑side preprocessing improvements, and deployment on Kubernetes to achieve higher throughput and lower resource consumption.

GPUOptimization, Kubernetes, ModelServing
0 likes · 17 min read
DataFunSummit
Oct 5, 2024 · Artificial Intelligence

Optimizing TorchRec for Large‑Scale Recommendation Systems on PyTorch

This article details the performance‑focused optimizations applied to TorchRec, PyTorch's large‑scale recommendation system library, including CUDA graph capture, multithreaded kernel launches, pinned memory copies, and input‑distribution refinements that together achieve a 2.25× speedup on MLPerf DLRM‑DCNv2 across 16 DGX H100 nodes.

CUDA Graph, Distributed Training, GPU Optimization
0 likes · 11 min read
Huawei Cloud Developer Alliance
Sep 18, 2024 · Artificial Intelligence

How Distributed Training Powers Massive Language Models: Concepts, Strategies, and Code

This article explains why single‑machine resources are insufficient for training ever‑larger language models, introduces the fundamentals of distributed training systems, details various parallel strategies such as data, model, pipeline, and hybrid parallelism, and provides practical PyTorch code and memory‑optimization techniques to accelerate large‑scale model training.

Deep Learning, GPU, Parallelism
0 likes · 29 min read
Baobao Algorithm Notes
Sep 18, 2024 · Artificial Intelligence

Why Training on 1,000 GPUs Is Harder Than You Think—and How to Tame It

Training deep learning models on a thousand GPUs faces steep communication overhead, higher failure probability, and scaling inefficiencies, but by profiling each step, overlapping compute and communication, using gradient bucketing and accumulation, and employing elastic training techniques, practitioners can approach near‑linear performance while mitigating common pitfalls.

Distributed Training, GPU scaling, Performance Optimization
0 likes · 13 min read
Rare Earth Juejin Tech Community
Aug 22, 2024 · Artificial Intelligence

Understanding Faster R-CNN: Architecture, Training, and Experimental Results

This article provides an in‑depth overview of the Faster R‑CNN object detection framework, covering its background, key innovations such as the Region Proposal Network, detailed algorithmic principles, training procedures, experimental results on PASCAL VOC and MS COCO, and a reproducible PyTorch implementation.

Computer Vision, Deep Learning, Faster R-CNN
0 likes · 14 min read
Rare Earth Juejin Tech Community
Jun 30, 2024 · Artificial Intelligence

Spatial Attention Mechanism and Its PyTorch Implementation

This article explains the principle of spatial attention in convolutional neural networks, details the underlying algorithmic steps, and provides a complete PyTorch implementation including the attention module, full network architecture, and practical considerations for integrating spatial attention into deep learning models.

CNN, Deep Learning, Neural Network
0 likes · 10 min read
Rare Earth Juejin Tech Community
Jun 16, 2024 · Artificial Intelligence

HRNet Source Code Walkthrough: Keypoint Dataset Construction, Online Data Augmentation, and Training Pipeline

This article provides a detailed, English-language walkthrough of the HRNet source code, covering how the COCO keypoint dataset is built, the online data‑augmentation techniques applied during training, and the end‑to‑end training and inference procedures for human pose estimation.

Computer Vision, Deep Learning, HRNet
0 likes · 36 min read
Practical DevOps Architecture
May 30, 2024 · Artificial Intelligence

Eight‑Week LLM and Large Model Training Course Outline

This article outlines an eight‑week curriculum covering LLM evolution, PyTorch fundamentals, CUDA training, large‑model fine‑tuning, LangChain application development, cloud‑based quantization, industry case studies, and a recruitment session, providing video resources for each topic.

AI, Fine-tuning, LLM
0 likes · 5 min read
Python Programming Learning Circle
May 11, 2024 · Artificial Intelligence

A Comprehensive Overview of Popular Python Libraries for Artificial Intelligence and Data Science

This article introduces and demonstrates more than twenty widely used Python libraries for artificial intelligence, computer vision, natural language processing, and data analysis, providing concise explanations and runnable code snippets that illustrate each library's core functionality and typical use cases.

Data Science, NumPy, PyTorch
0 likes · 29 min read
OPPO Kernel Craftsman
Mar 29, 2024 · Artificial Intelligence

InternLM Model Research and XTuner Practical Guide (Part 1): DataLoader, Model Conversion, Merging, and Inference

The guide walks through fine‑tuning InternLM‑Chat‑7B with XTuner, showing how to build a DataLoader from a HuggingFace Dataset, convert a LoRA .pth checkpoint to HuggingFace format, merge the adapter into the base model, run inference, and adapt the process for custom datasets and 4‑bit quantization experiments.

DataLoader, FineTuning, InternLM
0 likes · 27 min read
Test Development Learning Exchange
Mar 27, 2024 · Artificial Intelligence

Introduction to PyTorch and Example CNN Training on CIFAR-10

This article introduces PyTorch as a leading open‑source deep‑learning framework, outlines its key components such as dynamic computation graphs, tensors, autograd, modules, optimizers, data loading, distributed training and TorchScript, and provides a complete Python example that defines a simple CNN and trains it on the CIFAR‑10 dataset.

CNN, Deep Learning, PyTorch
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Feb 28, 2024 · Artificial Intelligence

How PAI‑TorchAcc Supercharges OLMo LLM Training with Up to 1.64× Speedup

PAI‑TorchAcc, Alibaba Cloud’s PyTorch accelerator, integrates the open‑source OLMo large language model and delivers up to 1.64× faster training on OLMo‑1B and 1.52× on OLMo‑7B by leveraging graph capture, distributed, compute, communication, and memory optimizations, with detailed usage steps and performance analysis.

LLM training, OLMo, PAI‑TorchAcc
0 likes · 7 min read
Alibaba Cloud Big Data AI Platform
Feb 23, 2024 · Artificial Intelligence

How PAI‑TorchAcc Supercharges Large‑Model Training on Alibaba Cloud

PAI‑TorchAcc, an Alibaba Cloud AI platform accelerator, offers a seamless PyTorch interface that integrates HuggingFace models and employs LazyTensor‑based static graph conversion, multi‑strategy distributed training, and extensive GPU optimizations to dramatically boost throughput for 1B‑175B parameter models, surpassing PyTorch native and Megatron‑LM performance.

AI acceleration, Alibaba Cloud, GPU Optimization
0 likes · 13 min read
DataFunTalk
Feb 13, 2024 · Artificial Intelligence

An Overview of NVIDIA NeMo: Open‑Source Framework for Speech AI, ASR, TTS, NLP and Large Language Model Training

This article introduces NVIDIA’s open‑source NeMo framework, detailing its PyTorch‑based architecture for Speech AI, ASR and TTS training, NLP and LLM support, GPU‑optimized parallelism, pre‑trained model resources, fine‑tuning techniques, and the accompanying NeMo Aligner and Framework tools.

ASR, NVIDIA NeMo, PyTorch
0 likes · 18 min read
Baidu Geek Talk
Feb 5, 2024 · Artificial Intelligence

Why Static Graphs Outperform Dynamic Graphs in AutoDiff: A Deep Dive

This article explains the fundamental differences between static and dynamic computation graphs, compares their memory and performance characteristics, shows how automatic differentiation works in each paradigm, and provides a step‑by‑step implementation of a toy static‑graph AutoDiff engine with Python code examples.

AutoDiff, Deep Learning, Dynamic Graph
0 likes · 18 min read
Open Source Tech Hub
Jan 20, 2024 · Artificial Intelligence

How to Set Up ModelScope with Anaconda and Run OCR Inference via PHP

This guide walks through installing Anaconda, creating a Python 3.10 conda environment, adding PyTorch and ModelScope libraries, installing domain-specific dependencies, verifying NLP pipelines, and using PHPY to call ModelScope's OCR model from PHP, complete with code snippets and troubleshooting tips.

AI inference, Anaconda, ModelScope
0 likes · 10 min read
Rare Earth Juejin Tech Community
Jan 14, 2024 · Artificial Intelligence

Understanding and Implementing LoRA (Low‑Rank Adaptation) for Model Training with PyTorch

This article explains the principle of LoRA (Low‑Rank Adaptation) for large language models, demonstrates how to decompose weight updates into low‑rank matrices, and provides a complete PyTorch implementation that fine‑tunes a small VGG‑19 network on a custom goldfish dataset.

Deep Learning, LoRA, Neural Networks
0 likes · 11 min read
AntTech
Jan 9, 2024 · Artificial Intelligence

ATorch: Ant Group’s Open‑Source Distributed Training Acceleration Library for Large‑Scale AI Models

Ant Group’s newly open‑sourced ATorch library extends PyTorch with a layered architecture and automated, resource‑aware strategies, raising hardware utilization in large‑model training to as much as 60%, improving stability, and delivering significant throughput gains across multi‑node, multi‑GPU deployments.

AI acceleration, Distributed Training, PyTorch
0 likes · 6 min read
Sohu Tech Products
Dec 27, 2023 · Artificial Intelligence

Analysis of LLaMA Model Architecture in the Transformers Library

This article walks through the core LLaMA implementation in HuggingFace’s Transformers library, detailing the inheritance hierarchy, configuration defaults, model initialization, embedding and stacked decoder layers, the RMSNorm‑based attention and MLP modules, and the forward pass that produces normalized hidden states.

Deep Learning, Model architecture, PyTorch
0 likes · 14 min read
DataFunTalk
Dec 10, 2023 · Artificial Intelligence

PyTorch Model Training Performance Tuning Guide

This guide presents techniques for improving PyTorch training performance and efficiency. It applies to model types including CNNs, RNNs, GANs, and transformers, across domains such as computer vision and natural language processing, and is aimed at AI/ML platform engineers, data engineers, backend developers, MLOps engineers, SREs, architects, and machine learning engineers.

AI, Deep Learning, PyTorch
0 likes · 2 min read
DataFunSummit
Nov 18, 2023 · Artificial Intelligence

PyTorch Model Training Performance Tuning Guide with Alluxio

This guide explains how Ant Group uses Alluxio to overcome storage I/O, capacity, and latency challenges, delivering stability, performance, and scalability improvements for large‑scale PyTorch model training while reducing infrastructure costs and providing practical optimization techniques and code examples.

AI, Alluxio, PyTorch
0 likes · 4 min read
DataFunSummit
Nov 13, 2023 · Artificial Intelligence

SWIFT: A Scalable Light‑Weight Training and Inference Framework for Efficient Model Fine‑Tuning

SWIFT is an open‑source, PyTorch‑based framework that integrates multiple efficient fine‑tuning methods such as LoRA, QLoRA, Adapter, and the proprietary ResTuning, enabling developers to fine‑tune large language and multimodal models on consumer‑grade GPUs with significantly reduced memory and compute requirements.

Fine-tuning, LoRA, ModelScope
0 likes · 13 min read
Rare Earth Juejin Tech Community
Oct 19, 2023 · Artificial Intelligence

NLP Basics: Word Embeddings, Word2Vec, and Hand‑crafted RNN Implementation in PyTorch

This article introduces word‑level representations—from one‑hot encoding to dense word embeddings via Word2Vec—explains cosine similarity, then walks through the structure, limitations, and PyTorch implementation of a vanilla RNN, including a custom forward function and verification against the library API.

Cosine Similarity, NLP, PyTorch
0 likes · 19 min read
Baobao Algorithm Notes
Oct 15, 2023 · Artificial Intelligence

Run a 70B FP16 Model on a Single 16 GB GPU with PyTorch Meta Device

This article explains how to overcome GPU memory limits by using PyTorch 1.9's meta device to create an empty model, load large‑scale model weights layer‑by‑layer, move each part to a 16 GB GPU for inference, and release memory, enabling a 70B FP16 model to run on a single consumer‑grade GPU.

GPU memory optimization, PyTorch, meta device
0 likes · 12 min read
21CTO
Oct 8, 2023 · Artificial Intelligence

Why Hugging Face’s New Rust‑Based Candle Framework Could Redefine AI Inference

Hugging Face has released Candle, a Rust‑written machine‑learning framework aimed at serverless inference, offering lightweight binaries, GPU support, and performance gains over Python‑based PyTorch, while sparking debate over Rust’s learning curve and the future of AI deployment.

AI Framework, Candle, PyTorch
0 likes · 7 min read
Alibaba Cloud Developer
Sep 4, 2023 · Artificial Intelligence

Hands‑On Building a Transformer from Scratch with PyTorch

This tutorial walks you through implementing a full Transformer model in PyTorch, starting from basic linear‑regression code, adding attention mechanisms, multi‑head attention, encoder‑decoder architecture, training loops, and inference, all reinforced with practical debugging tips.

Deep Learning, NLP, PyTorch
0 likes · 17 min read