Tagged articles
176 articles
Machine Heart
Apr 26, 2026 · Artificial Intelligence

Has Deep Learning Discovered Its Own “Newton’s Law”?

A new collaborative paper titled “There Will Be a Scientific Theory of Deep Learning” proposes a unified “Learning Mechanics” framework that connects solvable idealized models, tractable limits, empirical scaling laws, hyperparameter theory, and universal representation behavior, aiming to give deep learning a first‑principles scientific foundation.

Deep Learning · Neural Networks · hyperparameters
0 likes · 14 min read
SuanNi
Apr 10, 2026 · Artificial Intelligence

Can Neural Networks Replace Traditional CPUs? Inside the New Neural Computer

A groundbreaking study shows how Meta AI and KAUST transformed a video‑generation model into a neural computer that unifies computation, storage, and I/O, enabling pixel‑perfect command‑line and graphical UI control while highlighting current limitations in arithmetic reasoning and long‑term program stability.

AI video generation · Human‑computer interaction · Neural Networks
0 likes · 9 min read
Machine Learning Algorithms & Natural Language Processing
Mar 20, 2026 · Artificial Intelligence

Why Kimi Dropped Residual Connections: A First‑Person Deep Dive into Attention Residuals

This article explains how Attention Residuals (AttnRes) replace traditional residual shortcuts with layer‑wise attention, details the mathematical reformulation, design constraints, static‑Q trick, full and block variants, and presents experimental evidence of significant accuracy gains with modest overhead.

NLP · Neural Networks · RMSNorm
0 likes · 11 min read
DeepHub IMBA
Feb 28, 2026 · Artificial Intelligence

Why Energy‑Based Models Could Outperform Probabilistic LLMs, According to Yann LeCun

Yann LeCun argues that the probability‑driven, token‑by‑token design of current large language models may never reach human‑level intelligence, and explains how Energy‑Based Models replace probability distributions with an energy function, offering more flexible training, inference, and multi‑modal capabilities.

Contrastive Divergence · Density Estimation · EBM
0 likes · 23 min read
Data Party THU
Feb 28, 2026 · Artificial Intelligence

How MIT’s Attention Matching Turns Linear Regression into Fast KV Compression

The article explains MIT’s Attention Matching technique that reformulates large‑model context compression as a linear regression problem, detailing its theoretical foundations, three‑step gradient‑free implementation, architectural adaptations, non‑uniform budgeting, and extensive evaluations showing orders‑of‑magnitude speed gains with minimal accuracy loss.

Attention Matching · KV compression · Memory Optimization
0 likes · 10 min read
Data Party THU
Feb 21, 2026 · Artificial Intelligence

Unlocking Compositional Generalization: Meta‑Learning Strategies for Neural Networks

This article examines how meta‑learning combined with compositionality enables neural networks to rapidly adapt to new tasks by formalizing hierarchical optimization, leveraging modular architectures with hypernetworks, and exploiting Transformer latent codes for effective compositional generalization.

Bilevel Optimization · Meta Learning · Neural Networks
0 likes · 5 min read
Tencent Technical Engineering
Feb 2, 2026 · Artificial Intelligence

Why Neural Networks Are the Hidden Engine Behind Modern AI: From Basics to Large Language Models

This comprehensive guide walks through the fundamentals of neural networks, activation functions, training methods, and how they power large language models, while also covering tokenization, self‑attention, transformer architectures, AI infrastructure, and practical usage through agents and retrieval‑augmented generation.

Agent Systems · Deep Learning · GPU infrastructure
0 likes · 75 min read
AI Architecture Hub
Jan 7, 2026 · Artificial Intelligence

Why “Attention Is All You Need” Still Shapes AI: A Beginner’s Deep Dive

This article provides a comprehensive, beginner‑friendly walkthrough of the landmark 2017 paper “Attention Is All You Need,” covering its authors, historical context, the shortcomings of RNNs and CNNs, the birth of self‑attention, the Transformer architecture, and its transformative impact on modern AI.

AI history · Attention Mechanism · Deep Learning
0 likes · 9 min read
Alibaba Cloud Developer
Nov 5, 2025 · Artificial Intelligence

How TinyAI Brings a Full‑Stack AI Framework to Pure Java

TinyAI is a completely Java‑implemented, lightweight full‑stack AI framework that demonstrates how to build a production‑grade deep‑learning system—from low‑level numeric tensors and automatic differentiation to modular neural‑network layers, training pipelines, large‑language‑model implementations, and intelligent agent architectures—while remaining education‑friendly and free of external dependencies.

AI Framework · Agent System · Code Examples
0 likes · 33 min read
Tencent Cloud Developer
Nov 4, 2025 · Artificial Intelligence

From Functions to Transformers: Mastering Neural Networks Step by Step

This article walks you through the evolution from basic mathematical functions to modern large‑scale models, explaining activation functions, forward and backward propagation, loss calculation, gradient descent, regularization, dropout, word embeddings, RNNs, and the core mechanics of the Transformer architecture.

Attention Mechanism · Deep Learning · Neural Networks
0 likes · 15 min read
Bighead's Algorithm Notes
Oct 21, 2025 · Artificial Intelligence

KANMixer: A New KAN‑Centric Paradigm for Long‑Term Time Series Forecasting

This article reviews the KANMixer model, which places Kolmogorov‑Arnold Networks at the core of a lightweight architecture for long‑term time series forecasting, detailing its design, extensive benchmark experiments on seven real‑world datasets, ablation analyses, and its computational trade‑offs versus MLP and Transformer baselines.

Ablation Study · KAN · Long-term Time Series Forecasting
0 likes · 8 min read
Data Party THU
Oct 4, 2025 · Artificial Intelligence

Unveiling Transformer Internals: From Theory to PyTorch Code

This article deeply explores the Transformer architecture by combining original paper principles with PyTorch source code, covering encoder‑decoder design, positional encoding assumptions, core parameters, residual connections, attention mechanisms, and detailed implementation snippets to help readers understand and reproduce the model.

Deep Learning · Neural Networks · Positional Encoding
0 likes · 22 min read
Architects' Tech Alliance
Aug 31, 2025 · Artificial Intelligence

Why the Last Decade Became the Golden Age of AI Chip Architecture

The article traces the evolution of AI hardware over the past ten years, outlining three key phases—from early chip limitations that sidelined neural networks, through CPU advances that still fell short, to the rise of GPUs and specialized AI chips that finally unlocked rapid AI deployment, while also highlighting the parallel impact of algorithmic breakthroughs and massive data growth.

AI hardware · Big Data · GPU
0 likes · 5 min read
Qborfy AI
Aug 7, 2025 · Artificial Intelligence

Understanding RNNs: From Memory Cells to Real‑World Applications

This article explains how recurrent neural networks (RNNs) add memory to neural models, details the gate mechanisms of LSTM and GRU, compares their structures and parameter counts, and illustrates their use in speech recognition, translation, stock prediction, and video generation, while highlighting practical insights and energy considerations.

AI · Deep Learning · GRU
0 likes · 5 min read
Didi Tech
Jul 31, 2025 · Artificial Intelligence

How to Build Efficient Causal Effect Estimators for Exponential‑Family Outcomes

This article presents a unified framework for efficiently estimating causal treatment effects on exponential‑family outcomes, extending target regularization beyond Gaussian assumptions, deriving bias analysis for plug‑in estimators, proposing DR and TMLE‑based estimators, and validating them on synthetic and real datasets.

Neural Networks · causal inference · exponential family
0 likes · 12 min read
Qborfy AI
Jul 2, 2025 · Artificial Intelligence

Mastering Activation Functions: From Sigmoid to Swish and When to Use Them

This article explains the role of activation functions in neural networks, compares five classic functions with formulas, performance trade‑offs, and gradient behavior, and provides a Python visualization demo plus several practical insights and real‑world examples.

Deep Learning · Neural Networks · ReLU
0 likes · 7 min read
AI Large Model Application Practice
May 16, 2025 · Artificial Intelligence

Why Residual Connections Keep Deep Neural Networks Stable

This article explains why residual connections are essential in deep neural networks, describing the problems of network degradation and gradient vanishing, how shortcut paths add the input to the layer output, the requirement of matching dimensions, and the resulting stability for training large language models.

LLM · Neural Networks · Residual Connections
0 likes · 7 min read
AI Cyberspace
May 3, 2025 · Artificial Intelligence

How Hopfield Networks Mimic Brain Memory: Theory, Math, and Python Demo

This article explores the 1982 Hopfield associative memory neural network, detailing its biological inspiration, energy‑minimization principle, mathematical formulation, training and recall processes, capacity limits, practical Python implementation, and the model's strengths and weaknesses.

Hopfield network · Neural Networks · Python implementation
0 likes · 21 min read
IT Services Circle
May 2, 2025 · Artificial Intelligence

Understanding Gradient Vanishing in Deep Neural Networks and How to Mitigate It

The article explains why deep networks suffer from gradient vanishing—especially when using sigmoid or tanh activations—covers the underlying mathematics, compares activation functions, and presents practical techniques such as proper weight initialization, batch normalization, residual connections, and code examples to visualize the phenomenon.

Batch Normalization · Deep Learning · Neural Networks
0 likes · 7 min read
AI Frontier Lectures
Apr 30, 2025 · Artificial Intelligence

How Dual‑Domain Strip Attention Revolutionizes Image Restoration

The paper introduces Dual‑Domain Strip Attention Network (DSANet), a lightweight architecture that combines spatial and frequency strip attention to boost multi‑scale representation learning, achieving state‑of‑the‑art performance on dehazing, desnowing, defocus deblurring, and denoising tasks with significantly lower computational cost.

Deep Learning · Neural Networks · dual-domain attention
0 likes · 10 min read
Cognitive Technology Team
Apr 12, 2025 · Artificial Intelligence

Analyzing a Trained Neural Network: Visualizing Hidden Layers and Understanding Its Limitations

This article walks through an interactive exploration of a simple two‑hidden‑layer neural network, showing how real‑time visualizations reveal its learned representations, accuracy limits, and why constrained training leads to over‑confident yet unintelligent predictions before introducing backpropagation.

Backpropagation · Deep Learning · Neural Networks
0 likes · 10 min read
Cognitive Technology Team
Apr 9, 2025 · Artificial Intelligence

How Neural Networks Learn: Gradient Descent and Loss Functions

This article explains how neural networks learn by using labeled training data, describing the role of weights, biases, activation functions, and how gradient descent iteratively adjusts parameters to minimize loss, illustrated with the MNIST digit‑recognition example.

Deep Learning · MNIST · Neural Networks
0 likes · 16 min read
Cognitive Technology Team
Apr 8, 2025 · Artificial Intelligence

Understanding Neural Networks: Structure, Layers, and Activation

This article explains how a simple neural network can recognize handwritten digits by preprocessing images, organizing neurons into input, hidden, and output layers, using weighted sums, biases, sigmoid compression, and matrix multiplication to illustrate the fundamentals of deep learning.

Deep Learning · Layers · Neural Networks
0 likes · 16 min read
Python Programming Learning Circle
Feb 18, 2025 · Artificial Intelligence

Getting Started with PyTorch: Installation, Core Operations, and Practical Deep Learning Projects

This article introduces PyTorch, covering installation on CPU/GPU, basic tensor operations, automatic differentiation, building and training neural networks, data loading with DataLoader, image classification on MNIST, model deployment, and useful tips for accelerating deep‑learning workflows.

Deep Learning · GPU · Neural Networks
0 likes · 9 min read
AI Code to Success
Feb 13, 2025 · Artificial Intelligence

Why PyTorch Is the Go-To Framework for Modern AI Development

This article introduces PyTorch, explains its dynamic computation graph, Python‑centric design, and tensor operations, surveys its major applications in computer vision, natural language processing, and reinforcement learning, and provides a step‑by‑step tutorial for building and training a multilayer perceptron on the MNIST dataset.

Deep Learning · Dynamic Computation Graph · MNIST
0 likes · 11 min read
Cognitive Technology Team
Feb 12, 2025 · Artificial Intelligence

Introduction to Neural Networks by Professor Li Yongle

In this introductory session, renowned graduate exam instructor Professor Li Yongle provides a clear, beginner-friendly overview of neural networks, covering basic concepts and their relevance within artificial intelligence, including their structure, learning mechanisms, and typical applications in modern AI systems.

AI · Deep Learning · Neural Networks
0 likes · 1 min read
AI Code to Success
Feb 11, 2025 · Artificial Intelligence

Unlocking TensorFlow: From Basics to Building Your First Linear Regression Model

This article introduces TensorFlow's core concepts—tensors, computational graphs, variables, and sessions—covers its wide range of AI applications from traditional machine learning to deep learning in NLP and computer vision, and provides a step‑by‑step Python tutorial for implementing a simple linear regression model.

AI Tutorial · Deep Learning · Neural Networks
0 likes · 6 min read
Architect
Feb 10, 2025 · Artificial Intelligence

Evolution of DeepSeek Mixture‑of‑Experts (MoE) Architecture from V1 to V3

This article reviews the development of DeepSeek's Mixture-of-Experts (MoE) models, tracing their evolution from the original DeepSeekMoE V1 through V2 to V3, detailing architectural innovations such as fine‑grained expert segmentation, shared‑expert isolation, load‑balancing losses, device‑limited routing, and the shift from softmax to sigmoid gating.

DeepSeek · LLM · Mixture of Experts
0 likes · 21 min read
Cognitive Technology Team
Feb 9, 2025 · Artificial Intelligence

A Beginner’s Guide to the History and Key Concepts of Deep Learning

From the perceptron’s inception in 1958 to modern Transformer-based models like GPT, this article traces the evolution of deep learning, explaining foundational architectures such as DNNs, CNNs, RNNs, LSTMs, attention mechanisms, and recent innovations like DeepSeek’s MLA, highlighting their principles and impact.

Deep Learning · GPT · MLA
0 likes · 19 min read
AI Cyberspace
Jan 28, 2025 · Artificial Intelligence

From Biological Neurons to Deep Learning: How MP Models Evolve

This article explains the structure of biological neurons, introduces the McCulloch‑Pitts (MP) mathematical model, shows how manual weight adjustments work, and walks through the development from single‑layer perceptrons to two‑layer networks and modern deep learning techniques, covering activation functions, training algorithms, and practical examples.

Backpropagation · Deep Learning · MP model
0 likes · 30 min read
AI Large Model Application Practice
Jan 20, 2025 · Artificial Intelligence

How Embeddings Transform Simple Character Codes into Powerful Vectors for LLMs

This article explains how embeddings convert basic character indices into high‑dimensional vectors, describes their training via gradient descent, introduces the embedding matrix, and shows how these vectors enable modern language models to capture semantic relationships and be reused across tasks.

LLM · Neural Networks · embeddings
0 likes · 8 min read
AI Large Model Application Practice
Jan 14, 2025 · Artificial Intelligence

Turning Classification Nets into Language Generators: A Step‑by‑Step Guide

This article explains how a simple neural network trained for classification can be adapted to generate natural language by expanding its output layer, encoding characters as numbers, using a sliding‑window context, and recursively predicting the next token, illustrating each step with diagrams and concrete examples.

AI · LLM · Neural Networks
0 likes · 10 min read
AI Large Model Application Practice
Jan 9, 2025 · Artificial Intelligence

How Does Gradient Descent Train a Neural Network? A Step‑by‑Step Guide

This article walks through the complete training cycle of a simple neural network—from random weight initialization and forward propagation with labeled data, through loss calculation and gradient‑based weight updates, to iterative epochs, average loss, and practical issues like gradient explosion and vanishing.

AI · Model Training · Neural Networks
0 likes · 11 min read
Model Perspective
Dec 26, 2024 · Fundamentals

What Makes a Mathematical Model Enduring? Lessons from AI and Ecology

The article explores the characteristics of long‑lasting mathematical models—continuous refinement, expanding applicability, elegant simplicity, extensibility, focus on essence, and philosophical depth—illustrated with examples such as neural networks and the Lotka‑Volterra predator‑prey system, and offers guidance on creating such vibrant models.

Lotka-Volterra · Neural Networks · interdisciplinary
0 likes · 6 min read
DevOps
Dec 5, 2024 · Artificial Intelligence

A Brief History of Artificial Intelligence: From McCulloch‑Pitts Neurons to GPT‑4

This article traces the evolution of artificial intelligence from the 1943 McCulloch‑Pitts neuron model through key milestones such as Turing's test, the Dartmouth conference, the rise of neural networks, deep learning breakthroughs, and recent large language models like GPT‑4, illustrating the field's rapid progress.

GPT · Neural Networks · artificial intelligence
0 likes · 7 min read
Model Perspective
Dec 5, 2024 · Artificial Intelligence

Choosing the Right Activation Function: Pros, Cons, and Best Practices

Activation functions are crucial for neural networks, providing non‑linearity, normalization, and gradient flow; this article reviews common functions such as Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Noisy ReLU, Softmax, and Swish, comparing their characteristics, advantages, drawbacks, and guidance for selecting the appropriate one.

Model Optimization · Neural Networks · activation functions
0 likes · 10 min read
DaTaobao Tech
Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning Hugging Face’s MiniRBT for text classification.

AI · Attention Mechanism · Deep Learning
0 likes · 44 min read
Model Perspective
Oct 17, 2024 · Artificial Intelligence

Visualizing How Neural Networks Approximate Any Function

This article explains the universal approximation theorem, showing how even a simple neural network with one hidden layer can approximate any continuous function by adjusting weights and biases, and illustrates the process with visual examples of step and bump functions, linking theory to recent Nobel recognitions.

AI · Neural Networks · function approximation
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Aug 27, 2024 · Artificial Intelligence

How AI Detects Cluster-Wide Task Slowdowns in Cloud Systems

A new AI‑driven method for detecting cluster‑wide task slowdowns in cloud platforms improves F1 score by 5.3% over state‑of‑the‑art techniques, addressing challenges of composite periodic patterns, training data contamination, and focusing on slowdown anomalies.

Neural Networks · Time Series · Unsupervised Learning
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Aug 26, 2024 · Cloud Computing

How Neural Attention Detects Cluster-Wide Task Slowdowns in Cloud Systems

A new paper accepted at ACM SIGKDD 2024 presents a neural‑network‑based framework that uses a skim‑attention mechanism and a picky loss function to accurately detect cluster‑wide task slowdown anomalies in large‑scale cloud platforms, achieving a 5.3% average F1‑score improvement over state‑of‑the‑art methods.

Cluster Performance · Neural Networks · anomaly detection
0 likes · 5 min read
21CTO
Aug 11, 2024 · Artificial Intelligence

Demystifying LLMs: How Tokens, Training, and Transformers Power Generative AI

This article explains the fundamentals of large language models, covering tokenization, probability prediction, Markov chain basics, training data limitations, context windows, and the transition to neural network architectures like Transformers, while providing Python examples and insights into model scaling and the illusion of intelligence.

AI · LLM · Neural Networks
0 likes · 18 min read
Ops Development & AI Practice
Jul 6, 2024 · Artificial Intelligence

How Backpropagation Powers Modern Deep Learning: A Deep Dive

This article explains the backpropagation algorithm—its origins, mathematical basis, step‑by‑step workflow, importance for efficient neural network training, and widespread applications in image recognition, natural language processing, and recommendation systems.

Backpropagation · Deep Learning · Neural Networks
0 likes · 6 min read
Ops Development & AI Practice
Jul 3, 2024 · Artificial Intelligence

How Do Artificial Neural Networks Mirror Animal Brains? An In‑Depth Overview

This article explains the fundamental concepts and architecture of artificial neural networks, describes their learning process, compares them with biological neural systems, and highlights both the similarities and key differences in structure, learning mechanisms, flexibility, and energy efficiency.

Biological Inspiration · Deep Learning · Neural Networks
0 likes · 7 min read
JD Tech Talk
Jun 25, 2024 · Artificial Intelligence

Understanding Large Language Models: From Parameters to Transformer Architecture

This article explains the fundamental concepts behind large language models, including their two-file structure, training process, neural network basics, perceptron examples, weight and threshold calculations, the TensorFlow Playground, and a detailed walkthrough of the Transformer architecture with tokenization, positional encoding, self‑attention, normalization, and feed‑forward layers.

AI · Neural Networks · Self-Attention
0 likes · 20 min read
Rare Earth Juejin Tech Community
Jun 12, 2024 · Artificial Intelligence

A Simple Introduction to the Transformer Model

This article provides a comprehensive, beginner-friendly explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, decoding process, final linear and softmax layers, and training considerations, illustrated with numerous diagrams and code snippets.

Deep Learning · Neural Networks · Self-Attention
0 likes · 24 min read
Architects Research Society
May 21, 2024 · Artificial Intelligence

27 Essential AI Papers Recommended by Ilya Sutskever for John Carmack

Ilya Sutskever, former OpenAI chief scientist, shared a curated list of 27 seminal AI research papers—including the Annotated Transformer, Attention Is All You Need, and Deep Residual Learning—with links, claiming mastering them covers roughly 90% of today’s essential artificial‑intelligence knowledge.

AI · Deep Learning · Neural Networks
0 likes · 7 min read
Rare Earth Juejin Tech Community
May 5, 2024 · Artificial Intelligence

Comprehensive Guide to Neural Network Algorithms: Definitions, Structure, Implementation, and Training

This article provides an in‑depth tutorial on neural network algorithms, covering their biological inspiration, significance, advantages and drawbacks, detailed architecture, data preparation, one‑hot encoding, weight initialization, forward and backward propagation, cost functions, regularization, gradient checking, and complete Python code examples.

AI · Backpropagation · Neural Networks
0 likes · 37 min read
DaTaobao Tech
Apr 22, 2024 · Artificial Intelligence

Neural Networks and Deep Learning: Principles and MNIST Example

The article reviews recent generative‑AI breakthroughs such as GPT‑5 and AI software engineers, explains that AI systems are deterministic rather than black boxes, and then teaches neural‑network fundamentals—including activation functions, back‑propagation, and a hands‑on MNIST digit‑recognition example with discussion of overfitting and regularization.

Deep Learning · MNIST · Neural Networks
0 likes · 17 min read
AI Algorithm Path
Apr 5, 2024 · Artificial Intelligence

Master CNN, RNN, GAN, and Transformer Architectures in One Guide

This article provides a friendly, step‑by‑step overview of five core deep‑learning architectures—CNN, RNN, GAN, Transformers, and encoder‑decoder—explaining their structures, key components, and typical use cases in image and natural‑language processing.

CNN · Deep Learning · Encoder-Decoder
0 likes · 12 min read
Architect
Mar 26, 2024 · Artificial Intelligence

Why Transformers Outperform RNNs: A Deep Dive into Architecture and Training

This article explains the Transformer model’s core architecture, self‑attention mechanism, encoder‑decoder workflow, training with teacher forcing, inference steps, and why it surpasses RNNs and CNNs, while also outlining its major NLP applications.

Attention Mechanism · Inference · Model Training
0 likes · 14 min read
NetEase Smart Enterprise Tech+
Feb 28, 2024 · Artificial Intelligence

Mastering Multi-Task Learning: Network Designs & Loss Balancing

This article reviews the challenges of multi‑task learning, compares various network architectures such as hard‑parameter sharing, MMoE, CGC, and PLE, and examines loss‑balancing techniques like GradNorm, Dynamic Weight Average and task‑prioritization, offering insights on how to mitigate the “seesaw” effect and improve overall performance.

AI research · Neural Networks · dynamic weighting
0 likes · 15 min read
Rare Earth Juejin Tech Community
Jan 14, 2024 · Artificial Intelligence

Understanding and Implementing LoRA (Low‑Rank Adaptation) for Model Training with PyTorch

This article explains the principle of LoRA (Low‑Rank Adaptation) for large language models, demonstrates how to decompose weight updates into low‑rank matrices, and provides a complete PyTorch implementation that fine‑tunes a small VGG‑19 network on a custom goldfish dataset.

Deep Learning · LoRA · Neural Networks
0 likes · 11 min read
JD Tech
Nov 30, 2023 · Artificial Intelligence

Understanding ChatGPT: Mechanisms, Attention, Emergence, and the Chinese Room

This article examines the principles behind ChatGPT, detailing its continuation-based operation, the role of attention mechanisms and transformer architecture, the scaling of neural networks that leads to emergent abilities, and interprets these phenomena through the lenses of compression theory and the Chinese Room thought experiment.

Attention Mechanism · ChatGPT · Emergence
0 likes · 27 min read
Rare Earth Juejin Tech Community
Nov 15, 2023 · Artificial Intelligence

Understanding the Transformer Architecture: Encoder, Decoder, and Attention Mechanisms

This article explains the Transformer model, comparing it with RNNs, detailing its encoder‑decoder structure, multi‑head and scaled dot‑product attention, embedding layers, feed‑forward networks, and the final linear‑softmax output, supplemented with diagrams and code examples.

Deep Learning · Encoder-Decoder · Neural Networks
0 likes · 10 min read
JD Cloud Developers
Oct 10, 2023 · Artificial Intelligence

Do Large Language Models Have a Mind? Attention, Emergence & Compression Explained

This article examines whether ChatGPT and other large language models exhibit true Theory of Mind, detailing the role of attention mechanisms, neural network architecture, emergent abilities, the Chinese‑room argument, and how compression of massive textual data underlies their apparent intelligence.

Attention Mechanism · Emergence · Neural Networks
0 likes · 30 min read
MaGe Linux Operations
Sep 25, 2023 · Artificial Intelligence

How ChatGPT Works: Inside the Neural Network That Generates Human‑Like Text

Stephen Wolfram explains the inner workings of ChatGPT, covering its transformer architecture, probability‑based word selection, training on massive text corpora, the role of embeddings, neural network layers, attention mechanisms, and the challenges of modeling language, offering a deep technical overview for AI enthusiasts.

AI · ChatGPT · Neural Networks
0 likes · 80 min read
Open Source Linux
Sep 8, 2023 · Artificial Intelligence

How ChatGPT Works: Inside the Neural Network That Generates Human‑Like Text

This article explains the inner workings of ChatGPT, covering how large language models predict the next token using probability distributions, the role of embeddings, the transformer architecture with attention heads, training methods, loss functions, and why such a massive neural network can produce coherent, human‑like language.

ChatGPT · Language Model · Neural Networks
0 likes · 79 min read
21CTO
Aug 15, 2023 · Artificial Intelligence

Why Do Neural Networks Suddenly ‘Grok’ After Long Training? Insights from Google

Google’s recent research reveals that when small neural networks are trained for extended periods on tasks like modular addition, they can abruptly shift from memorizing training data to genuinely generalizing—a sudden “grokking” phenomenon driven by weight decay and the emergence of periodic weight structures.

AI research · Generalization · MLP
0 likes · 9 min read
Network Intelligence Research Center (NIRC)
Aug 15, 2023 · Artificial Intelligence

Neural Networks for Rapid Network Configuration: A Concise Overview

The article presents a neural‑algorithmic reasoning approach that replaces slow SMT‑based network configuration tools with a graph‑neural‑network model, describing dataset creation, model architecture, and experiments that show 20‑to‑490× speedups while maintaining over 92% configuration consistency on large topologies.

Graph Neural Network · Network Configuration · Network Synthesis
0 likes · 5 min read
Alimama Tech
Aug 9, 2023 · Artificial Intelligence

End-to-End Inventory Prediction and Contract Allocation for Guaranteed Delivery Advertising

The paper introduces Neural Lagrangian Selling, an end‑to‑end framework that jointly learns traffic forecasting and contract inventory allocation by embedding a differentiable Lagrangian solver and a graph convolutional network into a neural model, achieving higher prediction accuracy, fulfillment rates, utilization, and revenue than two‑stage and other methods.

Graph Neural Network · Neural Networks · end-to-end learning
0 likes · 16 min read
Rare Earth Juejin Tech Community
Jul 31, 2023 · Artificial Intelligence

Overview of Deep Neural Network Architectures

This article provides a comprehensive overview of deep neural network families, introducing twelve major architectures—including Feedforward, CNN, RNN, LSTM, DBN, GAN, Autoencoder, Residual, Capsule, Transformer, Attention, and Deep Reinforcement Learning—explaining their principles, structures, training methods, and offering Python/TensorFlow/PyTorch code examples.

CNN · Deep Learning · GAN
0 likes · 29 min read
Programmer DD
Jul 20, 2023 · Artificial Intelligence

Why ChatGPT Mirrors Human Thought: Insights from Stephen Wolfram

ChatGPT, built on massive text training and simple neural network operations, generates human-like language yet lacks true understanding. This motivates integrating it with Wolfram|Alpha’s precise computational language, a synergy that Stephen Wolfram highlights in his discussion of language structure, AI limits, and future computational possibilities.

ChatGPT · Computational Language · Neural Networks
0 likes · 13 min read
Sohu Tech Products
Jul 19, 2023 · Artificial Intelligence

Understanding the Inner Workings of ChatGPT and Neural Networks

This article explains how ChatGPT generates text by predicting the next token using large language models, describes the role of probability, temperature, and attention mechanisms in transformers, and discusses neural network training, embeddings, semantic spaces, and the broader implications for artificial intelligence research.

ChatGPT · Neural Networks · artificial intelligence
0 likes · 79 min read
DataFunSummit
May 29, 2023 · Artificial Intelligence

Neuron‑level Shared Multi‑task Learning for Joint CTR and CVR Prediction

This article introduces a neuron‑level shared multi‑task learning framework that jointly estimates click‑through rate (CTR) and conversion rate (CVR), discusses the background and advantages of multi‑task learning, reviews classic shared‑bottom models, describes the proposed pruning‑based architecture, and presents experimental results demonstrating its effectiveness in large‑scale recommendation systems.

CTR · CVR · Model Pruning
0 likes · 11 min read
21CTO
May 7, 2023 · Artificial Intelligence

The Untold Journey of AI’s Godfather: Geoffrey Hinton’s Life, Legacy, and Risks

Geoffrey Hinton, the Canadian cognitive psychologist and computer scientist known as the father of deep learning, rose from a distinguished scientific family, endured decades of skepticism, pioneered deep belief networks, mentored future AI leaders, and now warns of AI risks after leaving Google, embodying a lifelong commitment to humanity and ethical AI.

AI history · AI risk · Geoffrey Hinton
0 likes · 15 min read
DataFunTalk
Apr 3, 2023 · Artificial Intelligence

Implementing RNN, LSTM, and GRU with PyTorch

This article introduces the basic architectures of recurrent neural networks (RNN), LSTM, and GRU, explains PyTorch APIs such as nn.RNN, nn.LSTM, nn.GRU, details their parameters, demonstrates code examples for building and testing these models, and provides practical insights for deep learning practitioners.

Deep Learning · GRU · LSTM
0 likes · 9 min read
DataFunTalk
Apr 1, 2023 · Artificial Intelligence

Nvidia Meets OpenAI: Highlights from the GTC Fireside Chat on GPT‑4, Deep Learning History, and the Future of AI

In a GTC fireside chat, Nvidia CEO Jensen Huang and OpenAI co‑founder Ilya Sutskever discuss GPT‑4's multimodal advances, the evolution of deep learning from early neural networks to large‑scale models, the pivotal role of GPUs and datasets like ImageNet, and their vision for more reliable, scalable artificial intelligence.

Deep Learning · GPT-4 · Neural Networks
0 likes · 10 min read
Baidu Geek Talk
Mar 8, 2023 · Artificial Intelligence

Understanding Motion Decomposition in AI-Based Image Animation: From Sparse to Dense Optical Flow

The article details how AI‑based image animation and face‑swapping decompose video motion into zero‑order rigid and first‑order affine components via Taylor expansion, using unsupervised U‑Net keypoint extraction, sparse-to-dense optical flow conversion, and dense motion networks that learn masks for region‑wise rigidity and non‑rigid deformation.

Neural Networks · Taylor expansion · affine transformation
0 likes · 21 min read
Top Architect
Mar 1, 2023 · Artificial Intelligence

Understanding the Internals of ChatGPT: Neural Networks, Embeddings, and Training Techniques

This article provides a comprehensive overview of how ChatGPT works, covering its probabilistic text generation, transformer architecture, embedding representations, neural network training processes, and the underlying principles that enable large language models to produce coherent and meaningful human-like language.

AI · ChatGPT · Language Model
0 likes · 80 min read
Model Perspective
Jan 12, 2023 · Artificial Intelligence

Neural Networks Explained: Architecture, Training, and Reinforcement Basics

This article introduces neural networks, covering their layered structure, common types like CNNs and RNNs, key components such as activation functions, loss, learning rate, backpropagation, dropout, batch normalization, and extends to reinforcement learning concepts including MDPs, policies, value functions, and Q‑learning.

CNN · Neural Networks · RNN
0 likes · 6 min read
MaGe Linux Operations
Nov 26, 2022 · Artificial Intelligence

The Timeless Foundations of Machine Learning: 6 Core Algorithms Explained

Andrew Ng’s latest AI newsletter article revisits six foundational machine‑learning algorithms—linear regression, logistic regression, gradient descent, neural networks, decision trees, and k‑means clustering—tracing their historical origins, core concepts, and lasting impact on modern AI applications.

Decision Trees · Neural Networks · gradient descent
0 likes · 20 min read
Model Perspective
Oct 6, 2022 · Artificial Intelligence

Demystifying RNNs and LSTMs: Architecture, Limits, and Python Forecasting

This article explains the structure and operation of recurrent neural networks (RNNs), their limitations, how long short‑term memory (LSTM) networks overcome these issues with gated mechanisms, and provides a complete Python implementation for time‑series airline passenger forecasting.

LSTM · Neural Networks · Python
0 likes · 17 min read
ELab Team
Aug 24, 2022 · Artificial Intelligence

Demystifying AI: From Linear Regression to Neural Networks with TensorFlow.js

This article walks through the fundamentals of artificial intelligence, explaining linear and logistic regression, loss functions, gradient descent, and neural network basics, illustrated with TensorFlow.js code examples, visual analogies, and practical demos, helping readers grasp core concepts and their real‑world applications.

Neural Networks · TensorFlow.js · artificial intelligence
0 likes · 18 min read
Model Perspective
Aug 8, 2022 · Artificial Intelligence

Build a Multi‑Layer Perceptron with Keras: Step‑by‑Step Guide

This tutorial walks through using Keras to create, compile, train, and evaluate a multi‑layer perceptron for image classification on the Fashion MNIST dataset, covering data loading, model construction with the Sequential API, hyperparameter choices, and prediction of new samples.

Fashion-MNIST · Keras · MLP
0 likes · 16 min read
DataFunSummit
Jun 11, 2022 · Artificial Intelligence

Transforming Regular Expressions into Neural Networks for Text Classification and Slot Filling

This article explains how regular expressions can be converted into equivalent neural network models—FA‑RNN for classification and FST‑RNN for slot filling—by leveraging finite‑state automata, tensor decomposition, and pretrained word embeddings, achieving zero‑shot performance and strong results in low‑resource scenarios.

FA-RNN · Neural Networks · regular expressions
0 likes · 17 min read
Code DAO
May 26, 2022 · Artificial Intelligence

Understanding Denoising Diffusion Probabilistic Models: Fundamentals and Process

This article explains the fundamentals of denoising diffusion probabilistic models, detailing the forward Gaussian noise injection, the reverse reconstruction via learned conditional densities, model architecture, loss functions, and experimental results on synthetic datasets, all supported by key research citations.

Generative Models · Markov chain · Neural Networks
0 likes · 8 min read
DataFunTalk
May 16, 2022 · Artificial Intelligence

Applying Knowledge Graphs to Meituan's Recommendation System: Architecture, Challenges, and Future Directions

This article presents Meituan's large‑scale knowledge graph, its integration into location‑based recommendation, the challenges of explainability, domain diversity, data sparsity and spatiotemporal complexity, and describes a dual‑memory neural network and cross‑domain learning approach that improve recall, ranking and recommendation fairness.

AI · Knowledge Graph · Neural Networks
0 likes · 15 min read
Baobao Algorithm Notes
Apr 19, 2022 · Artificial Intelligence

Understanding Nonlinearity in Machine Learning: From Logistic Regression to Neural Networks

The article explores the concept of nonlinearity in machine learning, illustrating why tasks like distinguishing cat versus dog or predicting body shape from height and weight are challenging for linear models, and discusses feature engineering, kernel tricks, and periodic activation functions as strategies to introduce nonlinearity and improve model performance.

Neural Networks · feature engineering · kernel methods
0 likes · 7 min read
NetEase LeiHuo Testing Center
Apr 11, 2022 · Artificial Intelligence

Understanding AI: Definitions, Applications in Games and Products, and Basic Machine Learning Concepts

This article explains what artificial intelligence is, distinguishes weak and strong AI, explores its applications in games, product testing, and NetEase's Fuxi platform, and introduces fundamental machine‑learning concepts such as supervised, unsupervised, and reinforcement learning, as well as neural networks and loss functions.

AI · Neural Networks · game AI
0 likes · 10 min read
DeWu Technology
Mar 11, 2022 · Artificial Intelligence

Deep Learning in Face Recognition

The article surveys deep‑learning‑based face‑recognition systems, detailing detection, preprocessing, and recognition pipelines, describing evaluation metrics such as TAR, FAR, and Rank‑K, reviewing major datasets like LFW, MS‑Celeb‑1M and VGGFace2, and comparing leading architectures—including FaceNet, CenterLoss, SphereFace and InsightFace—while highlighting their strengths, limitations, real‑world applications, and seminal research references.

AI · Datasets · Deep Learning
0 likes · 14 min read
DataFunSummit
Jan 29, 2022 · Artificial Intelligence

Survey of Model Pruning and Quantization Techniques for Deep Learning

This article provides a comprehensive overview of recent advances in deep learning model compression, focusing on pruning methods—including unstructured, structured, filter-wise, channel-wise, shape-wise, and stripe-wise approaches—and quantization techniques such as linear, non‑linear, clustering, power‑of‑two, binary, and 8‑bit quantization, while discussing evaluation criteria, sparsity ratios, fine‑tuning, and training‑aware quantization.

Deep Learning · Neural Networks · model compression
0 likes · 23 min read
Laiye Technology Team
Jan 28, 2022 · Artificial Intelligence

Survey of Model Compression and Quantization Techniques for Deep Neural Networks

This article provides a comprehensive overview of deep learning model compression and acceleration methods, detailing pruning strategies, various pruning types, evaluation criteria, sparsity ratios, fine‑tuning procedures, as well as linear and non‑linear quantization approaches, their implementations, and practical considerations.

Deep Learning · Neural Networks · efficiency
0 likes · 26 min read
Python Programming Learning Circle
Jan 18, 2022 · Artificial Intelligence

Fashion MNIST Image Classification Using TensorFlow 2.x in Python

This tutorial demonstrates how to load the Fashion MNIST dataset, explore and preprocess the images, build and compile a neural network with TensorFlow 2.x, train the model, evaluate its accuracy, and use the trained model to make predictions on clothing images, providing complete Python code examples throughout.

Deep Learning · Fashion-MNIST · Image Classification
0 likes · 16 min read
Code DAO
Dec 24, 2021 · Artificial Intelligence

Understanding Neural Network Predictions with Integrated Gradients

This article introduces the Integrated Gradients (IG) method for explaining deep neural networks, compares it with saliency maps and Shapley‑based approaches, discusses its axiomatic foundations, and provides a step‑by‑step guide to implementing IG using the open‑source TruLens library, including custom baselines and attribution measures.

Attribution Methods · Deep Learning · Integrated Gradients
0 likes · 14 min read
Baobao Algorithm Notes
Dec 21, 2021 · Artificial Intelligence

Boost Time Series Forecasting with Autocorrelated Error Adjustment – A 5‑Line PyTorch Trick

This article explains a NeurIPS 2021 paper that introduces a learnable autocorrelation correction for neural network time‑series models, shows the underlying theory, provides concise PyTorch code implementing the adjustment, reports a ~17% average performance gain across datasets, and lists additional practical tricks for time‑series forecasting.

Neural Networks · PyTorch · Time Series
0 likes · 6 min read
Code DAO
Dec 5, 2021 · Artificial Intelligence

Why Neural Networks Need Batch Normalization: Principles and Mechanics

The article explains the principle behind Batch Normalization, why it is essential for training deep neural networks, how it standardizes activations, the role of learnable scale and shift parameters, the computation steps during training and inference, and discusses placement strategies within a model.

Batch Normalization · Deep Learning · Neural Networks
0 likes · 9 min read
Code DAO
Dec 5, 2021 · Artificial Intelligence

Understanding DeepMind’s PonderNet: A Thinkable Network for MNIST

This article explains DeepMind’s PonderNet framework, which lets any neural network allocate computation adaptively, demonstrates its implementation with PyTorch Lightning on the MNIST dataset, details the underlying theory, loss functions, training procedure, and evaluates its pondering behavior on rotated digit experiments.

Adaptive Computation · Deep Learning · MNIST
0 likes · 27 min read
DataFunTalk
Dec 4, 2021 · Artificial Intelligence

Practical Deep Learning Training Tricks: Cyclic LR, Flooding, Warmup, RAdam, Adversarial Training, Focal Loss, Dropout, Normalization and More

This article compiles essential deep learning training techniques—including cyclic learning rates, flooding, warmup, RAdam optimizer, adversarial training, focal loss, dropout, batch/group/weight normalization, label smoothing, Wasserstein GAN, skip connections, and weight initialization—providing concise explanations and code snippets for each method.

Deep Learning · Neural Networks · Regularization
0 likes · 11 min read
360 Smart Cloud
Sep 30, 2021 · Artificial Intelligence

Understanding Computational Graphs and Automatic Differentiation for Neural Networks

This article explains how computational graphs can represent arbitrary neural networks, describes forward and reverse propagation, details the implementation of automatic differentiation with Python and NumPy, and demonstrates building and training a multilayer fully‑connected network on the MNIST dataset using custom graph nodes and optimizers.

Computational Graph · Deep Learning · Neural Networks
0 likes · 29 min read
360 Smart Cloud
Aug 31, 2021 · Artificial Intelligence

Understanding Convolution, Convolutional Neural Networks, and Their Implementation in Image Processing

This article explains the mathematical concept of 2‑D convolution, demonstrates its use for image filtering with examples such as blurring and Sobel edge detection, introduces artificial neural networks and back‑propagation, and details the design, training, and performance of convolutional neural networks for tasks like Sobel filter learning and MNIST digit recognition, including full Python code examples.

CNN · Convolution · Deep Learning
0 likes · 25 min read