Tagged articles

1235 articles

May 9, 2026 · Artificial Intelligence

NOSE: Enabling AI to Smell with a Unified Molecule‑Receptor‑Semantic Tri‑modal Representation

NOSE introduces a neural olfactory‑semantic embedding that unifies molecular structure, receptor sequences, and natural‑language odor descriptions into a continuous space, achieving state‑of‑the‑art results on eleven tasks and strong zero‑shot generalization for odor and receptor retrieval.

Deep Learning · contrastive learning · molecular design

0 likes · 8 min read

Machine Heart

May 7, 2026 · Artificial Intelligence

OrthoReg: Simple Orthogonal Regularization to Eliminate Model Merging Conflicts

The paper introduces OrthoReg, a lightweight orthogonal regularization added during fine‑tuning that provably enforces weight orthogonality, thereby resolving conflicts in model merging and providing a theoretical explanation for the success of task arithmetic.

Deep Learning · OrthoReg · Orthogonal Regularization

0 likes · 12 min read

Machine Learning Algorithms & Natural Language Processing

May 5, 2026 · Artificial Intelligence

LLMBeginner: A Project‑Based Roadmap for Zero‑Base Mastery of Large Language Models

The LLMBeginner project from the MLNLP community offers a staged, project‑oriented learning path—covering big‑picture concepts, deep learning and reinforcement learning fundamentals, LLM theory and practice, and agent development—to guide beginners from fragmented resources to systematic mastery, with both concise and detailed versions hosted on GitHub.

Agent · Deep Learning · GitHub

0 likes · 5 min read

Data Party THU

May 2, 2026 · Artificial Intelligence

Finally, Researchers Uncover Deep Learning’s “Newton’s Law”

A new collaborative paper from top universities proposes a unified “Learning Mechanics” framework for deep learning, outlining five research strands—from solvable idealized models and extreme limits to empirical scaling laws and hyper‑parameter theory—while drawing analogies to classical physics and highlighting ten open challenges.

Deep Learning · hyperparameter theory · learning mechanics

0 likes · 16 min read

Machine Learning Algorithms & Natural Language Processing

Apr 27, 2026 · Artificial Intelligence

The Emerging ‘Newton’s Law’ of Deep Learning: Toward a Scientific Theory

Amid rapid scaling of large models, a new paper by researchers from UC Berkeley, Harvard, and Stanford proposes a unified "Learning Mechanics" framework that stitches together five theoretical strands—idealized solvable settings, extreme limits, empirical laws, hyperparameter theory, and universal behavior—to begin forming a scientific theory of deep learning.

Deep Learning · NTK · Theoretical AI

0 likes · 18 min read

Machine Heart

Apr 26, 2026 · Artificial Intelligence

Has Deep Learning Discovered Its Own “Newton’s Law”?

A new collaborative paper titled “There Will Be a Scientific Theory of Deep Learning” proposes a unified “Learning Mechanics” framework that connects solvable idealized models, tractable limits, empirical scaling laws, hyperparameter theory, and universal representation behavior, aiming to give deep learning a first‑principles scientific foundation.

Deep Learning · Neural Networks · hyperparameters

0 likes · 14 min read

Code Mala Tang

Apr 22, 2026 · Artificial Intelligence

How LeWorldModel Achieves Stable End‑to‑End World Modeling with Just Two Losses

LeWorldModel, a 2026 JEPA‑based world model introduced by Yann LeCun and collaborators, solves representation collapse with a minimalist two‑loss objective, delivering a 15‑million‑parameter system that trains in hours, runs 48× faster than prior baselines, and reaches near‑SOTA performance on robot control benchmarks.

Deep Learning · Embodied AI · JEPA

0 likes · 6 min read

Weekly Large Model Application

Apr 16, 2026 · Artificial Intelligence

Deep Dive into Conformer: The Convolution‑Augmented Transformer for Speech Recognition

The Conformer architecture blends global self‑attention with a depthwise separable convolution module in a Macaron‑style block, addressing the strong local time‑frequency structure and long sequence length of speech signals while keeping computational cost manageable for modern ASR systems.

ASR · Conformer · Convolution

0 likes · 11 min read

AI Agent Research Hub

Apr 16, 2026 · Artificial Intelligence

Conditionally Adaptive Augmented Lagrangian PINNs for Forward and Inverse PDE Solving (CMAME Open‑Source Code)

The article analyzes the multi‑objective loss imbalance in physics‑informed neural networks, introduces the CAPU algorithm that assigns independent adaptive penalty parameters via an RMSProp‑inspired update with a max‑protection rule, and demonstrates its superior accuracy on a range of forward and inverse PDE benchmarks, providing theoretical guarantees and open‑source PyTorch code.

CAPU · Deep Learning · PDE solving

0 likes · 23 min read

Zhuanzhuan Tech

Apr 15, 2026 · Artificial Intelligence

Boosting Bag Item Identification with Metric Learning: A ZhiZhuan Case Study

ZhiZhuan’s in‑house “photo‑to‑SKU” system tackles large‑scale bag identification by combining dual‑stage object detection, metric‑learning‑based embedding training, and a hybrid vector‑plus‑scalar retrieval pipeline, achieving superior top‑K accuracy over third‑party solutions while addressing fine‑grained visual nuances and long‑tail SKU coverage.

Deep Learning · Embedding · bag identification

0 likes · 16 min read

DeWu Technology

Apr 15, 2026 · Industry Insights

How Generative AI is Transforming Recommendation: A Deep Dive into DeWu’s Recall System

This article analyzes DeWu's generative recall system, detailing its background, technical design of the Generative and Rerank models, inference workflow, experimental gains in core consumption and diversity metrics, and future engineering directions such as framework migration, LLM integration, and multimodal generation.

Deep Learning · generative AI · industry insight

0 likes · 12 min read

HyperAI Super Neural

Apr 13, 2026 · Artificial Intelligence

How French Researchers Used Deep Learning to Predict 2.39 Million Anti‑Phage Proteins and Map Bacterial Immunity

A French team at the Pasteur Institute built three complementary deep‑learning models (ALBERT_DF, ESM_DF, and GeneCLR_DF) to predict anti‑phage proteins at genome scale, achieving 99% precision and 92% recall, and uncovered roughly 2.39 million candidate proteins and 23,000 novel operon families, dramatically expanding the known bacterial antiviral repertoire.

ALBERT · Deep Learning · ESM

0 likes · 16 min read

AIWalker

Apr 10, 2026 · Artificial Intelligence

How RealRestorer Bridges the Gap in Real‑World Image Restoration

RealRestorer leverages large‑scale image‑editing models, a hybrid synthetic‑and‑real degradation pipeline, and a two‑stage training strategy to deliver state‑of‑the‑art open‑source restoration that generalizes across nine real‑world degradation types while preserving content consistency.

Benchmark · Computer Vision · Deep Learning

0 likes · 13 min read

HyperAI Super Neural

Apr 9, 2026 · Artificial Intelligence

Cornell’s EMSeek Generates Insights from EM Images in 2–5 Minutes, 50× Faster Than Experts

EMSeek, a modular multi‑agent platform from Cornell, integrates perception, structural reconstruction, property prediction, and literature reasoning to automate electron microscopy analysis across 20 material systems and five tasks, achieving up to twice the speed of Segment Anything, over 90% structural similarity, and a 50‑fold reduction in processing time compared with expert workflows, while requiring only about 2% labeled data for calibration.

Computer Vision · Deep Learning · EMSeek

0 likes · 16 min read

Data Party THU

Apr 3, 2026 · Artificial Intelligence

Can Attention Replace Residuals? Inside the New Attention Residuals Breakthrough

The article reviews the Kimi team's Attention Residuals approach, which substitutes traditional ResNet additive shortcuts with learned attention‑based weighting, explains the theoretical motivation linking depth to time, details full‑attention and block‑wise implementations, and presents experimental results showing a compute‑efficiency gain of up to 1.25× and improved performance on reasoning and knowledge tasks.

Attention Mechanism · Deep Learning · Residual Networks

0 likes · 11 min read

JakartaEE China Community

Apr 1, 2026 · Artificial Intelligence

Top Java AI Development Tools for 2025

This guide reviews eight leading AI development tools for Java in 2025, explaining how each library or framework—such as DJL, TensorFlow Java, Hugging Face, LangChain, Apache Kafka, Ray, Deeplearning4j, and Neo4j—enables Java developers to build, train, and deploy intelligent applications without switching languages.

AI · Deep Learning · Java

0 likes · 9 min read

HyperAI Super Neural

Mar 30, 2026 · Artificial Intelligence

MIT Introduces VibeGen: The First End‑to‑End Dynamic Protein Generator Linking Sequence and Vibration

MIT and Carnegie Mellon unveil VibeGen, an agentic end‑to‑end de novo protein design system that jointly generates amino‑acid sequences and predicts low‑frequency normal‑mode dynamics, achieving stable, novel structures that faithfully reproduce target vibrational amplitudes and demonstrating high‑precision, diverse, and novel protein engineering capabilities.

Deep Learning · VibeGen · language diffusion model

0 likes · 13 min read

AI Large-Model Wave and Transformation Guide

Mar 28, 2026 · Artificial Intelligence

What Large‑Model Training Actually Optimizes: Parameters, Attention, and Knowledge Explained

This article breaks down the core of large‑model training by showing that training optimizes neural‑network parameters, that attention is a mechanism realized by those parameters, and that knowledge is encoded implicitly within the weight matrices, providing a clear hierarchy for interview or presentation use.

AI Interview · Attention Mechanism · Deep Learning

0 likes · 6 min read

Qborfy AI

Mar 24, 2026 · Artificial Intelligence

Why Full Fine‑Tuning Beats LoRA: When and How to Update Every Model Parameter

This article explains full fine‑tuning—updating all parameters of a pretrained model—to achieve the highest task performance, compares it with LoRA and prompt tuning, shows when it is appropriate, provides a step‑by‑step Hugging Face implementation, memory‑saving tricks, common pitfalls, and practical takeaways.

Deep Learning · DeepSpeed · GPU Memory

0 likes · 9 min read

AI Agent Research Hub

Mar 24, 2026 · Artificial Intelligence

How PeRCNN Turns Convolution Kernels into Differential Operators for Physics‑Informed Learning

PeRCNN embeds physics directly into its architecture by replacing additive nonlinearities with element‑wise multiplication in Π‑blocks, enabling convolution kernels to act as finite‑difference operators, which yields superior forward and inverse PDE solving, accurate coefficient identification, robust equation discovery, and interpretable models, as demonstrated on multiple reaction‑diffusion benchmarks.

Deep Learning · PeRCNN · convolutional neural network

0 likes · 22 min read

AIWalker

Mar 22, 2026 · Artificial Intelligence

How SAP Cuts 90% Compute and Boosts 4K Panorama Segmentation Accuracy by 17.2%

The SAP framework transforms a static 4K equirectangular panorama into a pseudo‑video, fine‑tunes SAM2 with synthetic data and a column‑first scanning trajectory, slashing GPU memory use by 90% while raising zero‑shot mIoU by an average of 17.2% across multiple benchmarks.

Deep Learning · SAM2 · panorama segmentation

0 likes · 15 min read

Amap Tech

Mar 20, 2026 · Artificial Intelligence

How ABot-PhysWorld Achieves Physical Consistency in Embodied Video Generation

ABot-PhysWorld introduces a physically consistent video generation framework for embodied AI, leveraging the PAI‑Bench benchmark, large‑scale multi‑modal data, DPO preference alignment, and dense action maps to surpass SOTA models in both visual quality and physical plausibility across diverse robotic tasks.

Benchmark · Deep Learning · Embodied AI

0 likes · 15 min read

SuanNi

Mar 17, 2026 · Artificial Intelligence

How Attention Residuals Boost Transformer Efficiency and Scale

The article presents the Attention Residuals architecture, explains how it replaces uniform residual addition with learned attention‑based aggregation, details full and block variants, engineering tricks for distributed training, and shows extensive scaling‑law experiments where the new design consistently improves validation loss and training efficiency across model sizes.

Attention Residuals · Deep Learning · Model Scaling

0 likes · 13 min read

PaperAgent

Mar 17, 2026 · Artificial Intelligence

Can Attention Replace Fixed Residuals? Inside the ‘Attention Residuals’ Breakthrough

This article analyzes the newly released Attention Residuals paper, explaining how learnable attention weighting replaces fixed residual addition to mitigate information dilution in deep LLMs, detailing the proposed Block AttnRes design, engineering trade‑offs, experimental results, and its significance for foundational model architecture.

Block Attention · Deep Learning · LLM

0 likes · 9 min read

ShiZhen AI

Mar 17, 2026 · Artificial Intelligence

Kimi’s Attention Residuals Swap a Decade-Old Residual Trick for 1.25× Faster 48B MoE

The Kimi team introduces Attention Residuals, a softmax‑based replacement for the uniform residual connections used in Transformers for a decade, enabling selective aggregation of layer histories, reducing hidden‑state growth, and achieving a 1.25× compute‑efficiency gain on a 48‑billion‑parameter MoE model with less than 2% inference latency increase.

Attention Residuals · Compute Efficiency · Deep Learning

0 likes · 10 min read

AI Frontier Lectures

Mar 16, 2026 · Artificial Intelligence

How LoGeR Extends 3D Reconstruction to Thousands of Frames with Hybrid Memory

LoGeR, a new long‑context geometric reconstruction framework from DeepMind and UC Berkeley, uses a hybrid memory module combining test‑time‑training (TTT) and sliding‑window attention (SWA) to enable feed‑forward 3D reconstruction over sequences of up to tens of thousands of frames, achieving state‑of‑the‑art accuracy on KITTI, VBR, 7‑Scenes, ScanNetV2 and TUM‑Dynamics benchmarks.

3D reconstruction · Deep Learning · Hybrid Memory

0 likes · 11 min read

Shi's AI Notebook

Mar 16, 2026 · Artificial Intelligence

What Attention Actually Does in MiniMind: Tracing Q/K/V, Shape Changes, and Context Fusion

This article walks through MiniMind's Attention.forward implementation, explaining why Q, K, and V are created, how tensors are reshaped for multi‑head attention, the role of masks, KV cache, GQA, and how each token aggregates information from the entire context.

Deep Learning · KV cache · Transformer

0 likes · 21 min read

HyperAI Super Neural

Mar 4, 2026 · Artificial Intelligence

MIT’s APOLLO Framework Breaks Limits, Separating Shared and Modality‑Specific Cell Signals

MIT and ETH Zurich introduce APOLLO, a deep‑learning autoencoder that learns a partially overlapping latent space to explicitly disentangle shared and modality‑specific information in multimodal single‑cell datasets, demonstrating superior cell‑type classification, cross‑modal prediction, and protein localization insights across sequencing and imaging data.

Autoencoder · Deep Learning · Latent Space

0 likes · 14 min read

HyperAI Super Neural

Mar 2, 2026 · Artificial Intelligence

MIT's Pichia-CLM Model Learns Yeast DNA Language, Boosting Protein Yield up to 3‑Fold

An MIT research team introduced Pichia-CLM, a GRU‑based language model for codon optimization trained on a 27k‑pair Pichia pastoris dataset, and demonstrated across six proteins that it consistently outperforms four commercial codon‑optimization tools, delivering up to a three‑fold increase in heterologous protein secretion.

Deep Learning · GRU · Pichia pastoris

0 likes · 13 min read

Code Mala Tang

Mar 1, 2026 · Artificial Intelligence

Why YOLO Dominates Real-Time Object Detection: A Complete Guide

This article provides a comprehensive overview of the YOLO (You Only Look Once) algorithm, explaining its core principles, architecture, version history, training workflow, real‑world applications, strengths, and current limitations for modern computer‑vision tasks.

Computer Vision · Deep Learning · Real-Time

0 likes · 9 min read

AI Agent Research Hub

Feb 24, 2026 · Artificial Intelligence

Why PINNs Training Fails: Diagnosing and Fixing Gradient Pathologies

The article explains that physics‑informed neural networks often stall because the PDE residual loss dominates the boundary‑condition loss, causing severe gradient imbalance, and presents two remedies—an adaptive loss‑weighting scheme and a modified fully‑connected architecture—that together can improve prediction accuracy by up to two orders of magnitude.

Deep Learning · PDE · PINNs

0 likes · 28 min read

HyperAI Super Neural

Feb 22, 2026 · Artificial Intelligence

OCR Models Guide: DeepSeek, PaddlePaddle, Others for High Accuracy & Local Deployment

This article surveys the latest open‑source OCR models—including GLM‑OCR, PaddleOCR‑VL‑1.5, LightOnOCR‑2‑1B, DeepSeek‑OCR 2, and MonkeyOCR—detailing their architectures, benchmark scores on OmniDocBench, hardware requirements, and how to run them via online demos.

Computer Vision · Deep Learning · Model Benchmark

0 likes · 8 min read

Qborfy AI

Feb 21, 2026 · Artificial Intelligence

How Self-Attention Powers Modern AI: From Theory to Real-World Impact

This article explains the self‑attention mechanism behind transformers, detailing its core components, mathematical formulation, step‑by‑step example, multi‑head extension, industry use cases, and a thorough comparison with RNN and CNN approaches, all supported by concrete numbers and citations.

Attention Mechanism · Deep Learning · Self-Attention

0 likes · 8 min read

AI Agent Research Hub

Feb 21, 2026 · Artificial Intelligence

Why Physics‑Informed Neural Networks (PINNs) Became a 20,000‑Citation Breakthrough

This article reviews the highly cited 2019 JCP paper that introduced Physics‑Informed Neural Networks, explains their core idea of embedding PDE residuals into the loss, compares them with contemporaneous methods, details implementation choices, showcases forward and inverse experiments, and discusses their impact, limitations, and future research directions.

Deep Learning · PINNs · partial differential equations

0 likes · 26 min read

AI Cyberspace

Feb 14, 2026 · Artificial Intelligence

Unpacking the Transformer: From Embeddings to Multi‑Head Attention

This article provides a comprehensive, step‑by‑step walkthrough of the Transformer architecture, covering input embedding, positional encoding, the mechanics of Q‑K‑V attention, scaled dot‑product formulas, multi‑head and masked attention, feed‑forward networks, residual connections, layer normalization, decoder generation, and recent attention‑optimization techniques.

Deep Learning · Feed-Forward Network · Positional Encoding

0 likes · 39 min read

Bighead's Algorithm Notes

Feb 13, 2026 · Artificial Intelligence

How ReVol’s Return‑Volatility Normalization Reduces Distribution Shift in Stock Price Prediction

The paper introduces ReVol, a three‑stage framework that normalizes price features, uses an attention‑based estimator to recover return and volatility, and denormalizes predictions, demonstrating consistent improvements of over 0.03 in IC and 0.7 in Sharpe ratio across multiple time‑series models.

Deep Learning · Financial AI · attention estimator

0 likes · 15 min read

AI Cyberspace

Feb 13, 2026 · Artificial Intelligence

How Attention Mechanisms Revolutionized Computer Vision and Machine Translation

This article traces the evolution of attention mechanisms from their inaugural application in computer vision and machine translation to their central role in modern Transformer models, detailing the underlying RNN‑Attention designs, the breakthrough in sequence alignment, and the innovations that enabled high‑performance, parallelizable deep learning architectures.

Attention Mechanism · Computer Vision · Deep Learning

0 likes · 14 min read

Machine Learning Algorithms & Natural Language Processing

Feb 11, 2026 · Industry Insights

xAI Turmoil: Three Chinese Co‑Founders Exit in a Month, Halving the Original Team

Within a month, xAI lost three Chinese co‑founders (Greg Yang, Tony Wu, and Jimmy Ba), reducing its original 12‑person founding team by half, a turnover that analysts say could jeopardize the post‑SpaceX merger IPO and the company's competitive edge in AI.

AI startup · Deep Learning · Elon Musk

0 likes · 8 min read

Tencent Technical Engineering

Feb 2, 2026 · Artificial Intelligence

Why Neural Networks Are the Hidden Engine Behind Modern AI: From Basics to Large Language Models

This comprehensive guide walks through the fundamentals of neural networks, activation functions, training methods, and how they power large language models, while also covering tokenization, self‑attention, transformer architectures, AI infrastructure, and practical usage through agents and retrieval‑augmented generation.

Agent Systems · Deep Learning · GPU infrastructure

0 likes · 75 min read

DaTaobao Tech

Feb 2, 2026 · Operations

How Policy Regularization Boosts Deep Reinforcement Learning for Large‑Scale Inventory Management

This article presents DeepStock, a deep reinforcement learning framework with policy regularization that integrates classic inventory heuristics, achieving 7% turnover reduction and multi‑million cost savings across millions of SKU‑warehouse pairs in Alibaba's self‑operated ecosystem.

Deep Learning · Industrial AI · Operations Research

0 likes · 18 min read

21CTO

Jan 26, 2026 · Artificial Intelligence

What’s New in PyTorch 2.10? Deep Dive into GPU and CUDA Enhancements

PyTorch 2.10 introduces extensive upgrades for AMD ROCm, Intel XPU, and NVIDIA CUDA, adds new Torch XPU APIs, expands Python 3.14 support, and brings performance‑focused improvements such as fused kernels and enhanced quantization, all available via the official GitHub release.

CUDA · Deep Learning · GPU

0 likes · 4 min read

AI Architecture Hub

Jan 19, 2026 · Artificial Intelligence

Demystifying the Transformer: From Input Embedding to Multi‑Head Attention

This article breaks down the core components of the Transformer architecture—including input embedding, positional encoding, multi‑head self‑attention, residual connections with layer normalization, position‑wise feed‑forward networks, and the rationale behind stacking multiple encoder layers—using clear explanations and illustrative diagrams.

Add&Norm · Deep Learning · Feed Forward

0 likes · 12 min read

AI Large Model Application Practice

Jan 15, 2026 · Artificial Intelligence

Why Transformers Need Positional Embeddings and How They Work

This article explains the order‑blindness of Transformer self‑attention, why naïvely adding raw position indices harms semantics, and walks through sinusoidal, learnable, and rotary positional encodings together with PI and YaRN techniques for extending sequence length.

AI · Deep Learning · LLM

0 likes · 12 min read

AI Cyberspace

Jan 13, 2026 · Artificial Intelligence

From Symbolic AI to LLMs: A Complete NLP History and Model Guide

This article provides a comprehensive overview of natural language processing, tracing its evolution from early symbolic and statistical stages through deep learning breakthroughs, detailing sequence models, key NLP tasks, text representation methods, and the development of modern architectures like RNN, LSTM, GRU, Transformer, and GPT series.

Deep Learning · GPT · LSTM

0 likes · 60 min read

AI Frontier Lectures

Jan 7, 2026 · Artificial Intelligence

RankSEG: Boost Semantic Segmentation Accuracy with Just Three Lines of Code

This article reveals that the conventional threshold/argmax post‑processing for semantic segmentation is sub‑optimal for Dice/IoU metrics, introduces the RankSEG framework that optimizes predictions without retraining, and presents an efficient RankSEG‑RMA approximation with extensive experiments showing consistent performance gains.

Deep Learning · Dice optimization · RankSEG

0 likes · 12 min read

AI Frontier Lectures

Jan 7, 2026 · Artificial Intelligence

How Bi‑C2R Achieves Re‑indexing‑Free Lifelong Person Re‑identification

The paper introduces Bi‑C2R, a bidirectional continual compatible representation framework that eliminates the need for feature re‑extraction while enabling lifelong person re‑identification through novel transfer, distillation, and dynamic fusion modules, achieving state‑of‑the‑art accuracy on multiple benchmarks.

Deep Learning · IEEE TPAMI · Lifelong Learning

0 likes · 15 min read

AI Architecture Hub

Jan 7, 2026 · Artificial Intelligence

Why “Attention Is All You Need” Still Shapes AI: A Beginner’s Deep Dive

This article provides a comprehensive, beginner‑friendly walkthrough of the landmark 2017 paper “Attention Is All You Need,” covering its authors, historical context, the shortcomings of RNNs and CNNs, the birth of self‑attention, the Transformer architecture, and its transformative impact on modern AI.

AI history · Attention Mechanism · Deep Learning

0 likes · 9 min read

AI Architecture Hub

Jan 2, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost LLM Performance with Minimal Overhead

DeepSeek's new mHC architecture projects residual connections onto a manifold, adding only 6.7% to training cost for 27B models while delivering significant stability and downstream performance gains over traditional residual and hyper‑connection designs.

Deep Learning · LLM · Manifold Optimization

0 likes · 13 min read

Architect

Jan 1, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost Large Model Training Efficiency

DeepSeek’s new paper introduces mHC, a manifold‑constrained version of Hyper‑Connections that stabilizes gradient flow, adds only 6.7% training overhead, and enables reliable training of 27‑billion‑parameter models while improving benchmark performance by about 2%.

AI Architecture · Deep Learning · Large-Scale Training

0 likes · 7 min read

HyperAI Super Neural

Dec 30, 2025 · Artificial Intelligence

Explicit Geological Constraints + Data‑Driven Modeling Improves Cross‑Regional Mineral Prospectivity and Interpretability

Zhejiang University researchers introduce an anisotropic spatial proximity neural network combined with attention‑weighted logistic regression, explicitly embedding geological constraints into mineral prospectivity mapping, and demonstrate superior recall, overall performance, and interpretability across both a classic Canadian gold benchmark and a large‑scale US copper province.

Deep Learning · Interpretability · anisotropic spatial proximity

0 likes · 12 min read

PMTalk Product Manager Community

Dec 29, 2025 · Artificial Intelligence

Essential GPU Selection Tips for AI Model Training (Why Nvidia Dominates)

This guide explains how product managers can choose the right GPU and complementary hardware for AI model training, covering GPU memory, cores, architecture, budget, CPU role, RAM, storage, cooling, and other factors, with real‑world examples and practical trade‑offs.

AI hardware · Deep Learning · GPU selection

0 likes · 9 min read

Bighead's Algorithm Notes

Dec 25, 2025 · Artificial Intelligence

Paper Review: DeltaLag – An End‑to‑End Deep Learning Framework for Dynamically Learning Lead‑Lag Patterns in Financial Markets

DeltaLag introduces a sparse cross‑attention mechanism that dynamically discovers pair‑specific, time‑varying lead‑lag relationships in US equity markets and uses them to construct interpretable trading signals, achieving significantly higher annualized returns, Sharpe ratios, and information coefficients than fixed‑lag, statistical, and other spatio‑temporal deep learning baselines.

Deep Learning · DeltaLag · financial time series

0 likes · 13 min read

Tencent Technical Engineering

Dec 24, 2025 · Artificial Intelligence

Build a Mini LLM from Scratch: Step‑by‑Step Guide to Tokenizer, Attention, and Transformer

This article walks through constructing a miniature large language model from the ground up, covering model architecture, tokenization methods, BPE vocabulary building, embedding, positional encoding, attention mechanisms, multi‑head attention, transformer blocks, training pipelines, inference, and sampling strategies, all with runnable Python code.

Deep Learning · LLM · Python

0 likes · 34 min read

Data Party THU

Dec 20, 2025 · Artificial Intelligence

Master 20 Essential PyTorch Concepts: From Tensors to Model Deployment

This guide walks you through 20 fundamental PyTorch concepts—including tensor creation, operations, autograd, model building, data loading, GPU acceleration, and best‑practice tricks—providing clear code snippets and step‑by‑step explanations so you can quickly prototype, train, and deploy neural networks.

Deep Learning · GPU Acceleration · Model Training

0 likes · 16 min read

Bighead's Algorithm Notes

Dec 19, 2025 · Artificial Intelligence

Quantitative Finance Paper Digest: Dec 13‑19 2025 Highlights

This digest presents recent arXiv papers (Dec 13‑19 2025) on AI‑driven quantitative finance, covering LLM‑based portfolio recommendation, reinforcement‑learning deep hedging, hybrid SV‑LSTM volatility forecasting, dynamic stacking ensembles, GA‑optimized SVR forecasting, and interpretable deep learning asset pricing, each with abstracts and key findings.

Deep Learning · LLM · Quantitative Finance

0 likes · 16 min read

Xiao Liu Lab

Dec 11, 2025 · Operations

Master SSH: From Basic Connections to Secure, High‑Performance Remote Workflows

This guide explains how SSH evolved from simple remote login to a comprehensive tool for secure server access, efficient command execution, password‑less authentication, advanced configuration, port forwarding for deep‑learning tasks, large‑file transfer strategies, and enterprise‑grade hardening, empowering developers and ops engineers to build reliable, reproducible workflows.

Deep Learning · Linux · Remote Development

0 likes · 10 min read

Data STUDIO

Dec 9, 2025 · Artificial Intelligence

20 Core PyTorch Concepts to Accelerate Your AI Projects

This article walks through twenty essential PyTorch concepts—from basic Tensor creation and manipulation, through autograd and neural‑network construction, to data loading, GPU acceleration, model saving, and practical training tricks—providing concrete code examples and clear explanations for developers eager to build and deploy AI models.

Autograd · DataLoader · Deep Learning

0 likes · 16 min read

Tencent Cloud Developer

Dec 4, 2025 · Artificial Intelligence

From Tapestry to LLMs: 30+ Years of Recommender System Evolution

This article traces the three‑decade evolution of recommender systems—from early collaborative‑filtering prototypes like Tapestry, through the Netflix Prize era and deep‑learning breakthroughs such as Wide&Deep and DIN, to the current generative‑AI wave driven by large language models—highlighting key milestones, technical shifts, industrial deployments, and future challenges.

Deep Learning · Industrial Deployment · collaborative filtering

0 likes · 38 min read

AI Algorithm Path

Dec 1, 2025 · Artificial Intelligence

Getting Started with the Cutting‑Edge Vision‑Language Model Qwen3‑VL

This article introduces vision‑language models, explains why they outperform OCR‑plus‑LLM pipelines, and walks through practical OCR and information‑extraction tasks using Qwen3‑VL, complete with code snippets, example prompts, result analysis, and a discussion of the model's limitations and resource considerations.

Deep Learning · Information Extraction · OCR

0 likes · 13 min read

Wuming AI

Nov 30, 2025 · Artificial Intelligence

What Exactly Is a Large Language Model? A Simple Guide to AI, Transformers, and How They Work

This article explains the relationship between AI, machine learning, deep learning, and large language models, detailing their evolution, training stages, transformer architecture, attention mechanisms, inference APIs, and practical usage examples, while demystifying common misconceptions about LLM capabilities.

AI fundamentals · Deep Learning · RLHF

0 likes · 10 min read

Kuaishou Tech

Nov 28, 2025 · Artificial Intelligence

Keye-VL-671B-A37B Leads Vision, Video, and Math Benchmarks

Kwai has open‑sourced its new flagship multimodal model Keye‑VL‑671B‑A37B, which upgrades visual perception, cross‑modal alignment and complex reasoning, achieving top scores on image, video, and mathematical reasoning benchmarks while detailing its architecture, three‑stage pre‑training, post‑training strategies, and future multimodal agent plans.

Deep Learning · large language model · multimodal

0 likes · 10 min read

Bighead's Algorithm Notes

Nov 27, 2025 · Artificial Intelligence

IKNet: Explainable Stock Price Forecasting with News Keywords and Technical Indicators

IKNet combines FinBERT‑derived news keywords with technical‑indicator time series, uses SHAP to quantify each feature's impact, and achieves a 32.9% RMSE reduction and 18.5% higher cumulative returns on the S&P 500 (2015‑2024) compared with RNN and Transformer baselines, while providing fine‑grained, context‑aware explanations of price movements.

Deep Learning · FinBERT · SHAP

0 likes · 11 min read

Kuaishou Tech

Nov 25, 2025 · Artificial Intelligence

How Flow‑GRPO Boosts Image Generation Accuracy to 95% with Online Reinforcement Learning

Flow‑GRPO introduces online reinforcement learning into flow‑matching models by converting deterministic ODE sampling to stochastic SDE sampling and reducing denoising steps, raising SD‑3.5‑Medium's GenEval accuracy from 63% to 95%—surpassing GPT‑4o—and demonstrating strong gains in complex composition, text rendering, and human‑preference alignment across multiple generative tasks.

AI research · Deep Learning · flow matching

0 likes · 8 min read

Python Programming Learning Circle

Nov 18, 2025 · Artificial Intelligence

Top 10 Python Libraries Every Computer Vision Engineer Should Know

This article compiles the most commonly used Python libraries for computer vision, covering basic image handling with Pillow, high‑performance processing with OpenCV and Mahotas, advanced tools like Scikit‑Image, TensorFlow Image, PyTorch Vision, SimpleCV, Imageio, Albumentations, and the model zoo timm, each with concise descriptions and practical code snippets.

Deep Learning · PyTorch · TensorFlow

0 likes · 11 min read

IT Services Circle

Nov 10, 2025 · Artificial Intelligence

Why PyTorch Co‑Founder Soumith Chintala Is Leaving Meta After 11 Years

Soumith Chintala, one of PyTorch’s original creators, announced his departure from Meta after eleven years, citing a desire to move beyond the framework, reflecting on his pivotal role in building PyTorch, its global impact, and his gratitude to the community while looking ahead to new challenges.

AI · Deep Learning · Meta

0 likes · 12 min read

Bighead's Algorithm Notes

Nov 8, 2025 · Artificial Intelligence

Time-Series Paper Digest: Nov 1‑7 2025 Highlights

This digest summarizes three recent AI papers: DoFlow, a causal generative flow model for interventions; Forecast2Anomaly, a retrieval‑augmented framework for zero‑shot anomaly prediction; and ForecastGAN, a decomposition‑based adversarial approach that improves multi‑horizon forecasting across diverse datasets.

Deep Learning · Time Series · anomaly detection

0 likes · 8 min read

HyperAI Super Neural

Nov 7, 2025 · Artificial Intelligence

How PLACER Tackles Atomic‑Level Modeling of Protein Conformational Heterogeneity

The PLACER graph‑neural‑network framework from David Baker’s lab generates atom‑accurate small‑molecule structures and protein‑ligand conformational ensembles, trained on large CSD and PDB datasets, achieving sub‑Å precision, outperforming traditional docking in many benchmarks and markedly improving enzyme‑design success rates.

Deep Learning · Graph Neural Network · PLACER

0 likes · 15 min read

Bighead's Algorithm Notes

Nov 4, 2025 · Artificial Intelligence

Key Quantitative Finance Papers from WWW2025 – Summaries & Insights

This article compiles concise English summaries of recent AI-driven quantitative finance papers presented at WWW2025, covering novel stock‑price forecasting frameworks such as CSPO, MERA, Ploutos, DINS, HedgeAgents, HRFT, and IDED, with links to the original PDFs, code repositories, authors, and abstracts.

Deep Learning · Financial AI · Quantitative Finance

0 likes · 13 min read

JD Tech Talk

Nov 4, 2025 · Artificial Intelligence

How AI-Powered Virtual Try-On Transforms Fashion E‑Commerce

The article explains how JD.com's AI virtual try‑on system, Oxygen Tryon, uses advanced computer‑vision and generative models to let shoppers instantly preview clothing on their own photos, improving purchase decisions and reducing return rates; it also outlines the technical challenges, innovations, and future development plans.

AI · Computer Vision · Deep Learning

0 likes · 7 min read

Radish, Keep Going!

Nov 4, 2025 · Artificial Intelligence

What You Need to Know: Backpropagation, FreeBSD, AI MoE, and More Tech Insights

This roundup covers essential insights on backpropagation fundamentals, FreeBSD self‑hosting benefits, an open‑source 30B MoE AI model, misuse of cybercrime laws, historic moving sidewalks, party‑planning hacks, deceptive signal‑strength tricks, a 1000‑hp micro motor, Nextcloud performance fixes, and Google Cloud account suspensions, offering a blend of technical depth and practical advice.

AI · Backpropagation · Deep Learning

0 likes · 11 min read

Tencent Cloud Developer

Nov 4, 2025 · Artificial Intelligence

From Functions to Transformers: Mastering Neural Networks Step by Step

This article walks you through the evolution from basic mathematical functions to modern large‑scale models, explaining activation functions, forward and backward propagation, loss calculation, gradient descent, regularization, dropout, word embeddings, RNNs, and the core mechanics of the Transformer architecture.

Attention Mechanism · Deep Learning · Neural Networks

0 likes · 15 min read

Data Party THU

Nov 2, 2025 · Artificial Intelligence

From RNN to LLM: How Transformers Power Modern Language Models

This article explains the evolution from RNNs through Encoder‑Decoder models to Transformers, detailing self‑attention, multi‑head attention, and masked attention, and then describes what Large Language Models are, their key components, capabilities, limitations, and common applications.

AI · Deep Learning · LLM

0 likes · 9 min read

HyperAI Super Neural

Oct 30, 2025 · Artificial Intelligence

OmniCast Achieves 20× Speed Boost and Eliminates Autoregressive Error Accumulation in S2S Weather Forecasting

OmniCast, a novel latent diffusion model from UCLA and Argonne National Laboratory, combines a VAE and a Transformer to generate high‑precision probabilistic sub‑seasonal to seasonal (S2S) forecasts, dramatically reducing the error accumulation of autoregressive methods and delivering 10‑20× faster inference while surpassing state‑of‑the‑art baselines across accuracy, physical consistency, and probabilistic metrics.

Deep Learning · Latent Diffusion · OmniCast

0 likes · 15 min read

Python Programming Learning Circle

Oct 28, 2025 · Artificial Intelligence

Why Nvidia Is Making Python a First‑Class Citizen in CUDA

Nvidia announced native Python support for its CUDA toolkit, detailing new Python‑centric APIs, projects like cuTile and CUTLASS, and a layered strategy that democratizes GPU programming for AI developers while preserving performance and expanding the ecosystem.

AI · CUDA · Deep Learning

0 likes · 10 min read

Data Party THU

Oct 28, 2025 · Artificial Intelligence

How AI is Reviving Dunhuang Murals: From 3D Scans to Digital Restoration

This article examines the cutting‑edge AI techniques—multimodal fusion, deep‑learning disease detection, reversible repair, diffusion‑Transformer models, GAN‑based pattern generation, and AR navigation—that enable millimetre‑level digital restoration and cultural democratization of the Dunhuang murals.

AI · AR · Cultural Heritage

0 likes · 14 min read

DataFunSummit

Oct 25, 2025 · Artificial Intelligence

How AIGC Is Revolutionizing Image Generation and Editing

This article explores how generative AI (AIGC) is transforming image creation and editing by addressing traditional pain points, detailing core concepts, key technical modules, controllable generation and editing techniques, representative research breakthroughs, business applications, and future challenges and opportunities.

AI ethics · AIGC · Deep Learning

0 likes · 20 min read

HyperAI Super Neural

Oct 21, 2025 · Artificial Intelligence

BindCraft Enables Direct AlphaFold2‑Driven Intelligent Protein Binder Design (46% Success on 12 Targets)

BindCraft, an open‑source pipeline from EPFL and MIT, uses AlphaFold2 gradient back‑propagation to design protein binders without manual scaffolding, achieving an average 46.3% success rate across 12 challenging targets and offering a one‑click tutorial for rapid experimentation.

AlphaFold2 · BindCraft · Deep Learning

0 likes · 5 min read

Bighead's Algorithm Notes

Oct 18, 2025 · Artificial Intelligence

Time Series Paper Digest (Oct 11‑17 2025): FIRE, CauchyNet, EvoRate, CoRA

This digest presents four recent AI papers on time‑series forecasting from Oct 11‑17 2025: FIRE introduces a frequency‑domain decomposition with independent amplitude‑phase modeling and adaptive weighting; CauchyNet leverages holomorphic activations for compact, data‑efficient learning; the EvoRate framework quantifies learnability via mutual information; and CoRA adds covariate‑aware adaptation to foundation models, all reporting significant accuracy gains and enhanced interpretability.

AI research · Deep Learning · covariate-aware adaptation

0 likes · 10 min read

Bighead's Algorithm Notes

Oct 11, 2025 · Artificial Intelligence

Recent Advances in Multivariate Time Series Forecasting: Paper Summaries (Sep 27 – Oct 10 2025)

This article summarizes eight newly released AI papers on multivariate time‑series forecasting and anomaly detection, detailing each work's motivation, proposed methodology, key innovations such as CRIB, TS‑JEPA, DSAT‑HD, DIMIGNN, ASTGI, IndexNet, TsLLM, Moon, TimeSeriesScientist, MLG‑4TS, and Augur, and reports their experimental validation on real‑world datasets.

Deep Learning · Transformer · anomaly detection

0 likes · 23 min read

Bighead's Algorithm Notes

Oct 10, 2025 · Artificial Intelligence

Quantitative Finance Paper Digest (Sep 27 – Oct 10 2025)

This digest summarizes recent arXiv papers that introduce new AI‑driven methods for portfolio similarity, Bayesian portfolio optimization, end‑to‑end deep‑learning portfolio construction, large‑language‑model‑based financial prediction, and multi‑agent crypto‑trading systems, highlighting their datasets, architectures, and empirical gains.

Bayesian Optimization · Deep Learning · asset allocation

0 likes · 18 min read

Data Party THU

Oct 5, 2025 · Artificial Intelligence

How ImageDDI Boosts Drug‑Drug Interaction Prediction with Motif Sequences and Molecular Images

The ImageDDI framework, introduced by a team from Hunan University, combines molecular motif sequences with 2D/3D molecular images using a Transformer encoder and adaptive feature fusion, achieving significantly higher accuracy and macro‑F1 scores than existing methods on multiple DDI datasets, while also providing interpretable visual explanations.

Deep Learning · Drug Interaction · Image Fusion

0 likes · 10 min read

Data Party THU

Oct 4, 2025 · Artificial Intelligence

Unveiling Transformer Internals: From Theory to PyTorch Code

This article deeply explores the Transformer architecture by combining original paper principles with PyTorch source code, covering encoder‑decoder design, positional encoding assumptions, core parameters, residual connections, attention mechanisms, and detailed implementation snippets to help readers understand and reproduce the model.

Deep Learning · Neural Networks · Positional Encoding

0 likes · 22 min read

Mashang Consumer UXC

Sep 29, 2025 · Artificial Intelligence

Open-Source AI 3D, Video & Audio Models: Tencent, Vidu, Audio2Face and More

This article reviews the latest open‑source AI models released by major tech firms—including Tencent's 3D‑Omni and 3D‑Part, Shengshu Tech's Vidu Q2 for facial video, Nvidia's Audio2Face for real‑time facial animation, plus updates from Figma, Google, Alibaba and Kuaishou—highlighting their capabilities and potential applications in gaming, AR/VR, design and content creation.

3D Modeling · AI · Deep Learning

0 likes · 8 min read

IT Services Circle

Sep 27, 2025 · Artificial Intelligence

Why “Neural Network” and “Deep Learning” Are Actually the Same Thing

The article explains how the terms “neural network” and “deep learning” originated, why they were once treated as distinct branches of AI, and how historical biases and naming politics eventually merged them into a single research direction.

AI history · Deep Learning · Neural Networks

0 likes · 4 min read

Bighead's Algorithm Notes

Sep 25, 2025 · Artificial Intelligence

How MARS Uses Risk‑Aware Multi‑Agent RL to Master Portfolio Management

This article reviews the MARS framework, a risk‑aware multi‑agent reinforcement‑learning system for automated portfolio management that tackles market non‑stationarity and proactive risk control, detailing its hierarchical architecture, formal MDP formulation, training process, and superior experimental results on DJIA and HSI benchmarks.

Deep Learning · Multi-Agent · Portfolio Management

0 likes · 13 min read

Wu Shixiong's Large Model Academy

Sep 25, 2025 · Artificial Intelligence

Master Self-Attention & Multi-Head Attention for Large Model Interviews

This guide breaks down the core logic, computation steps, formulas, and common interview questions about Self‑Attention and Multi‑Head Attention in Transformers, offering concrete explanations, dimensional examples, and practical answering techniques to help candidates ace large‑model algorithm interviews.

Deep Learning · Interview Tips · Self-Attention

0 likes · 8 min read

AIWalker

Sep 24, 2025 · Artificial Intelligence

Top 2025 Object Detection Research Paths: From Grounding DINO 1.5 to Open‑Set Breakthroughs

The article outlines four key innovation avenues—architecture redesign, task expansion, information fusion, and paradigm shift—highlighting recent works such as Mr. DETR, Grounding DINO 1.5, SM3Det, and RoboFusion, and offers a curated list of 176 cutting‑edge object‑detection papers with code and datasets for free.

Deep Learning · Model architecture · object detection

0 likes · 8 min read

Data Party THU

Sep 24, 2025 · Artificial Intelligence

What’s New in Stanford’s CS231n 2025: Full Course Materials and Syllabus

Stanford’s CS231n Spring 2025 course, led by Fei‑Fei Li and a team of leading AI researchers, is now fully available online with video lectures, detailed syllabus, instructor bios, and prerequisite guidelines, offering a comprehensive deep‑learning curriculum for computer‑vision enthusiasts.

CS231n · Course · Deep Learning

0 likes · 5 min read

Data Party THU

Sep 20, 2025 · Artificial Intelligence

How Mamba-Adaptor Revives State‑Space Models for Vision Tasks

The Mamba-Adaptor introduces a dual‑module adapter that overcomes causal computation limits, long‑range memory decay, and spatial structure loss in state‑space models, delivering state‑of‑the‑art results on ImageNet, COCO, and various downstream visual tasks with minimal overhead.

Adapter · COCO · Deep Learning

0 likes · 8 min read

AIWalker

Sep 17, 2025 · Artificial Intelligence

Cutting-Edge Attention Mechanism Innovations for 2025: Modal Fusion and Domain Adaptation

This article surveys 183 recent attention‑mechanism papers, classifies them into four innovation categories, and highlights representative works such as MILA, ARFFT, CNN‑Transformer for speech emotion, and LSTM‑attention epidemic forecasting, providing concrete methods, code links, and performance insights.

2025 · Attention Mechanism · Deep Learning

0 likes · 7 min read

Architect

Sep 16, 2025 · Artificial Intelligence

Why Transformers Outperform RNNs: A Beginner’s Guide to Attention and Architecture

This article introduces the Transformer architecture, explaining its attention mechanism, encoder‑decoder design, training and inference processes, and why it surpasses RNN‑based models, while also covering common applications and variations in natural language processing.

Deep Learning · Model architecture · NLP

0 likes · 13 min read

DataFunTalk

Sep 14, 2025 · Artificial Intelligence

Why Modern LLMs Skip Thinking: Token Routing and Zero‑Compute Experts Explained

The article examines how large language models now use routing mechanisms and token‑level expert selection to reduce computation and cost, illustrating the trade‑offs with real‑world examples from OpenAI, LongCat, and DeepSeek while highlighting both the benefits and the pitfalls of this approach.

AI · Deep Learning · Token efficiency

0 likes · 8 min read

Data Party THU

Sep 13, 2025 · Artificial Intelligence

How AI is Revolutionizing Quantum System Modeling: A Comprehensive Review

This review surveys how artificial intelligence—through machine learning, deep learning, and large language models—enables researchers to characterize, predict, and reconstruct complex quantum systems, outlines a unified learning framework, discusses current breakthroughs and challenges, and envisions a future "quantum GPT" that could transform quantum science.

AI · Deep Learning · Quantum Physics

0 likes · 10 min read

AI Frontier Lectures

Sep 9, 2025 · Artificial Intelligence

Can UniConvNet Expand Receptive Fields While Preserving Gaussian Distribution?

The paper introduces UniConvNet, a novel convolutional architecture that expands the effective receptive field (ERF) of ConvNets without breaking the asymptotically Gaussian distribution (AGD), achieving superior accuracy‑parameter and accuracy‑FLOPs trade‑offs across image classification, detection, and segmentation benchmarks.

Deep Learning · Effective Receptive Field · Image Classification

0 likes · 9 min read

AI Frontier Lectures

Sep 7, 2025 · Artificial Intelligence

How Dynamic Snake and Pinwheel Convolutions Boost Small‑Target Segmentation Accuracy

This article reviews two recent AI papers—Dynamic Snake Convolution with topological constraints for tubular structure segmentation and Pinwheel‑shaped Convolution with scale‑based dynamic loss for infrared small‑target detection—detailing their methods, innovations, experimental gains, and future research directions.

Deep Learning · dynamic convolution · medical imaging

0 likes · 7 min read

Architects' Tech Alliance

Sep 7, 2025 · Artificial Intelligence

How Huawei’s Ascend 910D Stacks Up Against Global AI Chip Rivals

Huawei’s Ascend 910D AI chip boasts a revamped architecture, 320 TFLOPS of half‑precision performance, liquid cooling with a power draw of only 350 W, and 4 TB/s of inter‑chip bandwidth. The article compares these advantages with previous 910 models, domestic competitors, and leading foreign chips such as the Nvidia H100, highlighting performance, cost, and ecosystem benefits.

AI Chip · Ascend 910D · Deep Learning

0 likes · 15 min read

Bighead's Algorithm Notes

Sep 3, 2025 · Artificial Intelligence

Decoding TINs: Reconstructing Classic Technical Analysis with Neural Networks

The paper introduces Technical Indicator Networks (TINs), a framework that maps traditional technical analysis formulas to neural‑network topologies, initializes weights to preserve indicator behavior, and uses reinforcement learning for dynamic optimization, achieving significantly higher Sharpe, Sortino, and cumulative returns on US30 component stocks than conventional MACD approaches.

Algorithmic Trading · Deep Learning · Financial AI

0 likes · 9 min read

Network Intelligence Research Center (NIRC)

Sep 3, 2025 · Artificial Intelligence

Understanding AI Compilers: A TVM Example

The article explains how AI compilers transform high‑level models into efficient hardware code, using TVM to illustrate operator optimization, automated scheduling, and end‑to‑end compilation workflow with concrete code examples and performance considerations.

AI compiler · Deep Learning · TVM

0 likes · 8 min read

Data Party THU

Sep 2, 2025 · Artificial Intelligence

Gradient-Based Multi-Objective Deep Learning: Theory, Algorithms, and LLM Applications

This tutorial provides a systematic overview of gradient‑based multi‑objective optimization for deep learning, covering core solution strategies, algorithmic details, convergence and generalization analyses, and demonstrates how these methods can be applied to fine‑tune and align large language models.

Deep Learning · Gradient Methods · LLM fine-tuning

0 likes · 3 min read

Data STUDIO

Sep 2, 2025 · Artificial Intelligence

Understanding NAS: Core Algorithms and Python Implementations

This article reviews Neural Architecture Search (NAS), explains its bi‑level optimization formulation, compares three major search strategies—reinforcement learning, evolutionary algorithms, and differentiable gradient‑based methods—provides complete Python code for each, and analyzes experimental results highlighting performance trade‑offs and remaining challenges.

Deep Learning · Differentiable Architecture Search · Evolutionary Algorithms

0 likes · 25 min read
