Tagged articles
13 articles
Network Intelligence Research Center (NIRC)
Jun 9, 2025 · Artificial Intelligence

How to Build High‑Performance GEMM with NVIDIA CUTLASS

The article explains why standard GEMM libraries may fall short for special matrix shapes, introduces NVIDIA’s open‑source CUTLASS library, details its hierarchical tiling architecture, and walks through a complete device‑API example that customizes tile sizes and data layouts to achieve near‑hand‑written kernel performance on modern GPUs.

CUDA · CUTLASS · GEMM
6 min read
DaTaobao Tech
Oct 16, 2024 · Artificial Intelligence

Dynamic Quantization and Matrix Multiplication Optimization in MNN CPU Backend

The article details MNN’s CPU backend dynamic quantization for Transformer‑type models, describing runtime int8 conversion, block‑wise matrix‑multiply optimizations using ARM SMMLA/SDOT and AVX‑512 VNNI, weight‑group and batch‑wise quantization techniques, and reports up to three‑fold speed‑ups on Snapdragon 8 Gen 3.

CPU optimization · Dynamic Quantization · INT8
19 min read
21CTO
Aug 29, 2023 · Artificial Intelligence

Mojo vs Python: Does the New AI Language Really Deliver 36,000× Speedup?

The article examines Modular's new Mojo language and its claim of massive performance gains over Python for AI workloads, presents benchmark code and results, and discusses its origins, investment interest, and current beta status, concluding that Mojo is impressive but the 36,000× claim is overstated.

AI programming · Matrix Multiplication · Mojo
8 min read
Sohu Tech Products
Oct 12, 2022 · Artificial Intelligence

AlphaTensor: DeepMind’s AI System for Discovering Faster Matrix Multiplication Algorithms

DeepMind’s AlphaTensor, built on AlphaZero and reinforcement learning, automatically discovers novel, provably correct matrix multiplication algorithms that outperform classic methods like Strassen’s, demonstrating how modern AI can automate algorithm discovery and significantly accelerate computations across many fields.

AI · AlphaTensor · DeepMind
8 min read
21CTO
Nov 29, 2020 · Artificial Intelligence

Decode Math Symbols with Python: From Summation to Matrix Multiplication

Learn how to translate common mathematical symbols such as summation, product, factorial, conditional expressions, and matrix multiplication into clear Python code, revealing the underlying computations and helping data scientists and ML practitioners deepen their mathematical intuition through practical examples.

Code Examples · Data Science · Matrix Multiplication
7 min read
Tech Musings
May 18, 2020 · Fundamentals

Master Multi‑Round Interview Coding: Parsing, Concurrency, and Matrix Multiplication

This article walks through three typical interview coding rounds—implementing a robust string‑to‑int parser, designing a high‑concurrency trace‑storage API, and building a matrix multiplication routine with performance and sparse‑matrix optimizations—providing code examples, corner‑case handling, and improvement ideas.

Algorithm Optimization · Matrix Multiplication · String Parsing
7 min read
ITPUB
Mar 25, 2017 · Fundamentals

Why 5×3 ≠ 5+5+5: Understanding Equality vs Equivalence in Math and Code

The article explores why mathematically equal expressions like 5×3 and 5+5+5 are not necessarily equivalent, explains the distinction between equality and equivalence, illustrates with real‑world and programming examples such as JavaScript’s == versus === and matrix multiplication rules, and discusses teaching implications.

JavaScript · Matrix Multiplication · education
6 min read
Liulishuo Tech Team
Sep 3, 2016 · Artificial Intelligence

Optimizing Deep Neural Network Inference for Offline Speech Evaluation on Mobile Devices

This article describes how the English fluency app leverages deep neural network (DNN) models for real‑time speech scoring on smartphones, detailing offline inference challenges, BLAS‑based matrix‑vector optimizations, sparsity exploitation, cache‑friendly implementations, fixed‑point and NEON acceleration, as well as model compression techniques to improve accuracy and latency.

BLAS · DNN optimization · Matrix Multiplication
11 min read