Tagged articles

Matrix multiplication

14 articles

Jan 21, 2026 · Artificial Intelligence

When Go Meets GPU: A Hands‑On Guide to Unlocking Thousand‑Fold Compute with CUDA

This article walks Go developers through the fundamentals of GPU architecture and CUDA, demonstrates a complete CGO‑based matrix‑multiplication project, offers performance‑tuning tips such as minimizing PCIe transfers and leveraging shared memory, and presents a PureGo alternative for seamless Go‑GPU integration.

CGO · CUDA · GPU computing

17 min read

Architects' Tech Alliance

Sep 13, 2025 · Artificial Intelligence

How Huawei’s Da Vinci Architecture Powers Next‑Gen AI on the Kirin 810

Huawei’s Da Vinci AI architecture, introduced with the Kirin 810 SoC, combines a 3D Cube matrix‑multiply engine, vector and scalar units, and flexible scaling to deliver high‑performance, energy‑efficient AI compute across devices from low‑power IoT to high‑end cloud servers.

3D Cube · AI · Da Vinci architecture

11 min read

Python Programming Learning Circle

Jun 11, 2025 · Fundamentals

Master Python’s @ Operator: Matrix Multiplication Made Simple

This article explains Python's @ operator for matrix multiplication, shows basic usage with NumPy, contrasts it with element‑wise *, demonstrates matrix‑vector multiplication, highlights common dimension‑mismatch errors, and provides a concise summary for efficient linear‑algebra calculations.
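The @ versus * distinction described above can be shown without NumPy. The sketch below uses a hypothetical `Matrix` class (not part of any library): Python dispatches `@` to `__matmul__` (PEP 465, Python 3.5+) and `*` to `__mul__`, so the class can show a matrix product, an element-wise product, a matrix-vector product, and a dimension-mismatch error side by side. NumPy arrays give the same numbers for the same expressions.

```python
class Matrix:
    """Hypothetical tiny dense matrix, used only to illustrate @ and * (not NumPy)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.shape = (len(self.rows), len(self.rows[0]) if self.rows else 0)

    def __matmul__(self, other):
        # Matrix product: (m x n) @ (n x p) -> (m x p); inner dimensions must agree.
        m, n = self.shape
        n2, p = other.shape
        if n != n2:
            raise ValueError(f"shape mismatch: {self.shape} @ {other.shape}")
        return Matrix(
            [[sum(self.rows[i][k] * other.rows[k][j] for k in range(n)) for j in range(p)]
             for i in range(m)]
        )

    def __mul__(self, other):
        # Element-wise product: shapes must be identical.
        if self.shape != other.shape:
            raise ValueError(f"shape mismatch: {self.shape} * {other.shape}")
        return Matrix([[a * b for a, b in zip(r1, r2)]
                       for r1, r2 in zip(self.rows, other.rows)])


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[5, 6], [7, 8]])
v = Matrix([[1], [1]])  # 2x1 column vector

print((A @ B).rows)  # [[19, 22], [43, 50]]
print((A * B).rows)  # [[5, 12], [21, 32]]
print((A @ v).rows)  # [[3], [7]]
# v @ A raises ValueError: (2, 1) @ (2, 2) has mismatched inner dimensions.
```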

Matrix multiplication · NumPy · Python

4 min read

Network Intelligence Research Center (NIRC)

Jun 9, 2025 · Artificial Intelligence

How to Build High‑Performance GEMM with NVIDIA CUTLASS

The article explains why standard GEMM libraries may fall short for special matrix shapes, introduces NVIDIA’s open‑source CUTLASS library, details its hierarchical tiling architecture, and walks through a complete device‑API example that customizes tile sizes and data layouts to achieve near‑hand‑written kernel performance on modern GPUs.
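CUTLASS's hierarchical tiling splits the output matrix into threadblock tiles (then warp and thread tiles) and walks the K dimension in slabs. A CPU-side Python sketch of just one level of that blocking follows; `tiled_matmul` and its `tile_m`/`tile_n`/`tile_k` parameters are hypothetical illustrations, not CUTLASS template parameters or its device API.

```python
def tiled_matmul(A, B, tile_m=2, tile_n=2, tile_k=2):
    """C = A @ B computed tile by tile on lists of lists.

    tile_m/tile_n/tile_k are illustrative only, loosely analogous to a
    threadblock tile shape; they are not CUTLASS template arguments.
    """
    M, K = len(A), len(A[0])
    K2, N = len(B), len(B[0])
    assert K == K2, "inner dimensions must match"
    C = [[0] * N for _ in range(M)]
    # The two outer loops select one output tile C[i0:i0+tile_m, j0:j0+tile_n].
    for i0 in range(0, M, tile_m):
        for j0 in range(0, N, tile_n):
            # March along K one slab at a time, accumulating partial sums, much as a
            # GPU kernel does after staging A/B tiles in shared memory.
            for k0 in range(0, K, tile_k):
                for i in range(i0, min(i0 + tile_m, M)):
                    for j in range(j0, min(j0 + tile_n, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tile_k, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

The result does not depend on the tile sizes; on real hardware they decide how much data each level of the memory hierarchy reuses.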

CUDA · Cutlass · GEMM

6 min read

DaTaobao Tech

Oct 16, 2024 · Artificial Intelligence

Dynamic Quantization and Matrix Multiplication Optimization in MNN CPU Backend

The article details MNN’s CPU backend dynamic quantization for Transformer‑type models, describing runtime int8 conversion, block‑wise matrix‑multiply optimizations using ARM SMMLA/SDOT and AVX‑512 VNNI, weight‑group and batch‑wise quantization techniques, and reports up to three‑fold speed‑ups on Snapdragon 8 Gen 3.
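The core arithmetic of dynamic quantization can be sketched in a few lines: activations are quantized to int8 at runtime with a scale derived from their current range, multiplied in integer arithmetic, and the integer accumulator is rescaled back to float. The sketch below assumes symmetric per-row scales for activations and per-output-channel scales for weights; the function names are hypothetical, and MNN's actual kernels (SMMLA/SDOT, AVX-512 VNNI, block-wise layouts, weight groups) are far more involved.

```python
def quantize_rows(X):
    """Symmetric int8 quantization of each row, computed at runtime (assumed scheme)."""
    q, scales = [], []
    for row in X:
        amax = max(abs(v) for v in row)
        s = amax / 127 if amax > 0 else 1.0
        q.append([max(-127, min(127, round(v / s))) for v in row])
        scales.append(s)
    return q, scales


def dynamic_int8_matmul(X, W_t):
    """Approximate X @ W, where W_t stores W transposed (one row per output column).

    Activations are quantized on the fly; weights would normally be quantized
    offline, but are quantized here as well for brevity.
    """
    qx, sx = quantize_rows(X)
    qw, sw = quantize_rows(W_t)  # per-output-channel weight scales
    out = []
    for i, xrow in enumerate(qx):
        out_row = []
        for j, wrow in enumerate(qw):
            acc = sum(a * b for a, b in zip(xrow, wrow))  # integer accumulation
            out_row.append(acc * sx[i] * sw[j])           # dequantize to float
        out.append(out_row)
    return out
```

For example, `dynamic_int8_matmul([[0.5, -1.2, 2.0]], [[1.0, 0.3, -0.7], [0.2, 0.2, 0.2]])` lands within about 0.01 of the exact float results (-1.26 and 0.26).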

CPU optimization · Dynamic Quantization · INT8

19 min read

Ops Development & AI Practice

May 16, 2024 · Fundamentals

Boost Go Performance: Harness CPU Cache Locality with Practical Tips

This article explains the CPU cache locality principle, shows how to restructure Go data access patterns—including data structures, field ordering, memory allocation, and false sharing avoidance—and demonstrates measurable performance gains with a matrix‑multiplication benchmark.

CPU cache · Go · Matrix multiplication

10 min read

21CTO

Aug 29, 2023 · Artificial Intelligence

Mojo vs Python: Does the New AI Language Really Deliver 36,000× Speedup?

The article examines Modular's new Mojo language, its claim of massive performance gains over Python for AI workloads, presents benchmark code and results, discusses its origins, investment interest, and current beta status, concluding that while impressive, the 36,000× claim is overstated.

AI programming · Matrix multiplication · Mojo

8 min read

Alibaba Cloud Developer

Mar 7, 2023 · Backend Development

Why Loop Order Matters: Boost Java Matrix Multiplication Speed by 100×

This article demonstrates how reorganizing Java matrix‑multiplication loops and understanding Java 2‑D array storage and CPU cache hierarchies can turn a naïve implementation into a version that runs up to a hundred times faster, backed by JMH benchmark results.
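A common form of this loop reordering switches from i-j-k to i-k-j order, so the innermost loop walks one row of `B` and one row of `C` instead of striding down a column of `B`. The Python sketch below (function names are illustrative) only shows the two loop structures and that they compute the same result. The cache benefit the article measures with JMH appears in Java, where each row of an `int[][]` is its own contiguous array; in CPython, interpreter overhead dominates the timing.

```python
def matmul_ijk(A, B):
    """Naive order: the inner loop walks B down a column (k varies in B[k][j])."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0
            for k in range(m):
                s += A[i][k] * B[k][j]  # jumps to a different row of B every step
            C[i][j] = s
    return C


def matmul_ikj(A, B):
    """Reordered: the inner loop walks one row of B and one row of C sequentially."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        Ci = C[i]
        for k in range(m):
            a = A[i][k]  # hoisted: constant across the inner loop
            Bk = B[k]
            for j in range(p):
                Ci[j] += a * Bk[j]  # sequential access along a row
    return C
```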

CPU cache · Matrix multiplication · benchmark

13 min read

DaTaobao Tech

Nov 18, 2022 · Artificial Intelligence

ARMv8.6 Instruction Set Optimization for MNN: Accelerating INT8 and BF16 Matrix Multiplication

The article explains how the ARMv8.6 SMMLA and BFMMLA matrix‑multiply instructions are integrated into MNN to accelerate INT8 and BF16 matrix multiplication. Through optimized kernels, tiling, and compatibility handling, the integration delivers up to 90% speedup over the ARMv8.2 kernels built on SDOT and FP16 FMLA.
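To make the SMMLA semantics concrete, here is a scalar Python reference model. It assumes the behaviour described in Arm's instruction reference: the first source holds a 2x8 int8 matrix, the second holds an 8x2 int8 matrix stored as two 8-element columns, and the 2x2 int32 product is added to the destination. `smmla_reference` is a hypothetical name; it models the arithmetic only, not register layout or 32-bit wrap-around.

```python
def smmla_reference(acc, a, b):
    """Scalar model of SMMLA arithmetic (assumed semantics, see lead-in).

    acc: 2x2 list of int32 accumulators (destination)
    a:   2x8 list of int8 values (first source, row-major)
    b:   2x8 list of int8 values (second source; each row is one column
         of the 8x2 right-hand matrix, i.e. the matrix stored transposed)
    Returns acc + a @ b^T as a new 2x2 list.
    """
    for row in a + b:
        assert len(row) == 8 and all(-128 <= v <= 127 for v in row)
    return [
        [acc[i][j] + sum(a[i][k] * b[j][k] for k in range(8)) for j in range(2)]
        for i in range(2)
    ]
```

Counting multiply-accumulates per 128-bit instruction, SMMLA does 2 x 2 x 8 = 32 while SDOT does 4 lanes x 4 = 16, which is one reason the newer instruction can raise GEMM throughput; the speedups quoted above come from the article's own benchmarks.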

ARMv86 · MNN · Matrix multiplication

15 min read

Sohu Tech Products

Oct 12, 2022 · Artificial Intelligence

AlphaTensor: DeepMind’s AI System for Discovering Faster Matrix Multiplication Algorithms

DeepMind’s AlphaTensor, built on AlphaZero‑style reinforcement learning, automatically discovers novel, provably correct matrix multiplication algorithms that, for some matrix sizes, need fewer multiplications than classic methods such as Strassen’s, showing how AI can automate algorithm discovery and speed up computations across many fields.
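For context on the baseline: Strassen's algorithm multiplies two 2x2 (block) matrices with 7 multiplications instead of 8, and applied recursively it beats the cubic schoolbook method. AlphaTensor searches for decompositions of this kind; for example, it reported a 47-multiplication scheme for 4x4 matrices in mod-2 arithmetic, versus 49 from applying Strassen twice. Below is the textbook Strassen step on scalars; `strassen_2x2` is an illustrative name, and in the recursive version each entry would be a sub-matrix.

```python
def strassen_2x2(A, B):
    """Multiply 2x2 matrices with Strassen's 7 products instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine using only additions and subtractions.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```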

AI · AlphaTensor · DeepMind

8 min read

21CTO

Nov 29, 2020 · Artificial Intelligence

Decode Math Symbols with Python: From Summation to Matrix Multiplication

Learn how to translate common mathematical symbols such as summation, product, factorial, conditional expressions, and matrix multiplication into clear Python code, revealing the underlying computations and helping data scientists and ML practitioners deepen their mathematical intuition through practical examples.
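A few of the symbol-to-code translations listed above, written as a small sketch. Variable names are illustrative, the piecewise function is just one example of a conditional expression, and `math.prod` requires Python 3.8+.

```python
import math

x = [1, 2, 3, 4]

# Summation: sum of x_i for i = 1..n
total = sum(x)            # 10

# Product: product of x_i for i = 1..n (Python 3.8+)
prod = math.prod(x)       # 24

# Factorial: n!
fact = math.factorial(5)  # 120


# Conditional (piecewise) expression: f(v) = v if v > 0, else 0
def f(v):
    return v if v > 0 else 0


# Matrix product: C_ij = sum over k of A_ik * B_kj
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]
```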

Code examples · Mathematics · Matrix multiplication

7 min read

Tech Musings

May 18, 2020 · Fundamentals

Master Multi‑Round Interview Coding: Parsing, Concurrency, and Matrix Multiplication

This article walks through three typical interview coding rounds—implementing a robust string‑to‑int parser, designing a high‑concurrency trace‑storage API, and building a matrix multiplication routine with performance and sparse‑matrix optimizations—providing code examples, corner‑case handling, and improvement ideas.
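One common way to exploit sparsity in the matrix-multiplication round is to store only non-zero entries and skip every product that has a zero factor. The sketch below uses a dict-of-dicts layout; `to_sparse` and `sparse_matmul` are hypothetical helpers, and the article's own solution may differ.

```python
def to_sparse(M):
    """Keep only non-zero entries as {row: {col: value}}."""
    return {i: {j: v for j, v in enumerate(row) if v != 0}
            for i, row in enumerate(M) if any(v != 0 for v in row)}


def sparse_matmul(A, B):
    """Multiply sparse A (m x n) by sparse B (n x p); the result is sparse too.

    Only pairs (A[i][k], B[k][j]) with both factors non-zero are visited,
    so the work scales with the non-zeros rather than with m * n * p.
    """
    C = {}
    for i, a_row in A.items():
        c_row = {}
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                c_row[j] = c_row.get(j, 0) + a_ik * b_kj
        # Drop zeros produced by cancellation.
        c_row = {j: v for j, v in c_row.items() if v != 0}
        if c_row:
            C[i] = c_row
    return C
```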

Matrix multiplication · String Parsing · algorithm optimization

7 min read

ITPUB

Mar 25, 2017 · Fundamentals

Why 5×3 ≠ 5+5+5: Understanding Equality vs Equivalence in Math and Code

The article explores why mathematically equal expressions like 5×3 and 5+5+5 are not necessarily equivalent, explains the distinction between equality and equivalence, illustrates with real‑world and programming examples such as JavaScript’s == versus === and matrix multiplication rules, and discusses teaching implications.
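The matrix case makes the point about order sharper than scalars do: 5×3 and 3×5 are equal numbers, but for matrices A·B and B·A generally differ, and with non-square shapes the two orders can even produce results of different sizes. A minimal sketch (the `matmul` helper is illustrative, and the article's own matrix example may differ):

```python
def matmul(A, B):
    """Plain matrix product; raises if the inner dimensions differ."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2], [0, 1]]
B = [[1, 0], [3, 1]]
print(matmul(A, B))  # [[7, 2], [3, 1]]
print(matmul(B, A))  # [[1, 2], [3, 7]]

P = [[1, 2, 3]]      # 1x3
Q = [[1], [1], [1]]  # 3x1
print(matmul(P, Q))  # [[6]]  (1x1)
print(matmul(Q, P))  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]  (3x3)
```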

JavaScript · Mathematics · Matrix multiplication

6 min read

Liulishuo Tech Team

Sep 3, 2016 · Artificial Intelligence

Optimizing Deep Neural Network Inference for Offline Speech Evaluation on Mobile Devices

This article describes how the Liulishuo English‑learning app uses deep neural network (DNN) models for real‑time speech scoring on smartphones, detailing offline inference challenges, BLAS‑based matrix‑vector optimizations, sparsity exploitation, cache‑friendly implementations, fixed‑point and NEON acceleration, as well as model compression techniques to improve accuracy and latency.
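Per-frame DNN inference is dominated by matrix-vector products (W·x), and once small weights are pruned to zero, a compressed format lets the kernel skip them entirely. Below is a minimal sketch using the standard CSR (compressed sparse row) layout; the helper names are hypothetical, and the article's actual storage format and BLAS calls may differ.

```python
def dense_to_csr(W):
    """Compressed Sparse Row: non-zero values, their column indices, row offsets."""
    values, cols, row_ptr = [], [], [0]
    for row in W:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i+1]]
    return values, cols, row_ptr


def csr_matvec(values, cols, row_ptr, x):
    """y = W x, touching only the stored non-zero weights."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for idx in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[idx] * x[cols[idx]]
        y.append(acc)
    return y
```

For example, `W = [[0, 2, 0], [1, 0, -1]]` is stored as `values=[2, 1, -1]`, `cols=[1, 0, 2]`, `row_ptr=[0, 1, 3]`, and multiplying by `x = [3, 4, 5]` gives `[8.0, -2.0]`.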

BLAS · DNN optimization · Matrix multiplication

11 min read