Tagged articles
42 articles

21CTO
Mar 31, 2026 · Fundamentals

What’s New in C++26? A Deep Dive into Safety, Performance, and Usability

The C++26 standard, now frozen and slated for release in 2026, introduces static reflection, contracts, Safe C++ configurations, flat containers, SIMD and async models, plus an #embed directive, dramatically enhancing safety, performance, and developer ergonomics for modern system programming.

C++26 · Contracts · SIMD
0 likes · 4 min read

Tech Musings
Jan 16, 2026 · Backend Development

Unlock Go’s New SIMD API: Boost Performance with GOEXPERIMENT=simd

This article explains the motivation behind adding SIMD support to Go, describes the two‑level design of the experimental simd/archsimd package, provides step‑by‑step configuration and code examples for common data‑processing tasks, and presents benchmark results that show up to nearly nine‑fold speedups without extra memory allocations.

GOEXPERIMENT · Go · SIMD
0 likes · 17 min read

Alibaba Cloud Developer
Jan 4, 2026 · Databases

Accelerating AliSQL Vector Search with Nodes Cache and SIMD

AliSQL 8.0 introduces a shared Nodes Cache and a per-transaction cache to speed up vector queries, implements Read Committed (RC) transaction isolation for read-only and read-write operations, and leverages SIMD-based pre-computation to dramatically improve high-dimensional vector distance calculations and concurrency performance.

AliSQL · SIMD · cache optimization
0 likes · 9 min read

Full-Stack Cultivation Path
Sep 21, 2025 · Frontend Development

WebAssembly 3.0: The Emerging Fourth Language for Front‑End Development

The article explains how WebAssembly, now in its 3.0 release, adds a fourth language to front‑end development by offering multi‑language support, near‑native performance, and new features such as 64‑bit memory, garbage collection, and tighter JavaScript integration for compute‑intensive web applications.

Front-end · Garbage Collection · Multi-language
0 likes · 6 min read

DataFunTalk
Sep 9, 2025 · Big Data

How Auron’s Vectorized Engine Doubles Big Data Performance Over Spark

The Auron project, a native vectorized execution engine donated by Kuaishou and now incubated by the Apache Software Foundation, leverages Rust and SIMD to cut resource overhead and achieve more than twofold speedups on TPC-DS benchmarks, and it integrates with Spark and other big-data ecosystems.

Apache Incubator · Auron · Rust
0 likes · 6 min read

Deepin Linux
Jul 24, 2025 · Fundamentals

Boosting Large-Scale std::vector Performance: Memory, Moves, and SIMD

This article examines why std::vector can become a bottleneck when handling millions of elements, analyzes memory consumption, insertion/deletion costs, and cache behavior, and presents practical optimizations such as pre‑allocation, move semantics, SIMD vectorization, and cache‑friendly designs illustrated with real‑world case studies and code examples.

Memory Optimization · SIMD · c++
0 likes · 21 min read

Java Architecture Diary
Mar 21, 2025 · Backend Development

Boost Java Performance with the New Vector API: SIMD Made Simple

This article introduces Java’s emerging Vector API, explains its SIMD‑based design, provides practical code examples for array addition, dot product, and complex calculations, and details performance benchmarks, integration with vector databases, usage considerations, and future development prospects.

SIMD · backend-development · performance optimization
0 likes · 10 min read

Xiao Lou's Tech Notes
Feb 17, 2025 · Backend Development

Swiss Tables in Go 1.24: Open Addressing, SIMD, and Metadata Secrets

The article explains how Go 1.24’s new Swiss Tables hash‑map implementation replaces the traditional bucket‑based design with open addressing, SIMD‑accelerated probing, and metadata separation, detailing the underlying principles, performance advantages, handling of clustering and deletions, and a comparison with previous Go maps and Java’s HashMap.

Go · SIMD · hash map
0 likes · 16 min read

21CTO
Feb 9, 2025 · Backend Development

How TikTok’s Sonic Library Supercharges Go JSON Performance

This article explains how TikTok engineers built Sonic, a high‑performance Go JSON library that leverages JIT compilation, SIMD instructions, smart memory handling, and optional features to dramatically reduce latency and memory usage compared with the standard encoding/json package, offering real‑world cost and speed benefits.

Go · JIT · JSON
0 likes · 9 min read

BirdNest Tech Talk
Feb 1, 2025 · Fundamentals

Can Go Harness SIMD for High‑Performance Computing? A Deep Dive

This article examines SIMD (Single Instruction Multiple Data) technology, its relevance to Go’s performance goals, the challenges of integrating SIMD into Go’s design, current standard‑library limitations, third‑party libraries, compiler support, and practical assembly examples, concluding with prospects for future Go SIMD adoption.

Assembly · Go · SIMD
0 likes · 15 min read

AntData
Dec 11, 2024 · Big Data

Flex: A Stream‑Batch Integrated Vectorized Engine for Flink

This article introduces Flex, a Flink‑compatible stream‑batch vectorized engine built on Velox and Gluten, explains the SIMD‑based execution model, details native operator optimizations, fallback mechanisms, correctness and usability improvements, and presents performance results and future development plans.

Flink · SIMD · Velox
0 likes · 17 min read

Java Tech Enthusiast
Jun 7, 2024 · Fundamentals

Engineer Builds GPU from Scratch in Two Weeks

In just two weeks, engineer Adam Majmudar designed and implemented tiny-gpu, a minimalist GPU with a custom 11-instruction ISA and Verilog RTL, verified with OpenLane. He shared the open-source project on GitHub, where it earned thousands of stars, and is preparing it for fabrication through Tiny Tapeout 7, showing how modern tools make DIY chip design increasingly accessible.

Chip Design · EDA · GPU
0 likes · 8 min read

21CTO
Apr 15, 2024 · Artificial Intelligence

Why Mojo’s Open‑Source Release Could Redefine AI Programming

Modular Inc. announced the open-source release of Mojo's core standard library, highlighting its Python-like syntax, MLIR-based compiler, SIMD-first design, eager destruction, and claims of running tens of thousands of times faster than Python, positioning it as a potential leading language for AI development.

AI programming · MLIR · Mojo
0 likes · 14 min read

NewBeeNLP
Apr 11, 2024 · Artificial Intelligence

How Karpathy Built a 1,000‑Line C LLM Trainer Without Any Deep‑Learning Framework

Andrej Karpathy released llm.c, a pure C/CUDA implementation that trains GPT-2-style models in about 1,000 lines of code, detailing manual forward/backward passes, memory allocation tricks, SIMD CPU acceleration, CUDA porting, and migration tutorials, while comparing it to PyTorch and discussing broader LLM OS implications.

C Programming · CUDA · GPT
0 likes · 6 min read

ByteDance Cloud Native
Mar 27, 2024 · Cloud Native

How ByteDance Optimized Its Metrics Agent for 70% CPU Savings

This article details how ByteDance's cloud‑native observability team tackled performance bottlenecks in their metricserver2 Agent—reducing memory copies, merging tiny packets, applying SIMD for tag parsing, and switching compression libraries—to cut CPU usage by over 10% and memory usage by nearly 20% while handling petabyte‑scale metric data.

Msgpack · SIMD · c++
0 likes · 15 min read

MaGe Linux Operations
Jul 31, 2023 · Backend Development

Why ByteDance’s Sonic JSON Library Beats the Rest: JIT, SIMD, and Lazy‑Load Explained

The article introduces Sonic, ByteDance’s high‑performance Go JSON library built with Just‑In‑Time compilation and SIMD vectorization, explains its design motivations, usage patterns, API features, compatibility considerations, and showcases benchmark results that demonstrate its superiority over other popular JSON parsers.

JIT · JSON · Library
0 likes · 33 min read

ByteDance SYS Tech
Dec 2, 2022 · Backend Development

How Sonic‑CPP Boosts JSON Parsing Speed 2.5× Faster Than RapidJSON

Sonic‑CPP, an open‑source C++ JSON library co‑developed by ByteDance’s STE and Service Framework teams, leverages SIMD vectorization, optimized memory layout, on‑demand parsing, and a compact DOM design to achieve up to 2.5× faster parsing than RapidJSON and competitive serialization performance, with extensive benchmark results and production‑grade usage.

C++ · JSON · SIMD
0 likes · 13 min read

DataFunSummit
Oct 27, 2022 · Databases

Vectorized Storage Layer Refactoring in Apache Doris: Design, Implementation, and Performance Evaluation

This article explains the motivation, design, and implementation of vectorizing Apache Doris's storage layer using SIMD techniques, covering engine overview, vectorized programming concepts, storage architecture, index and predicate optimizations, delayed materialization, output improvements, and performance test results.

Apache Doris · OLAP · SIMD
0 likes · 13 min read

StarRocks
Aug 17, 2022 · Databases

Why Vectorization Supercharges Database Performance: Deep Dive into StarRocks

This article explains how CPU-centric vectorization, especially SIMD, reduces instruction count and cycles per instruction (CPI), addresses the four major CPU bottlenecks, and how StarRocks systematically applies automatic and manual SIMD techniques, verification methods, and a suite of engineering optimizations to achieve multi-fold query speedups.

CPU optimization · SIMD · StarRocks
0 likes · 16 min read

DataFunTalk
Apr 18, 2022 · Databases

Subgraph Matching in Graph Databases: Concepts, Algorithms, and Optimizations

This article introduces graph databases, explains the subgraph‑matching problem, compares it with relational databases, discusses its computational complexity, and surveys backtracking and multi‑way join algorithms, worst‑case optimal joins, set‑intersection SIMD acceleration, and the gStore system’s research contributions.

RDF · SIMD · SPARQL
0 likes · 19 min read

DataFunSummit
Mar 21, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, and Future Roadmap

This article explains how Apache Doris adopts CPU‑level vectorization and columnar storage to boost query performance, details the design and current status of its vectorized engine, and outlines future work such as JOIN acceleration, storage‑layer vectorization, import optimization, and extensive SQL function support.

Apache Doris · Columnar Storage · SIMD
0 likes · 21 min read

Kuaishou Tech
Mar 3, 2022 · Artificial Intelligence

Optimization Techniques for Image Cropping in Kuaishou YKit AI SDK

This article details the engineering optimizations applied to the image cropping stage of Kuaishou's YKit AI SDK, covering instruction-level fixes, SIMD acceleration, I/O cache improvements, algorithmic refinements, parallel processing, and device‑tier strategies to achieve up to 4.6× speedup on mobile devices.

AI SDK · Image Processing · Mobile AI
0 likes · 12 min read

DataFunTalk
Feb 27, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, Current Status, and Future Plans

This article explains how Apache Doris adopts CPU vectorization techniques—such as SIMD, columnar storage, and cache‑friendly designs—to boost query performance, detailing its current vectorized engine architecture, recent benchmarks, ongoing work on JOIN, storage, import, and future enhancements.

Apache Doris · Columnar Storage · Database Performance
0 likes · 22 min read

Programmer DD
Sep 8, 2021 · Fundamentals

Parse 16‑Digit Timestamps Up to 700× Faster Than std::stringstream

This article explores why standard string‑to‑integer conversions become performance bottlenecks in high‑concurrency scenarios and presents a series of increasingly optimized C++ solutions—from native library calls to loop‑unrolled, byteswap, divide‑and‑conquer, and SIMD tricks—demonstrating dramatic speed gains backed by Google Benchmark results.

C++ · SIMD · String Parsing
0 likes · 16 min read

Baidu Intelligent Testing
Jul 27, 2021 · Backend Development

Comprehensive Guide to Concurrency Optimization in Modern CPUs and Multithreaded Programming

This article systematically explores concurrency optimization for high‑performance C++ engineering, covering CPU trends, SIMD and out‑of‑order execution, single‑thread parallelism, lock‑free and wait‑free synchronization, and practical case studies of counters and queues to improve multithreaded scalability.

CPU architecture · SIMD · multithreading
0 likes · 35 min read

Baidu Geek Talk
Jun 16, 2021 · Fundamentals

Concurrent Optimization Techniques in C++: SIMD, Out‑of‑Order Execution, and Lock‑Free/Wait‑Free Algorithms

The article reviews Baidu C++ engineers' concurrency optimizations, explaining why modern software must exploit parallelism and detailing SIMD vectorization, out‑of‑order execution, and micro‑architectural analysis, then compares mutex, lock‑free, and wait‑free synchronization, showcasing case studies where atomic and wait‑free designs dramatically improve multithreaded performance.

SIMD · concurrency · lock‑free
0 likes · 35 min read

Tencent Tech
Mar 19, 2021 · Backend Development

How JDK 16’s Vector API Supercharges Java Performance for Data‑Intensive Workloads

JDK 16, released on March 16, introduces the Vector API—a set of Java interfaces that translate scalar operations into SIMD instructions, dramatically reducing operation counts and delivering 2‑5× speedups for matrix multiplication and up to 14‑16× for vector dot products, with notable contributions from Tencent’s Kona JDK team.

JDK16 · SIMD · Tencent Kona
0 likes · 6 min read

Baidu Geek Talk
Feb 22, 2021 · Fundamentals

Can Embracing Hash Collisions Boost Performance? Inside B16 Hash Table

This article revisits traditional hash table design, then introduces a novel approach that deliberately leverages a controlled probability of hash collisions combined with SIMD parallelism, presenting the B16 and B16Compact hash tables, their structures, algorithms, and experimental results showing superior speed and space efficiency compared to unordered_map and F14.

B16 · Data Structures · SIMD
0 likes · 17 min read

OPPO Kernel Craftsman
Sep 30, 2020 · Mobile Development

SIMD Acceleration Techniques on Qualcomm Hexagon DSP for Mobile Devices

The article explains how SIMD acceleration on Qualcomm’s Hexagon DSP, using its HVX vector engine and specialized instructions, can off‑load compute‑intensive tasks such as image, video, and AI processing from the CPU, delivering up to 8× speed‑up, lower power consumption, reduced thermal throttling, and longer battery life on mobile devices.

DSP · Hexagon · Mobile
0 likes · 9 min read

Architects' Tech Alliance
Jul 5, 2020 · Fundamentals

Migrating Applications from x86 to Kunpeng (ARM): Overview, Methodology, and C/C++ Compilation Details

With the rise of mobile, IoT, and edge computing, the long‑standing dominance of x86 is challenged, prompting developers to migrate applications to ARM‑based Kunpeng platforms; this article explains the architectural differences, a five‑step migration methodology, and detailed C/C++ compilation considerations including instruction, macro, and SIMD adaptations.

ARM · Compilation · Kunpeng
0 likes · 14 min read

Beike Product & Technology
Jun 18, 2020 · Backend Development

Optimizing PHP strtolower with SSE2 in PHP 8

This article explains how PHP's case‑insensitive function handling can be accelerated by implementing a locale‑independent strtolower using SSE2 SIMD instructions in PHP 8, compares it with previous table‑lookup methods, and discusses a further Yaf‑specific optimization.

Backend · PHP · SIMD
0 likes · 6 min read

Architects' Tech Alliance
Jan 27, 2019 · Backend Development

Understanding Network I/O Challenges and DPDK High‑Performance Solutions

The article analyzes the evolving demands of network I/O, the limitations of traditional kernel‑based networking, and presents DPDK’s user‑space bypass architecture, UIO mechanism, and a series of low‑level optimizations—including HugePages, poll‑mode drivers, SIMD, and cache‑aware coding—to achieve multi‑gigabit packet processing performance on modern Linux servers.

DPDK · Linux · SIMD
0 likes · 14 min read

dbaplus Community
Dec 16, 2015 · Databases

How DB2 BLU Accelerator Supercharges OLAP with Columnar Storage and SIMD

This article explains IBM DB2 BLU Accelerator’s columnar storage, multi‑level compression, TSN‑based logical rows, SIMD processing, intra‑parallel execution, probability‑based caching, and automatic admin features, showing how these technologies together deliver dramatic I/O and performance gains for analytical workloads.

BLU Accelerator · Columnar Storage · DB2
0 likes · 15 min read

Baidu Tech Salon
Jun 17, 2014 · Frontend Development

How SIMD Can Supercharge JavaScript Performance Across Browsers

JavaScript’s role in web performance is critical, and Intel’s new SIMD APIs—now being integrated into Chrome and Firefox—enable cross‑platform, plugin‑free acceleration that can boost script execution by 3‑10× on both x86 and ARM CPUs, as demonstrated on multiple hardware platforms.

Browser Performance · Chrome · Firefox
0 likes · 4 min read