Tagged articles

42 articles

Mar 31, 2026 · Fundamentals

What’s New in C++26? A Deep Dive into Safety, Performance, and Usability

The C++26 standard, now frozen and slated for release in 2026, introduces static reflection, contracts, Safe C++ configurations, flat containers, SIMD and async models, plus an #embed directive, dramatically enhancing safety, performance, and developer ergonomics for modern system programming.

C++26 · Contracts · SIMD

0 likes · 4 min read

Tech Musings

Jan 16, 2026 · Backend Development

Unlock Go’s New SIMD API: Boost Performance with GOEXPERIMENT=simd

This article explains the motivation behind adding SIMD support to Go, describes the two‑level design of the experimental simd/archsimd package, provides step‑by‑step configuration and code examples for common data‑processing tasks, and presents benchmark results that show up to nearly nine‑fold speedups without extra memory allocations.

GOEXPERIMENT · Go · SIMD

0 likes · 17 min read

Alibaba Cloud Developer

Jan 4, 2026 · Databases

Accelerating AliSQL Vector Search with Nodes Cache and SIMD

AliSQL 8.0 introduces a shared Nodes Cache and per‑transaction cache to speed up vector queries, implements RC‑level transaction isolation for read‑only and read‑write operations, and leverages SIMD‑based pre‑computation to dramatically improve high‑dimensional vector distance calculations and concurrency performance.

AliSQL · SIMD · cache optimization

0 likes · 9 min read

Full-Stack Cultivation Path

Sep 21, 2025 · Frontend Development

WebAssembly 3.0: The Emerging Fourth Language for Front‑End Development

The article explains how WebAssembly, now in its 3.0 release, adds a fourth language to front‑end development by offering multi‑language support, near‑native performance, and new features such as 64‑bit memory, garbage collection, and tighter JavaScript integration for compute‑intensive web applications.

Front-end · Garbage Collection · Multi-language

0 likes · 6 min read

DataFunTalk

Sep 9, 2025 · Big Data

How Auron’s Vectorized Engine Doubles Big Data Performance Over Spark

The Auron project, a native vectorized execution engine donated by Kuaishou and now incubated by the Apache Software Foundation, leverages Rust and SIMD to cut resource overhead, achieve over‑two‑fold speedups on TPC‑DS benchmarks, and integrates seamlessly with Spark and other big‑data ecosystems.

Apache Incubator · Auron · Rust

0 likes · 6 min read

Deepin Linux

Jul 24, 2025 · Fundamentals

Boosting Large-Scale std::vector Performance: Memory, Moves, and SIMD

This article examines why std::vector can become a bottleneck when handling millions of elements, analyzes memory consumption, insertion/deletion costs, and cache behavior, and presents practical optimizations such as pre‑allocation, move semantics, SIMD vectorization, and cache‑friendly designs illustrated with real‑world case studies and code examples.

Memory Optimization · SIMD · c++

0 likes · 21 min read

Alibaba Cloud Developer

Jun 16, 2025 · Fundamentals

How to Make Your C Code Run Faster: 8 Proven Optimization Techniques

This article explains why code can run slowly on resource‑constrained devices and presents eight practical techniques—ranging from loop unrolling and memory access reduction to SIMD intrinsics and table look‑ups—to help C programmers write faster, more efficient code.

C Programming · Code Optimization · SIMD

0 likes · 18 min read

Java Architecture Diary

Mar 21, 2025 · Backend Development

Boost Java Performance with the New Vector API: SIMD Made Simple

This article introduces Java’s emerging Vector API, explains its SIMD‑based design, provides practical code examples for array addition, dot product, and complex calculations, and details performance benchmarks, integration with vector databases, usage considerations, and future development prospects.

SIMD · backend-development · performance optimization

0 likes · 10 min read

Xiao Lou's Tech Notes

Feb 17, 2025 · Backend Development

Swiss Tables in Go 1.24: Open Addressing, SIMD, and Metadata Secrets

The article explains how Go 1.24’s new Swiss Tables hash‑map implementation replaces the traditional bucket‑based design with open addressing, SIMD‑accelerated probing, and metadata separation, detailing the underlying principles, performance advantages, handling of clustering and deletions, and a comparison with previous Go maps and Java’s HashMap.

Go · SIMD · hash map

0 likes · 16 min read

21CTO

Feb 9, 2025 · Backend Development

How TikTok’s Sonic Library Supercharges Go JSON Performance

This article explains how TikTok engineers built Sonic, a high‑performance Go JSON library that leverages JIT compilation, SIMD instructions, smart memory handling, and optional features to dramatically reduce latency and memory usage compared with the standard encoding/json package, offering real‑world cost and speed benefits.

Go · JIT · JSON

0 likes · 9 min read

BirdNest Tech Talk

Feb 1, 2025 · Fundamentals

Can Go Harness SIMD for High‑Performance Computing? A Deep Dive

This article examines SIMD (Single Instruction Multiple Data) technology, its relevance to Go’s performance goals, the challenges of integrating SIMD into Go’s design, current standard‑library limitations, third‑party libraries, compiler support, and practical assembly examples, concluding with prospects for future Go SIMD adoption.

Assembly · Go · SIMD

0 likes · 15 min read

AntData

Dec 11, 2024 · Big Data

Flex: A Stream‑Batch Integrated Vectorized Engine for Flink

This article introduces Flex, a Flink‑compatible stream‑batch vectorized engine built on Velox and Gluten, explains the SIMD‑based execution model, details native operator optimizations, fallback mechanisms, correctness and usability improvements, and presents performance results and future development plans.

Flink · SIMD · Velox

0 likes · 17 min read

Java Tech Enthusiast

Jun 7, 2024 · Fundamentals

Engineer Builds GPU from Scratch in Two Weeks

In just two weeks, engineer Adam Majmudar designed and implemented a minimalist GPU called tiny‑gpu, complete with a custom 11‑instruction ISA and Verilog RTL verified via OpenLane. He shared the open‑source project on GitHub, where it earned thousands of stars, and prepared it for fabrication through Tiny Tapeout 7, showcasing how modern tools make DIY chip design increasingly accessible.

Chip Design · EDA · GPU

0 likes · 8 min read

21CTO

Apr 15, 2024 · Artificial Intelligence

Why Mojo’s Open‑Source Release Could Redefine AI Programming

Modular Inc. announced the open‑source release of Mojo’s core standard library, highlighting its Python‑like syntax, MLIR‑based compiler, SIMD‑first design, eager destruction, and performance claims of being tens of thousands of times faster than Python, positioning it as a potential future‑dominant AI language.

AI programming · MLIR · Mojo

0 likes · 14 min read

NewBeeNLP

Apr 11, 2024 · Artificial Intelligence

How Karpathy Built a 1,000‑Line C LLM Trainer Without Any Deep‑Learning Framework

Andrej Karpathy released llm.c, a pure C/CUDA implementation that trains GPT‑2‑style models in about 1,000 lines of code. The article details its manual forward/backward passes, memory allocation tricks, SIMD CPU acceleration, CUDA porting, and migration tutorials, compares it to PyTorch, and discusses broader LLM OS implications.

C Programming · CUDA · GPT

0 likes · 6 min read

ByteDance Cloud Native

Mar 27, 2024 · Cloud Native

How ByteDance Optimized Its Metrics Agent for 70% CPU Savings

This article details how ByteDance's cloud‑native observability team tackled performance bottlenecks in their metricserver2 Agent—reducing memory copies, merging tiny packets, applying SIMD for tag parsing, and switching compression libraries—to cut CPU usage by over 10% and memory usage by nearly 20% while handling petabyte‑scale metric data.

Msgpack · SIMD · c++

0 likes · 15 min read

MaGe Linux Operations

Jul 31, 2023 · Backend Development

Why ByteDance’s Sonic JSON Library Beats the Rest: JIT, SIMD, and Lazy‑Load Explained

The article introduces Sonic, ByteDance’s high‑performance Go JSON library built with Just‑In‑Time compilation and SIMD vectorization, explains its design motivations, usage patterns, API features, compatibility considerations, and showcases benchmark results that demonstrate its superiority over other popular JSON parsers.

JIT · JSON · Library

0 likes · 33 min read

dbaplus Community

May 30, 2023 · Databases

What PostgreSQL 16’s New Parallel Query and SIMD Features Really Mean for Performance

PostgreSQL 16 introduces enhanced parallel query capabilities, incremental sorting, SIMD acceleration, improved logical replication, and new monitoring tools like pg_stat_io, offering notable performance gains while still leaving some long‑awaited features such as XID‑64 absent.

Logical Replication · Parallel Query · SIMD

0 likes · 9 min read

ByteDance SYS Tech

Dec 2, 2022 · Backend Development

How Sonic‑CPP Boosts JSON Parsing Speed 2.5× Faster Than RapidJSON

Sonic‑CPP, an open‑source C++ JSON library co‑developed by ByteDance’s STE and Service Framework teams, leverages SIMD vectorization, optimized memory layout, on‑demand parsing, and a compact DOM design to achieve up to 2.5× faster parsing than RapidJSON and competitive serialization performance, with extensive benchmark results and production‑grade usage.

C++ · JSON · SIMD

0 likes · 13 min read

DataFunSummit

Oct 27, 2022 · Databases

Vectorized Storage Layer Refactoring in Apache Doris: Design, Implementation, and Performance Evaluation

This article explains the motivation, design, and implementation of vectorizing Apache Doris's storage layer using SIMD techniques, covering engine overview, vectorized programming concepts, storage architecture, index and predicate optimizations, delayed materialization, output improvements, and performance test results.

Apache Doris · OLAP · SIMD

0 likes · 13 min read

StarRocks

Aug 17, 2022 · Databases

Why Vectorization Supercharges Database Performance: Deep Dive into StarRocks

This article explains how CPU‑centric vectorization, especially SIMD, reduces instruction count and CPI, addresses the four major CPU bottlenecks, and how StarRocks systematically applies automatic and manual SIMD techniques, verification methods, and a suite of engineering optimizations to achieve multi‑fold query speedups.

CPU optimization · SIMD · StarRocks

0 likes · 16 min read

Bilibili Tech

Apr 22, 2022 · Frontend Development

Bilibili's WasmPlayer: WebAssembly-based HEVC Soft Decoding Solution for Web Video Playback

Bilibili’s WasmPlayer is a WebAssembly‑based HEVC software decoder that runs entirely in C/C++, handling demuxing, decoding, resampling and A/V sync on worker threads, using SIMD, AudioWorklet and OffscreenCanvas to deliver smooth 4K 60 fps web video playback with reduced bandwidth and CPU load.

AudioWorklet · HEVC · Multimedia

0 likes · 13 min read

DataFunTalk

Apr 18, 2022 · Databases

Subgraph Matching in Graph Databases: Concepts, Algorithms, and Optimizations

This article introduces graph databases, explains the subgraph‑matching problem, compares it with relational databases, discusses its computational complexity, and surveys backtracking and multi‑way join algorithms, worst‑case optimal joins, set‑intersection SIMD acceleration, and the gStore system’s research contributions.

RDF · SIMD · SPARQL

0 likes · 19 min read

IT Services Circle

Apr 4, 2022 · Fundamentals

From Simple Loops to SIMD: The Evolution of Parallel Computation in CPU Design

The article narrates a CPU's journey from a naïve element‑wise increment loop to the adoption of SIMD, MMX, SSE, and AVX instruction sets, illustrating the motivations, challenges, and architectural decisions behind parallelizing integer and floating‑point operations.

CPU architecture · Instruction Set · MMX

0 likes · 8 min read

DataFunSummit

Mar 21, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, and Future Roadmap

This article explains how Apache Doris adopts CPU‑level vectorization and columnar storage to boost query performance, details the design and current status of its vectorized engine, and outlines future work such as JOIN acceleration, storage‑layer vectorization, import optimization, and extensive SQL function support.

Apache Doris · Columnar Storage · SIMD

0 likes · 21 min read

IEG Growth Platform Technology Team

Mar 15, 2022 · Backend Development

Optimizing Vector Retrieval in Go: SIMD and Plan9 Assembly for High‑Performance Vector Search

This article presents a backend‑focused study on reducing latency of vector‑based ad recommendation retrieval by leveraging Gonum, SIMD AVX2 intrinsics, and direct Plan9 assembly integration in Go, and it validates the approach with detailed performance benchmarks and CPU usage analysis.

Assembly · Backend · SIMD

0 likes · 17 min read

Kuaishou Tech

Mar 3, 2022 · Artificial Intelligence

Optimization Techniques for Image Cropping in Kuaishou YKit AI SDK

This article details the engineering optimizations applied to the image cropping stage of Kuaishou's YKit AI SDK, covering instruction-level fixes, SIMD acceleration, I/O cache improvements, algorithmic refinements, parallel processing, and device‑tier strategies to achieve up to 4.6× speedup on mobile devices.

AI SDK · Image Processing · Mobile AI

0 likes · 12 min read

DataFunTalk

Feb 27, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, Current Status, and Future Plans

This article explains how Apache Doris adopts CPU vectorization techniques—such as SIMD, columnar storage, and cache‑friendly designs—to boost query performance, detailing its current vectorized engine architecture, recent benchmarks, ongoing work on JOIN, storage, import, and future enhancements.

Apache Doris · Columnar Storage · Database Performance

0 likes · 22 min read

Programmer DD

Sep 8, 2021 · Fundamentals

Parse 16‑Digit Timestamps Up to 700× Faster Than std::stringstream

This article explores why standard string‑to‑integer conversions become performance bottlenecks in high‑concurrency scenarios and presents a series of increasingly optimized C++ solutions—from native library calls to loop‑unrolled, byteswap, divide‑and‑conquer, and SIMD tricks—demonstrating dramatic speed gains backed by Google Benchmark results.

C++ · SIMD · String Parsing

0 likes · 16 min read

Baidu Intelligent Testing

Aug 24, 2021 · Fundamentals

B16 Hash Table: Exploiting Controlled Collisions and SIMD for High‑Performance, Compact Hashing

This article introduces the B16 hash table, a novel design that deliberately tolerates a higher hash‑collision probability and leverages SIMD instructions to achieve faster lookups, lower memory overhead, and a compact read‑only variant comparable to minimal perfect hash functions.

B16 · Memory Optimization · SIMD

0 likes · 15 min read

Laravel Tech Community

Aug 3, 2021 · Fundamentals

Rust 1.54.0 Release Highlights: Attribute Macros and wasm32 Intrinsics Stabilization

Rust 1.54.0 introduces support for invoking function‑like macros within attributes, enabling inclusion of external documentation via include_str! and nested macro usage, while also stabilizing many wasm32 intrinsics—including safe SIMD functions like v128_bitselect—expanding WebAssembly capabilities.

Macros · Rust · SIMD

0 likes · 3 min read

Baidu Intelligent Testing

Jul 27, 2021 · Backend Development

Comprehensive Guide to Concurrency Optimization in Modern CPUs and Multithreaded Programming

This article systematically explores concurrency optimization for high‑performance C++ engineering, covering CPU trends, SIMD and out‑of‑order execution, single‑thread parallelism, lock‑free and wait‑free synchronization, and practical case studies of counters and queues to improve multithreaded scalability.

CPU architecture · SIMD · multithreading

0 likes · 35 min read

Baidu Geek Talk

Jun 16, 2021 · Fundamentals

Concurrent Optimization Techniques in C++: SIMD, Out‑of‑Order Execution, and Lock‑Free/Wait‑Free Algorithms

The article reviews Baidu C++ engineers' concurrency optimizations, explaining why modern software must exploit parallelism and detailing SIMD vectorization, out‑of‑order execution, and micro‑architectural analysis, then compares mutex, lock‑free, and wait‑free synchronization, showcasing case studies where atomic and wait‑free designs dramatically improve multithreaded performance.

SIMD · concurrency · lock‑free

0 likes · 35 min read

Tencent Tech

Mar 19, 2021 · Backend Development

How JDK 16’s Vector API Supercharges Java Performance for Data‑Intensive Workloads

JDK 16, released on March 16, introduces the Vector API—a set of Java interfaces that translate scalar operations into SIMD instructions, dramatically reducing operation counts and delivering 2‑5× speedups for matrix multiplication and up to 14‑16× for vector dot products, with notable contributions from Tencent’s Kona JDK team.

JDK16 · SIMD · Tencent Kona

0 likes · 6 min read

Baidu Geek Talk

Feb 22, 2021 · Fundamentals

Can Embracing Hash Collisions Boost Performance? Inside B16 Hash Table

This article revisits traditional hash table design, then introduces a novel approach that deliberately leverages a controlled probability of hash collisions combined with SIMD parallelism, presenting the B16 and B16Compact hash tables, their structures, algorithms, and experimental results showing superior speed and space efficiency compared to unordered_map and F14.

B16 · Data Structures · SIMD

0 likes · 17 min read

OPPO Kernel Craftsman

Sep 30, 2020 · Mobile Development

SIMD Acceleration Techniques on Qualcomm Hexagon DSP for Mobile Devices

The article explains how SIMD acceleration on Qualcomm’s Hexagon DSP, using its HVX vector engine and specialized instructions, can off‑load compute‑intensive tasks such as image, video, and AI processing from the CPU, delivering up to 8× speed‑up, lower power consumption, reduced thermal throttling, and longer battery life on mobile devices.

DSP · Hexagon · Mobile

0 likes · 9 min read

Architects' Tech Alliance

Jul 5, 2020 · Fundamentals

Migrating Applications from x86 to Kunpeng (ARM): Overview, Methodology, and C/C++ Compilation Details

With the rise of mobile, IoT, and edge computing, the long‑standing dominance of x86 is challenged, prompting developers to migrate applications to ARM‑based Kunpeng platforms; this article explains the architectural differences, a five‑step migration methodology, and detailed C/C++ compilation considerations including instruction, macro, and SIMD adaptations.

ARM · Compilation · Kunpeng

0 likes · 14 min read

Beike Product & Technology

Jun 18, 2020 · Backend Development

Optimizing PHP strtolower with SSE2 in PHP 8

This article explains how PHP's case‑insensitive function handling can be accelerated by implementing a locale‑independent strtolower using SSE2 SIMD instructions in PHP 8, compares it with previous table‑lookup methods, and discusses a further Yaf‑specific optimization.

Backend · PHP · SIMD

0 likes · 6 min read

Beike Product & Technology

Mar 10, 2020 · Fundamentals

Optimizing String Replacement Using SSE2 SIMD Instructions

This article explains how to use SSE2 SIMD instructions to optimize string replacement operations, demonstrating a 16-character batch processing technique that significantly improves performance for longer strings.

Assembly · Batch Processing · SIMD

0 likes · 4 min read

Architects' Tech Alliance

Jan 27, 2019 · Backend Development

Understanding Network I/O Challenges and DPDK High‑Performance Solutions

The article analyzes the evolving demands of network I/O, the limitations of traditional kernel‑based networking, and presents DPDK’s user‑space bypass architecture, UIO mechanism, and a series of low‑level optimizations—including HugePages, poll‑mode drivers, SIMD, and cache‑aware coding—to achieve multi‑gigabit packet processing performance on modern Linux servers.

DPDK · Linux · SIMD

0 likes · 14 min read

dbaplus Community

Dec 16, 2015 · Databases

How DB2 BLU Accelerator Supercharges OLAP with Columnar Storage and SIMD

This article explains IBM DB2 BLU Accelerator’s columnar storage, multi‑level compression, TSN‑based logical rows, SIMD processing, intra‑parallel execution, probability‑based caching, and automatic admin features, showing how these technologies together deliver dramatic I/O and performance gains for analytical workloads.

BLU Accelerator · Columnar Storage · DB2

0 likes · 15 min read

Baidu Tech Salon

Jun 17, 2014 · Frontend Development

How SIMD Can Supercharge JavaScript Performance Across Browsers

JavaScript’s role in web performance is critical, and Intel’s new SIMD APIs—now being integrated into Chrome and Firefox—enable cross‑platform, plugin‑free acceleration that can boost script execution by 3‑10× on both x86 and ARM CPUs, as demonstrated on multiple hardware platforms.

Browser Performance · Chrome · Firefox

0 likes · 4 min read
