Tag

SIMD

Java Architecture Diary
Mar 21, 2025 · Backend Development

Boost Java Performance with the New Vector API: SIMD Made Simple

This article introduces Java’s emerging Vector API, explains its SIMD‑based design, provides practical code examples for array addition, dot product, and complex calculations, and details performance benchmarks, integration with vector databases, usage considerations, and future development prospects.

Backend Development · Java · Performance Optimization
0 likes · 10 min read
AntData
Dec 11, 2024 · Big Data

Flex: A Stream‑Batch Integrated Vectorized Engine for Flink

This article introduces Flex, a Flink‑compatible stream‑batch vectorized engine built on Velox and Gluten, explains the SIMD‑based execution model, details native operator optimizations, fallback mechanisms, correctness and usability improvements, and presents performance results and future development plans.

Flink · Performance · SIMD
0 likes · 17 min read
Java Tech Enthusiast
Jun 7, 2024 · Fundamentals

Engineer Builds GPU from Scratch in Two Weeks

In just two weeks, engineer Adam Majmudar designed and implemented tiny‑gpu, a minimalist GPU with a custom 11‑instruction ISA and Verilog RTL, verified via OpenLane. He shared the open‑source project on GitHub, where it earned thousands of stars, and prepared it for fabrication through Tiny Tapeout 7, showing how modern tools make DIY chip design increasingly accessible.

EDA · GPU · Open Source Hardware
0 likes · 8 min read
ByteDance Cloud Native
Mar 27, 2024 · Cloud Native

How ByteDance Optimized Its Metrics Agent for 70% CPU Savings

This article details how ByteDance's cloud‑native observability team tackled performance bottlenecks in its metricserver2 Agent by reducing memory copies, merging tiny packets, applying SIMD to tag parsing, and switching compression libraries, cutting CPU usage by over 10% and memory usage by nearly 20% while handling petabyte‑scale metric data.

C++ · Observability · Performance Optimization
0 likes · 15 min read
ByteDance SYS Tech
Dec 2, 2022 · Backend Development

How Sonic‑CPP Boosts JSON Parsing Speed 2.5× Faster Than RapidJSON

Sonic‑CPP, an open‑source C++ JSON library co‑developed by ByteDance’s STE and Service Framework teams, leverages SIMD vectorization, optimized memory layout, on‑demand parsing, and a compact DOM design to achieve up to 2.5× faster parsing than RapidJSON and competitive serialization performance, with extensive benchmark results and production‑grade usage.

C++ · JSON · Parsing
0 likes · 13 min read
DataFunSummit
Oct 27, 2022 · Databases

Vectorized Storage Layer Refactoring in Apache Doris: Design, Implementation, and Performance Evaluation

This article explains the motivation, design, and implementation of vectorizing Apache Doris's storage layer using SIMD techniques, covering engine overview, vectorized programming concepts, storage architecture, index and predicate optimizations, delayed materialization, output improvements, and performance test results.

Apache Doris · OLAP · Performance Optimization
0 likes · 13 min read
Bilibili Tech
Apr 22, 2022 · Frontend Development

Bilibili's WasmPlayer: WebAssembly-based HEVC Soft Decoding Solution for Web Video Playback

Bilibili’s WasmPlayer is a WebAssembly‑based HEVC software decoder that runs entirely in C/C++, handling demuxing, decoding, resampling and A/V sync on worker threads, using SIMD, AudioWorklet and OffscreenCanvas to deliver smooth 4K 60 fps web video playback with reduced bandwidth and CPU load.

AudioWorklet · HEVC · OffscreenCanvas
0 likes · 13 min read
DataFunTalk
Apr 18, 2022 · Databases

Subgraph Matching in Graph Databases: Concepts, Algorithms, and Optimizations

This article introduces graph databases, explains the subgraph‑matching problem, compares it with relational databases, discusses its computational complexity, and surveys backtracking and multi‑way join algorithms, worst‑case optimal joins, set‑intersection SIMD acceleration, and the gStore system’s research contributions.

Query Optimization · RDF · SIMD
0 likes · 19 min read
IT Services Circle
Apr 4, 2022 · Fundamentals

From Simple Loops to SIMD: The Evolution of Parallel Computation in CPU Design

The article narrates a CPU's journey from a naïve element‑wise increment loop to the adoption of SIMD, MMX, SSE, and AVX instruction sets, illustrating the motivations, challenges, and architectural decisions behind parallelizing integer and floating‑point operations.

CPU architecture · Instruction set · MMX
0 likes · 8 min read
DataFunSummit
Mar 21, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, and Future Roadmap

This article explains how Apache Doris adopts CPU‑level vectorization and columnar storage to boost query performance, details the design and current status of its vectorized engine, and outlines future work such as JOIN acceleration, storage‑layer vectorization, import optimization, and extensive SQL function support.

Apache Doris · Performance Optimization · SIMD
0 likes · 21 min read
IEG Growth Platform Technology Team
Mar 15, 2022 · Backend Development

Optimizing Vector Retrieval in Go: SIMD and Plan9 Assembly for High‑Performance Vector Search

This article presents a backend‑focused study on reducing latency of vector‑based ad recommendation retrieval by leveraging Gonum, SIMD AVX2 intrinsics, and direct Plan9 assembly integration in Go, and it validates the approach with detailed performance benchmarks and CPU usage analysis.

Go · Optimization · Performance
0 likes · 17 min read
Kuaishou Tech
Mar 3, 2022 · Artificial Intelligence

Optimization Techniques for Image Cropping in Kuaishou YKit AI SDK

This article details the engineering optimizations applied to the image cropping stage of Kuaishou's YKit AI SDK, covering instruction-level fixes, SIMD acceleration, I/O cache improvements, algorithmic refinements, parallel processing, and device‑tier strategies to achieve up to 4.6× speedup on mobile devices.

AI SDK · Neon · Performance Optimization
0 likes · 12 min read
DataFunTalk
Feb 27, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, Current Status, and Future Plans

This article explains how Apache Doris adopts CPU vectorization techniques such as SIMD, columnar storage, and cache‑friendly designs to boost query performance, and details its current vectorized engine architecture, recent benchmarks, ongoing work on JOIN, storage, and import, and planned future enhancements.

Apache Doris · Database Performance · SIMD
0 likes · 22 min read
Laravel Tech Community
Aug 3, 2021 · Fundamentals

Rust 1.54.0 Release Highlights: Attribute Macros and wasm32 Intrinsics Stabilization

Rust 1.54.0 adds support for invoking function‑like macros inside attributes, which allows external documentation to be included via include_str! and macros to be nested, and stabilizes many wasm32 intrinsics, including safe SIMD functions such as v128_bitselect, expanding WebAssembly capabilities.

Macros · SIMD · WebAssembly
0 likes · 3 min read
Baidu Intelligent Testing
Jul 27, 2021 · Backend Development

Comprehensive Guide to Concurrency Optimization in Modern CPUs and Multithreaded Programming

This article systematically explores concurrency optimization for high‑performance C++ engineering, covering CPU trends, SIMD and out‑of‑order execution, single‑thread parallelism, lock‑free and wait‑free synchronization, and practical case studies of counters and queues to improve multithreaded scalability.

CPU architecture · Concurrency · Multithreading
0 likes · 35 min read
Baidu Geek Talk
Jun 16, 2021 · Fundamentals

Concurrent Optimization Techniques in C++: SIMD, Out‑of‑Order Execution, and Lock‑Free/Wait‑Free Algorithms

The article reviews Baidu C++ engineers' concurrency optimizations, explaining why modern software must exploit parallelism and detailing SIMD vectorization, out‑of‑order execution, and micro‑architectural analysis, then compares mutex, lock‑free, and wait‑free synchronization, showcasing case studies where atomic and wait‑free designs dramatically improve multithreaded performance.

C++ · Concurrency · Multithreading
0 likes · 35 min read
Tencent Tech
Mar 19, 2021 · Backend Development

How JDK 16’s Vector API Supercharges Java Performance for Data‑Intensive Workloads

JDK 16, released on March 16, 2021, introduces the Vector API, a set of Java interfaces that translate scalar operations into SIMD instructions, dramatically reducing operation counts and delivering 2‑5× speedups for matrix multiplication and 14‑16× for vector dot products, with notable contributions from Tencent’s Kona JDK team.

JDK16 · Java · Performance Optimization
0 likes · 6 min read
OPPO Kernel Craftsman
Sep 30, 2020 · Mobile Development

SIMD Acceleration Techniques on Qualcomm Hexagon DSP for Mobile Devices

The article explains how SIMD acceleration on Qualcomm’s Hexagon DSP, using its HVX vector engine and specialized instructions, can off‑load compute‑intensive tasks such as image, video, and AI processing from the CPU, delivering up to 8× speed‑up, lower power consumption, reduced thermal throttling, and longer battery life on mobile devices.

DSP · Hexagon · Mobile
0 likes · 9 min read
Architects' Tech Alliance
Jul 5, 2020 · Fundamentals

Migrating Applications from x86 to Kunpeng (ARM): Overview, Methodology, and C/C++ Compilation Details

With the rise of mobile, IoT, and edge computing, the long‑standing dominance of x86 is challenged, prompting developers to migrate applications to ARM‑based Kunpeng platforms; this article explains the architectural differences, a five‑step migration methodology, and detailed C/C++ compilation considerations including instruction, macro, and SIMD adaptations.

ARM · C++ · Compilation
0 likes · 14 min read