Tagged articles
110 articles
FunTester
May 20, 2026 · Artificial Intelligence

How Anthropic’s Multi‑Agent Orchestration Enables Parallel Workflows

The article explains why a single AI agent hits context and execution limits, describes Anthropic’s multi‑agent orchestration that splits tasks among dedicated sub‑agents coordinated by a controller, discusses model selection, communication, and observability, and outlines scenarios where parallel orchestration delivers real benefits.

AI Agents · Model Selection · Multiagent
0 likes · 11 min read
Java Architect Handbook
Apr 20, 2026 · Backend Development

Concurrency vs Parallelism in Java: Definitions, CPU Mechanics, and Interview Tips

The article explains how concurrency differs from parallelism by defining logical versus physical simultaneity, illustrates the concepts with everyday analogies and CPU scheduling details, provides Java code examples, lists common interview follow‑up questions, and offers a concise mnemonic for remembering the distinction.

Backend Development · CPU · Java
0 likes · 10 min read
Code Mala Tang
Feb 24, 2026 · Backend Development

Why Async FastAPI Still Blocks and How to Offload Heavy Work

After fixing unlimited queries and pagination issues, this article reveals why async FastAPI still stalls under load, outlines the hidden bottlenecks in the request lifecycle, and provides practical rules and code examples for offloading heavy work to background workers, ensuring scalability, idempotence, and observability.

Async · FastAPI · Parallelism
0 likes · 9 min read
Tech Verticals & Horizontals
Jan 14, 2026 · Artificial Intelligence

Why Parallelism Matters: Designing Multi‑Agent Architectures for Scalable AI Systems

The article explains why parallelism is crucial for large‑scale AI systems—addressing I/O latency and reliability—by detailing core agent patterns, multi‑agent architectures, reliability strategies, and advanced retrieval‑augmented generation techniques, each illustrated with concrete Jupyter notebooks.

AI Governance · Parallelism · RAG
0 likes · 6 min read
Baidu Intelligent Cloud Tech Hub
Jan 5, 2026 · Artificial Intelligence

How Baidu Tianchi Supernodes Supercharge Large‑Model Inference: Architecture, Deployment, and Optimization

This article details Baidu's Tianchi supernode design and software tuning, covering hardware scale‑up, deployment planning, Prefill and Decode stage optimizations, quantization strategies, and communication schemes, which together raise large‑model inference throughput, cut latency, and lower per‑token cost.

AI Infrastructure · Parallelism · Performance Optimization
0 likes · 20 min read
AI Insight Log
Jan 3, 2026 · Artificial Intelligence

13 Proven Tricks to Double Your AI‑Assisted Coding Efficiency (From Claude Code’s Founder)

Boris Cherny, the founder of Claude Code, reveals a detailed 13‑step workflow that combines aggressive parallelism, Opus 4.5 with Thinking mode, a shared CLAUDE.md knowledge base, custom slash commands, sub‑agents, automated formatting hooks, permission presets, deep tool integrations, and a strict verification loop to dramatically boost AI‑driven development productivity.

AI-assisted development · Claude Code · MCP integration
0 likes · 9 min read
JavaScript
Oct 14, 2025 · Frontend Development

Boost JavaScript Async Performance by Up to 80% with New Promise Techniques

While async/await simplifies JavaScript code, it can introduce significant overhead in high‑frequency or compute‑heavy scenarios; this article introduces alternative async patterns—optimized Promise chaining, parallel Promise.all, batch processing, and pooling—that can reduce context switches and deliver performance gains of up to 80%.

JavaScript · Parallelism · Performance Optimization
0 likes · 5 min read
JavaScript
Sep 16, 2025 · Frontend Development

Boost JavaScript Async Performance: Up to 80% Faster Than async/await

This article explains why async/await can cause performance bottlenecks in JavaScript and introduces optimized Promise‑based techniques—such as chain optimization, Promise.all parallelism, batch processing, and pooling—that can improve async execution speed by up to 80% in specific scenarios.

JavaScript · Parallelism · Performance Optimization
0 likes · 4 min read
php Courses
Sep 10, 2025 · Fundamentals

Mastering C++11 Concurrency: std::thread, std::async, and Best Practices

This guide explains why modern C++ programs need concurrency, introduces the core C++11 tools std::thread and std::async, demonstrates basic usage, parameterized threads, lambda expressions, async task handling, synchronization with mutexes, exception safety, parallel data processing, and provides best‑practice tips for efficient and safe multithreaded development.

C · Parallelism · std::async
0 likes · 10 min read
Volcano Engine Developer Services
Aug 6, 2025 · Artificial Intelligence

How VeOmni Revolutionizes Multimodal Model Training with 40% Speed Gains

VeOmni, ByteDance’s open‑source unified multimodal training framework, tackles fragmented training pipelines by integrating LoRA fine‑tuning, FSDP, Ulysses, and Expert Parallel, delivering up to 40% higher throughput, up to 55% memory savings, and streamlined one‑click deployment for LLM, VLM, and video models.

AI · Framework · Parallelism
0 likes · 14 min read
Code Mala Tang
Jul 22, 2025 · Fundamentals

Boost Python Loops: Parallelism, Generators, and Profiling Made Easy

This guide shows how to accelerate slow Python for‑loops by leveraging multi‑core parallelism, memory‑efficient generators, and a suite of profiling tools, providing step‑by‑step code examples and practical tips to identify and fix performance bottlenecks.

Generators · Parallelism · Profiling
0 likes · 16 min read
FunTester
Jul 13, 2025 · Backend Development

Master Go Concurrency: Goroutines, Channels, and Real-World Examples

Learn how Go’s built‑in concurrency model using goroutines and channels can transform sequential code into responsive, high‑performance applications, with clear explanations of concurrency vs parallelism, practical code samples, synchronization techniques, and best practices for building scalable web servers.

Channel · Goroutine · Parallelism
0 likes · 10 min read
Architect
May 18, 2025 · Artificial Intelligence

How Much GPU Memory Can One Model Use? A Deep Dive into Transformer Memory Accounting

This article breaks down GPU memory consumption for large Transformer models, explains how to estimate each component—parameters, optimizer state, activations, gradients—and shows how parallelism, mixed precision, and recomputation strategies can dramatically reduce the footprint.

AI training · GPU Memory · Memory Optimization
0 likes · 14 min read
FunTester
Apr 18, 2025 · Backend Development

Using CompletableFuture for Parallel REST Calls in Java

The article explains why serial REST calls cause performance bottlenecks, illustrates the benefits of concurrent requests, and demonstrates how Java 8's CompletableFuture can be used to implement parallel REST calls with robust exception handling, improving throughput and resource utilization.

CompletableFuture · Java · Parallelism
0 likes · 10 min read
Cognitive Technology Team
Apr 12, 2025 · Backend Development

Using CompletableFuture with Streams for Parallel Execution in Java

The article explains how to correctly combine Java's CompletableFuture with Stream API to achieve true asynchronous parallelism, highlights common pitfalls that lead to sequential execution, and provides the proper pattern of creating a CompletableFuture stream followed by a terminal operation.

CompletableFuture · Java · Parallelism
0 likes · 3 min read
Architecture Development Notes
Mar 16, 2025 · Backend Development

Choosing the Right Concurrency Model: Go vs Python vs Rust

This article compares Go, Python, and Rust concurrency implementations—covering CSP‑based goroutines, GIL constraints, and ownership‑driven thread safety—to help developers select the most suitable model for high‑throughput, CPU‑bound, or safety‑critical applications.

Async · Go · Parallelism
0 likes · 9 min read
DataFunSummit
Mar 14, 2025 · Artificial Intelligence

Insights from Zhihu's ZhiLight Large‑Model Inference Framework: Architecture, Parallelism, and Performance Optimizations

The article summarizes Zhihu's machine‑learning platform lead Wang Xin's presentation on the ZhiLight large‑model inference framework, covering model execution mechanisms, GPU workload analysis, pipeline and tensor parallelism, GPU architecture evolution, open‑source engine comparisons, ZhiLight's compute‑communication overlap and quantization optimizations, benchmark results, supported models, and future directions.

GPU · Inference · LLM
0 likes · 13 min read
Code Mala Tang
Feb 15, 2025 · Fundamentals

Unlock Full CPU Power in Python: A Hands‑On Guide to Multiprocessing

This article explains why Python’s Global Interpreter Lock limits CPU core usage, introduces the multiprocessing module for parallel execution of CPU‑intensive tasks, and provides step‑by‑step code examples, key concepts, synchronization tools, a real‑world image‑processing case, and best practices to dramatically speed up your programs.

CPU Bound · Parallelism · Python
0 likes · 9 min read
DataFunSummit
Dec 30, 2024 · Artificial Intelligence

Colossal-AI: A Scalable Framework for Distributed Training of Large Models

This presentation introduces the challenges of the large‑model era, describes the Colossal‑AI architecture—including N‑dimensional parallelism, heterogeneous storage, and zero‑code experience—shows benchmark results and real‑world use cases, and answers audience questions about its integration with PyTorch and advanced parallel strategies.

AI Infrastructure · Benchmark · Colossal-AI
0 likes · 11 min read
FunTester
Nov 25, 2024 · Fundamentals

Understanding Concurrency and Parallelism in Java Multithreading

This article introduces the basics of Java multithreading concurrency, explains the difference between concurrency and parallelism with a supermarket analogy, and details thread pool creation, usage, and customization through analysis of ThreadPoolExecutor source code.

Java · Parallelism · ThreadPoolExecutor
0 likes · 9 min read
Kuaishou Large Model
Nov 22, 2024 · Artificial Intelligence

Boost LLM Training on Massive Clusters with DP/TP Overlap and Context Parallelism

This article details a comprehensive set of techniques—including data‑ and tensor‑parallel overlap, context‑parallelism, activation rematerialization, and a performance‑driven cost model—that dramatically improve large‑language‑model training efficiency on ultra‑large GPU clusters while preserving model quality.

Distributed Training · Parallelism · Performance Modeling
0 likes · 28 min read
Top Architect
Oct 17, 2024 · Backend Development

Understanding ForkJoinPool and the Fork/Join Framework in Java

This article explains the limitations of ThreadPoolExecutor, introduces the Fork/Join model and ForkJoinPool, demonstrates how to implement divide‑and‑conquer tasks with RecursiveTask, analyzes the pool’s design, task submission methods, work‑stealing mechanism, common pool pitfalls, and presents performance evaluation results.

DivideAndConquer · ForkJoinPool · Java
0 likes · 26 min read
MaGe Linux Operations
Sep 28, 2024 · Backend Development

Master Go Concurrency: Goroutines, Channels, Locks, Timers and Synchronization

This comprehensive guide explains the fundamentals of concurrent programming in Go, covering the differences between parallelism and concurrency, process and thread concepts, and detailed usage of goroutines, channels, select statements, timers, mutexes, read‑write locks, wait groups, once, sync.Map, and atomic operations with practical code examples and diagrams.

Channel · Goroutine · Parallelism
0 likes · 42 min read
Test Development Learning Exchange
Sep 22, 2024 · Fundamentals

Understanding Concurrency, Parallelism, Synchronization, Asynchronous, Blocking, and Non‑blocking in Python with Code Examples

This article explains the key concepts of concurrency, parallelism, synchronization, asynchronous execution, blocking, and non‑blocking in Python, providing clear explanations and practical code samples for each concept, including API automation examples for HTTP requests.

Blocking · Non-blocking · Parallelism
0 likes · 14 min read
Huawei Cloud Developer Alliance
Sep 18, 2024 · Artificial Intelligence

How Distributed Training Powers Massive Language Models: Concepts, Strategies, and Code

This article explains why single‑machine resources are insufficient for training ever‑larger language models, introduces the fundamentals of distributed training systems, details various parallel strategies such as data, model, pipeline, and hybrid parallelism, and provides practical PyTorch code and memory‑optimization techniques to accelerate large‑scale model training.

Deep Learning · GPU · Parallelism
0 likes · 29 min read
Architecture and Beyond
Sep 7, 2024 · Backend Development

Six Proven Backend Techniques to Supercharge System Performance

This comprehensive guide walks backend architects through six core optimization methods—caching, batch processing, asynchronous handling, data compression, parallelization, and eliminating unnecessary requests—detailing their problem domains, implementation strategies, real‑world scenarios, benefits, and trade‑offs.

Asynchronous · Backend · Batch Processing
0 likes · 48 min read
Python Programming Learning Circle
Sep 3, 2024 · Fundamentals

Simplifying Python Parallelism with map and ThreadPool

This article explains why traditional Python multithreading tutorials are often overly complex, introduces the concise map‑based approach using multiprocessing and multiprocessing.dummy ThreadPool, demonstrates performance gains with real‑world examples, and provides ready‑to‑run code snippets for efficient parallel execution.

MAP · Parallelism · multiprocessing
0 likes · 10 min read
FunTester
Aug 15, 2024 · Backend Development

9 Proven Techniques to Supercharge Service Performance

This article outlines nine practical methods—caching, parallelization, batch processing, data compression, lock‑free design, sharding, request avoidance, pooling, and asynchronous handling—demonstrating how each can be applied to backend services to dramatically reduce latency and improve throughput.

Asynchronous · Batch Processing · Parallelism
0 likes · 25 min read
360 Smart Cloud
Jul 17, 2024 · Artificial Intelligence

Parallelism and Memory‑Optimization Techniques for Distributed Large‑Scale Transformer Training

This article reviews the principles and practical implementations of data, pipeline, tensor, sequence, and context parallelism together with memory‑saving strategies such as recomputation and ZeRO, and demonstrates how the QLM framework leverages these techniques to accelerate large‑model training and fine‑tuning on multi‑GPU clusters.

GPU · Megatron-LM · Memory Optimization
0 likes · 18 min read
Baobao Algorithm Notes
Jul 11, 2024 · Artificial Intelligence

Why Separate Prefill and Decode? A Deep Dive into DistServe’s Split Inference Architecture

This article explores the two‑stage LLM inference pipeline, introduces TTFT and TPOT metrics, explains the motivation for prefilling‑decoding separation, presents experimental comparisons between split and merged architectures, and details optimization techniques and parallel‑strategy modeling for DistServe.

DistServe · Goodput · LLM inference
0 likes · 28 min read
Architect
Jun 26, 2024 · Backend Development

Understanding the Fork/Join Framework and ForkJoinPool in Java

This article explains the limitations of ThreadPoolExecutor, introduces the Fork/Join model and ForkJoinPool, demonstrates how to implement divide‑and‑conquer tasks with RecursiveTask, provides performance benchmarks, and discusses design details, task submission methods, work‑stealing, and cautions about using the common pool.

DivideAndConquer · ForkJoinPool · Java
0 likes · 23 min read
Open Source Linux
Jun 1, 2024 · Fundamentals

Mastering Concurrency and Parallelism in Java: From Basics to Advanced APIs

This article explains the concepts of concurrency, parallelism, and serial execution, describes common multi‑core scheduling algorithms, and demonstrates Java's concurrent programming tools—including Future, Fork/Join, Stream API, and CompletableFuture—through clear code examples and practical guidelines.

CompletableFuture · Future · Java
0 likes · 20 min read
Baidu Geek Talk
May 15, 2024 · Artificial Intelligence

Accelerating Large Model Training and Inference with Baidu Baige AIAK‑LLM: Challenges, Techniques, and Optimizations

The talk outlines how Baidu’s Baige AIAK‑LLM suite tackles the exploding compute demands of trillion‑parameter models by boosting Model FLOPS Utilization through advanced parallelism, memory‑saving recompute, zero‑offload, adaptive scheduling, and cross‑chip orchestration, delivering 30‑60% training and inference speedups and a unified cloud product.

AI Infrastructure · Baidu · Inference Optimization
0 likes · 25 min read
Baidu Intelligent Cloud Tech Hub
May 15, 2024 · Artificial Intelligence

How Baidu’s AIAK‑LLM Supercharges Large‑Model Training and Inference

The article explains the scaling challenges of ever‑larger LLMs, introduces the MFU performance metric, surveys industry parallelism and memory‑saving techniques, and details Baidu’s AIAK‑LLM suite—including resource, component and acceleration layers—as well as concrete training and inference optimizations that raise MFU by 30‑60% and cut deployment costs.

AI Infrastructure · Large Model · MFU
0 likes · 25 min read
MaGe Linux Operations
May 2, 2024 · Fundamentals

Unlock Python’s Power: Master Multiprocessing for Faster, Scalable Code

This comprehensive guide explains Python’s multiprocessing module, covering process creation, inter‑process communication, pools, synchronization primitives, error handling, and real‑world examples such as web crawlers, data analysis, and game servers, helping developers harness multiple CPU cores to boost performance and avoid GIL limitations.

Code Examples · Parallelism · Python
0 likes · 32 min read
DataFunSummit
Apr 14, 2024 · Artificial Intelligence

TensorRT-LLM: NVIDIA’s Scalable LLM Inference Framework – Overview, Features, Workflow, Performance, and Future Directions

This article presents a comprehensive overview of NVIDIA’s TensorRT-LLM, detailing its product positioning as a scalable LLM inference solution, key features such as model support, low-precision and quantization techniques, parallelism strategies, the end-to-end usage workflow, performance highlights, future roadmap, and answers to common technical questions.

LLM inference · Nvidia · Parallelism
0 likes · 13 min read
Architect's Guide
Mar 22, 2024 · Backend Development

Understanding ForkJoinPool: Principles, Implementation, and Performance Evaluation in Java

This article explains the Fork/Join model and Java's ForkJoinPool, covering divide‑and‑conquer theory, custom RecursiveTask examples, pool construction options, task submission methods, work‑stealing mechanics, commonPool pitfalls, and performance testing results to help developers decide when to use it.

DivideAndConquer · ForkJoinPool · Java
0 likes · 22 min read
Go Development Architecture Practice
Mar 21, 2024 · Backend Development

How to Process One Billion Rows in Go: 9 Optimized Solutions Under 4 Seconds

This article walks through nine Go‑based implementations for the 1‑Billion‑Row Challenge, starting from a straightforward scanner approach and progressively applying map pointer values, custom parsing, integer arithmetic, buffer tweaks, custom hash tables, and parallelism to shrink processing time from 1 minute 45 seconds to under 4 seconds.

1BRC · Benchmark · Go
0 likes · 22 min read
Bilibili Tech
Mar 15, 2024 · Artificial Intelligence

Hardware Resource Estimation and Bottleneck Analysis for Large Language Models (LLMs)

The article analyzes the compute, memory, and communication resources required to train and run large language models, quantifies bottlenecks such as the massive FLOP demand, terabyte‑scale GPU memory, and high‑bandwidth interconnect needs, and evaluates parallelism strategies and bandwidth estimates to guide hardware and software design for scaling LLMs.

AI Infrastructure · Hardware · LLM
0 likes · 53 min read
NewBeeNLP
Feb 8, 2024 · Artificial Intelligence

How Speculative Decoding Supercharges Large Language Model Inference

This survey examines speculative decoding—a draft‑then‑verify technique that parallelizes token generation to cut LLM inference latency, outlines its core components, compares independent and self‑drafting methods, discusses verification strategies, and highlights open research challenges.

LLM inference · Parallelism · Performance Optimization
0 likes · 15 min read
DataFunTalk
Jan 31, 2024 · Artificial Intelligence

Introduction to NVIDIA TensorRT-LLM Inference Framework

TensorRT-LLM is NVIDIA's scalable inference framework for large language models that combines TensorRT compilation, fast kernels, multi‑GPU parallelism, low‑precision quantization, and a PyTorch‑like API to deliver high‑performance LLM serving with extensive customization and future‑focused enhancements.

GPU Acceleration · LLM inference · Nvidia
0 likes · 12 min read
JD Retail Technology
Dec 19, 2023 · Fundamentals

Overview of CPU Architecture, Performance Trends, and Their Impact on Software Development

This article reviews recent decades of CPU performance improvements and semiconductor process advances, explains current CPU architectures, instruction set evolution, and how these trends influence software development practices, including parallelism, SIMD, multithreading, and power‑efficiency considerations.

CPU architecture · Instruction Set · Parallelism
0 likes · 42 min read
DataFunTalk
Dec 6, 2023 · Artificial Intelligence

Distributed Training Techniques and Quantitative Analysis for Large Language Models (GPT‑175B)

This article presents a comprehensive overview of state‑of‑the‑art distributed training methods for large language models, using GPT‑175B as a case study to analyze memory, communication, and compute overheads, and to recommend practical optimization strategies such as tensor, pipeline, and sequence parallelism, ZeRO‑1 optimizer, and selective activation checkpointing.

Distributed Training · GPU memory optimization · LLM
0 likes · 22 min read
Baobao Algorithm Notes
Sep 12, 2023 · Artificial Intelligence

Why RTX 4090 Beats H100 for LLM Inference but Fails at Training

The article analyses the performance, memory, bandwidth and cost of NVIDIA H100, A100 and RTX 4090 GPUs, explains why the 4090 cannot handle large‑model training due to communication and memory limits, and shows how its high compute‑to‑price ratio makes it attractive for inference, backed by detailed parallelism calculations and cost‑per‑token estimates.

Cost · GPU · LLM
0 likes · 46 min read
Selected Java Interview Questions
Apr 11, 2023 · Fundamentals

Understanding the Differences Between Processes and Threads, Concurrency, and Shared Resources

This article explains the concepts of processes and threads, their fundamental differences, how they relate to concurrency and parallelism, and details which resources are private to each thread versus shared across a process, using diagrams and real‑world factory analogies to aid understanding.

Operating System · Parallelism · Thread
0 likes · 13 min read
MaGe Linux Operations
Apr 4, 2023 · Backend Development

Boost Python Scripts with map() Parallelism: From Threads to ThreadPools

Python’s traditional multithreading tutorials often overcomplicate simple tasks, but by leveraging the built‑in map() function and the multiprocessing.dummy ThreadPool, developers can dramatically simplify and accelerate I/O‑bound and CPU‑bound scripts, reducing code from dozens of lines to just a few while achieving significant speedups.

MAP · Parallelism · ThreadPool
0 likes · 13 min read
DataFunSummit
Apr 2, 2023 · Artificial Intelligence

Efficient Training of Large Models with the Open‑Source Distributed Framework Easy Parallel Library (EPL)

This article introduces the challenges of scaling deep‑learning model training, explains the design and components of the open‑source Easy Parallel Library (EPL) that unifies data, pipeline, and operator‑split parallelism, and demonstrates its best‑practice results on large‑scale classification, BERT‑large, and massive multimodal models.

Distributed Training · EPL · Large-Scale Training
0 likes · 15 min read
Baidu Intelligent Cloud Tech Hub
Feb 23, 2023 · Artificial Intelligence

How Baidu’s Cloud Infrastructure Tackles the Challenges of Training Massive AI Models

This article explains how Baidu's intelligent cloud overcomes the compute and storage walls of large‑scale model training by combining hardware design, network topology, and software optimizations such as pipeline, tensor, and expert parallelism, cost‑model‑driven placement, and future‑proof AI infrastructure evolution.

AI Infrastructure · Baidu Cloud · Cost Model
0 likes · 28 min read
FunTester
Feb 21, 2023 · Backend Development

Mastering Java ForkJoinPool: A Hands‑On Guide to Parallel Task Execution

The article introduces Java's ForkJoinPool for dividing large, compute‑intensive tasks into smaller subtasks, explains its suitability for performance testing scenarios such as high‑throughput QPS/RT data collection, and provides a complete Groovy‑based demo that defines a RecursiveTask, implements the compute method, and runs a sum calculation using a thread pool.

ForkJoinPool · Java · Parallelism
0 likes · 6 min read
Architects' Tech Alliance
Jan 30, 2023 · Operations

Advanced Software Performance Optimization Techniques: From Resource Exhaustion to Parallelism

This article presents a comprehensive guide to software performance optimization, covering low‑level resource exhaustion, horizontal scaling, sharding, lock‑free techniques, and system‑wide strategies, while offering practical examples and references for developers seeking to improve efficiency and scalability.

Parallelism · Resource Management · Scalability
0 likes · 12 min read
DataFunSummit
Jan 5, 2023 · Artificial Intelligence

GPU Acceleration Techniques for Large AI Models: Parallelism, Fusion, and Simplification

These notes explain how GPUs address the massive data, serial dependencies, and high computational complexity of modern AI by employing three acceleration strategies—parallelism, operator fusion, and simplification—illustrated with Megatron-LM, MoE models, and practical compression techniques such as quantization, distillation, and pruning.

AI · GPU · Megatron
0 likes · 16 min read
Laravel Tech Community
Jan 4, 2023 · Fundamentals

Understanding Processes, Threads, Concurrency, and Process Pools

This article explains the concepts of processes and threads, their differences and interactions, the states of a process, the distinctions between serial, concurrent, and parallel execution, and the purpose and operation of process pools in modern computing environments.

Parallelism · Process Pool · Thread
0 likes · 12 min read
DataFunTalk
Jan 4, 2023 · Artificial Intelligence

GPU Acceleration Techniques for Large AI Models: Parallelism, Fusion, and Simplification

This article explains how GPUs address the massive data, serial dependencies, and high computational complexity of modern AI by employing three acceleration strategies—parallelism, operator fusion, and simplification—detailing methods such as model, pipeline, and tensor parallelism, Megatron framework, MoE models, and various model compression techniques.

AI · GPU · Megatron
0 likes · 17 min read
Laravel Tech Community
Nov 16, 2022 · Databases

DuckDB New Release Highlights and Feature Changes

The article introduces DuckDB, a high‑performance embedded analytical database, outlines its new release’s storage, performance, and memory improvements, describes its C/C++ integration and build process, and lists key feature changes such as parallel execution, novel compression methods, and enhanced SQL capabilities.

Analytical Database · DuckDB · Embedded Database
0 likes · 3 min read
Programmer DD
Oct 8, 2022 · Fundamentals

Eight Timeless Computer Architecture Principles Every Designer Should Know

This article outlines eight enduring ideas—from designing for Moore's Law and using abstraction to speeding up common cases, leveraging parallelism, pipelining, prediction, memory hierarchy, and redundancy—that have shaped computer architecture over the past six decades.

Moore's Law · Parallelism · Pipeline
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Jul 12, 2022 · Artificial Intelligence

How Whale Enables Efficient Giant Model Training on Heterogeneous GPUs

The article introduces Whale, an open‑source distributed training framework that unifies multiple parallelism strategies, uses hardware‑aware load balancing to accelerate giant models like BERT‑Large and the trillion‑parameter M6 on heterogeneous GPU clusters, and details its architecture, planning, and real‑world performance gains.

Deep Learning · Parallelism · hardware-aware scheduling
0 likes · 11 min read
Baidu Geek Talk
Jul 6, 2022 · Artificial Intelligence

Why Training Massive AI Models Demands New Cluster Architectures and Parallelism Strategies

The article examines the industry trend toward ever‑larger AI models, compares their parameter scale to the human brain, outlines the computational and memory challenges of training such models, and details advanced parallelism techniques and Baidu's high‑performance cluster solutions that enable efficient, stable large‑scale model training.

AI Infrastructure · Baidu · Cluster Computing
0 likes · 17 min read
Tencent Cloud Developer
Jun 2, 2022 · Fundamentals

A Detailed Explanation of Asynchronous Programming

The article explains asynchronous programming by contrasting concurrency, parallelism, and synchronization, illustrates how splitting serial work into independent async tasks can improve performance but introduces resource, locking, and state‑tracking challenges, and offers strategies such as careful task limits, locking, queues, and result monitoring.

Parallelism · Programming Concepts · asynchronous programming
0 likes · 23 min read
Code Ape Tech Column
Apr 25, 2022 · Backend Development

Deep Dive into Java ForkJoinPool: Design, Implementation, and Usage

This article explains the divide‑and‑conquer principle, the internal design of Java's ForkJoinPool, its core classes (ForkJoinTask, ForkJoinWorkerThread, WorkQueue), key methods for task submission, work stealing, thread management, and provides practical code examples to illustrate how to implement and use fork/join parallelism effectively.

ForkJoinPool · Java · Parallelism
0 likes · 48 min read
Architecture Digest
Jan 24, 2022 · Fundamentals

Understanding Python Threads, Processes, GIL, and the multiprocessing & concurrent.futures Modules

This article explains the fundamental differences between threads and processes, the role of Python's Global Interpreter Lock, and provides a comprehensive guide to using the multiprocessing and concurrent.futures modules—including their main classes, synchronization primitives, and practical code examples—for effective concurrent programming in Python.

GIL · Parallelism · Python
0 likes · 40 min read
ByteDance Terminal Technology
Dec 28, 2021 · Mobile Development

Analyzing Gradle’s Scheduling Mechanism to Optimize Android Component Publishing

This article investigates why large Android projects experience extremely slow AAR publishing, reveals that memory is not the main bottleneck, examines Gradle’s core scheduling, Worker API, lock contention, and measurement inaccuracies, and proposes disabling Worker API to achieve up to fifteen‑fold build speed improvements.

Android · Build Performance · Gradle
0 likes · 20 min read
Code Ape Tech Column
Dec 8, 2021 · Backend Development

Understanding Java 8 Stream API, Parallel Streams, and ForkJoinPool

This article explains Java 8 Stream API fundamentals, its composition, pipelining and internal iteration, details parallel stream execution using ForkJoinPool, discusses performance considerations, and provides practical code examples for creating and managing streams in backend development.

ForkJoinPool · Java · Parallelism
0 likes · 19 min read
Architects' Tech Alliance
Jul 19, 2021 · Fundamentals

Understanding Processes and Threads: Definitions, Differences, Advantages, and Practical Usage

This article explains the fundamental concepts of processes and threads in operating systems, compares their characteristics, outlines their respective advantages and disadvantages, and provides practical guidelines for choosing between multi‑process and multi‑thread designs in real‑world applications.

Operating System · Parallelism · Thread
0 likes · 20 min read
Fulu Network R&D Team
Jul 6, 2021 · Operations

Understanding Throughput, Concurrency, and Lock Contention in System Design

Throughput, the rate at which an application processes tasks, is distinct from concurrency. The article shows how to improve it by reducing task latency, increasing parallelism, and optimizing lock usage (finer granularity, lower locking cost), and how buffering, merging, and batch processing mitigate contention and improve scalability.

Locks · Parallelism · Scalability
0 likes · 11 min read
Big Data Technology & Architecture
Apr 4, 2021 · Big Data

Flink Performance Tuning Guide: Memory Configuration, Parallelism, Checkpoint Optimization, and Common Issues

This guide details comprehensive Flink performance tuning techniques, covering memory configuration, GC settings, parallelism adjustments, process parameters, partitioning strategies, Netty network tuning, checkpoint optimization, and common issues such as data skew and resource bottlenecks.

Checkpoint · Flink · Memory Management
0 likes · 18 min read
Python Programming Learning Circle
Apr 2, 2021 · Fundamentals

Effective Python Parallelism with Thread Pools and the map() Function

This article critiques traditional Python threading tutorials and demonstrates how to replace verbose thread‑pool code with concise map‑based parallelism using multiprocessing and multiprocessing.dummy, providing practical examples, performance measurements, and guidelines for choosing pool sizes for I/O‑ and CPU‑bound tasks.

MAP · Parallelism · concurrency
0 likes · 11 min read
Python Crawling & Data Mining
Mar 7, 2021 · Fundamentals

Unlocking System Performance: How Amdahl’s Law and Parallelism Shape Modern Computing

This article explains how computer systems combine hardware and system software, describes the memory hierarchy, OS abstractions, Amdahl's law, and the three levels of parallelism—thread‑level, instruction‑level, and SIMD—showing why understanding these concepts is essential for writing fast, reliable programs.

Amdahl's Law · Memory Hierarchy · Parallelism
0 likes · 16 min read
Architects Research Society
Sep 2, 2020 · Databases

Scaling PostgreSQL for Multi‑Terabyte Databases: Indexes, Partitioning, Tablespaces, Parallelism, and Replication

This article explains how to extract maximum performance and scalability from PostgreSQL for multi‑terabyte workloads by leveraging specialized indexes, declarative partitioning, tablespaces, parallel query execution, read‑only replica load‑balancing, and foreign‑table sharding techniques.

Parallelism · Partitioning · PostgreSQL
0 likes · 10 min read
MaGe Linux Operations
Aug 17, 2020 · Fundamentals

Boost Python Performance: Simple Parallelism with map and ThreadPool

This article explains why traditional Python threading tutorials are often over‑engineered, introduces the concise map‑based parallelism using multiprocessing and multiprocessing.dummy, and demonstrates how a few lines of code can dramatically speed up I/O‑bound and CPU‑bound tasks.

MAP · Parallelism · ThreadPool
0 likes · 11 min read
macrozheng
Feb 13, 2020 · Backend Development

How Fast Is Java Stream API? In‑Depth Performance Benchmarks Revealed

This article presents a comprehensive benchmark of Java's Stream API, comparing its serial and parallel performance against traditional loops across primitive, object, and reduction operations, and offers practical recommendations based on multi‑core versus single‑core results.

Benchmark · Java · Parallelism
0 likes · 9 min read
Programmer DD
Nov 15, 2019 · Fundamentals

Why Concurrency Isn’t the Same as Parallelism: A Simple Analogy

This article explains the subtle difference between concurrency and parallelism using the gopher-and-cart analogy from Rob Pike’s talk “Concurrency is not Parallelism”, shows how task decomposition creates concurrent pipelines, and maps the model to a scalable web-service architecture.

Go · Parallelism · Web services
0 likes · 7 min read
Alibaba Cloud Native
Oct 8, 2019 · Cloud Native

Mastering Kubernetes Jobs, CronJobs, and DaemonSets: Concepts, YAML, and Real‑World Walkthroughs

This guide walks Kubernetes beginners through the fundamentals of Job, CronJob, and DaemonSet controllers, explaining their use‑cases, key fields such as restartPolicy, backoffLimit, parallelism, and schedule, and provides step‑by‑step YAML examples and command‑line verification to illustrate parallel execution, scheduling, and update strategies.

Controller · CronJob · DaemonSet
0 likes · 19 min read
Architect's Tech Stack
Jul 14, 2019 · Backend Development

Performance Evaluation of Java 8 Stream API: Benchmarks and Insights

This article presents a comprehensive benchmark of Java 8 Stream API on large‑scale data, comparing serial and parallel stream operations with traditional external iteration across primitive, object, and reduction workloads, and draws practical recommendations on when to use streams for optimal performance.

Benchmark · JVM · Parallelism
0 likes · 8 min read
MaGe Linux Operations
Jul 12, 2019 · Fundamentals

Boost Python Performance: 24 Proven Techniques to Speed Up Code

This guide presents 24 practical methods—including timing measurements, faster data structures, loop optimizations, vectorization, and parallel processing—to dramatically accelerate Python code, each illustrated with clear before‑and‑after performance screenshots.

Benchmarking · Parallelism · Profiling
0 likes · 7 min read
MaGe Linux Operations
Apr 1, 2019 · Backend Development

Boost Python Performance: Parallelize Tasks with ThreadPool and map

This article critiques traditional Python multithreading tutorials and demonstrates how the built‑in map function together with multiprocessing.dummy's ThreadPool can dramatically speed up I/O‑bound and CPU‑bound tasks, offering concise code examples, performance benchmarks, and a real‑world thumbnail generation case study.

MAP · Parallelism · ThreadPool
0 likes · 11 min read