Tagged articles

parallel computing

78 articles · Page 1 of 1

May 12, 2026 · Fundamentals

Every Line of Code Echoes von Neumann’s 80‑Year‑Old Shortcut

The article explains how John von Neumann’s 1945 decision to store programs in memory created the universal von Neumann architecture, why this simple design outlasted more optimal alternatives, and how his ideas also spawned parallel computing and game theory, shaping modern computers, AI, and distributed systems.

Computer Architecture · Game Theory · parallel computing

0 likes · 7 min read

TonyBai

Apr 17, 2026 · Industry Insights

The 30‑Year Journey: From Parallel Computing to Modern GPU‑Powered AI

This article traces three decades of government‑funded research in parallel computing, graphics systems, and stream processing, showing how those advances migrated to companies like Nvidia, evolved into CUDA and other GPU technologies, and ultimately enabled today’s AI revolution.

AI · CUDA · GPU computing

0 likes · 18 min read

ITPUB

Apr 10, 2026 · Backend Development

How a Simple Refactor and Parallelism Cut Java Loop Time from 26s to 0.7s

A new team member cut a painfully slow Java data-processing routine from 26,856 ms to 748 ms by refactoring nested loops, extracting repeated calculations, and introducing a thread pool for parallel execution. The article walks through the before-and-after code and the key techniques.

Java · Performance Optimization · parallel computing

0 likes · 8 min read

Machine Heart

Apr 1, 2026 · Artificial Intelligence

SSD Framework Doubles Inference Speed Over Top Engines, Breaking the Serial Bottleneck

The SSD framework and its SAGUARO optimization, developed by researchers from Stanford, Princeton, and Together AI, parallelize drafting and verification in speculative decoding to eliminate serial dependencies. The reported gains are up to 2× faster inference than the strongest existing engines and up to 5× over standard autoregressive generation, and the article also covers challenges such as prediction accuracy, acceptance-rate trade-offs, and fallback strategies.

SAGUARO · SSD · inference acceleration

0 likes · 7 min read

AI Cyberspace

Nov 19, 2025 · Artificial Intelligence

Why MPI and NCCL Are Critical for Scaling AI Models Across Thousands of GPUs

This article explains how AI model training has evolved from single‑GPU workloads to massive distributed training using MPI for CPU‑centric communication and NCCL for GPU‑centric communication, covering their histories, core concepts, programming interfaces, topology discovery, protocol choices, and performance testing on multi‑GPU clusters.

AI distributed training · GPU communication · High-performance computing

0 likes · 71 min read

IT Services Circle

Nov 9, 2025 · Fundamentals

Why Nvidia’s GPUs Are the Secret Key to the Quantum Computing Era

Nvidia applies its GPUs to quantum computing's fragile error-correction problem, introducing the ultra-fast NVQLink interconnect and the CUDA-Q programming platform, and creating a feedback loop that secures its dominance in both traditional and emerging quantum markets.

CUDA-Q · GPU · NVQLink

0 likes · 6 min read

Data STUDIO

Nov 6, 2025 · Big Data

Ditch Multithreading: 11 Python Libraries That Deliver Lightning‑Fast Performance

This article reviews high-performance Python libraries and runtimes, including Polars, Numba, orjson, PyO3, Blosc, Awkward Array, Dask, Vaex, Modin, scikit-learn-intelex, uvloop, and PyPy, showing how they achieve multi-fold speedups through Rust, JIT, SIMD, lazy evaluation, and parallel execution, and offers guidance on when to choose each tool.

Python · dask · data processing

0 likes · 14 min read

Tencent Cloud Developer

Sep 26, 2025 · Fundamentals

Why GPUs Really Matter: From Architecture Basics to CUDA Programming

This article explains why GPUs have become the preferred platform for high‑performance computing, covering Dennard scaling, GPU speed advantages, theoretical FLOPS calculations, CUDA programming examples like SAXPY, the SIMT execution model, instruction pipelines, and modern techniques for handling branch divergence and register bank conflicts.

CUDA programming · GPU architecture · GPU performance

0 likes · 38 min read

Refining Core Development Skills

Aug 7, 2025 · Fundamentals

Why NVIDIA’s First Data‑Center GPU Revolutionized Computing: Inside the Tesla G80 Architecture

This article explains how NVIDIA transitioned from gaming graphics cards to general‑purpose GPUs with the first data‑center Tesla GPU, detailing the unified shader architecture, the internal components of TPCs and SMs, CUDA 1.0 programming basics, and performance calculations that illustrate the massive computational advantage over contemporary CPUs.

CUDA · GPGPU · GPU architecture

0 likes · 23 min read

Code Mala Tang

Jul 13, 2025 · Backend Development

Boost Python Computation Speed 30× with Rust and pyo3

This tutorial demonstrates how to accelerate Python's compute‑intensive tasks by rewriting critical functions in Rust, using the pyo3 library to create Python extension modules, and compares single‑thread, multithreaded, and multiprocessing performance on Linux.

extension module · parallel computing · pyo3

0 likes · 10 min read

Tencent Technical Engineering

Jul 8, 2025 · Artificial Intelligence

Why GPUs Power Large‑Model Inference: From Graphics to GPGPU

This article explains how modern GPUs evolved from graphics rendering to general-purpose computing, details the CPU-GPU heterogeneous architecture, walks through a CUDA demo that adds two billion-element arrays, compares CPU and GPU performance, and describes the compilation, loading, and execution pipeline of CUDA kernels.

AI inference · CUDA · GPGPU

0 likes · 33 min read

Architects' Tech Alliance

Jun 19, 2025 · Fundamentals

Unlock the Secrets of GPUs: 100 Essential Fundamentals Explained

This comprehensive guide covers 100 essential GPU fundamentals, from basic definitions and architecture to core technologies, performance optimization, emerging trends, and industry developments, providing a complete technical foundation for graphics, AI, and high‑performance computing applications.

Computer Architecture · GPU · Graphics Processing Unit

0 likes · 19 min read

Architects' Tech Alliance

Jun 15, 2025 · Fundamentals

Master GPU Fundamentals: Architecture, Performance, and Programming Insights

This comprehensive guide covers GPU definitions, evolution, core components, architectural designs, performance metrics, programming models, deep‑learning applications, comparisons with other processors, practical use cases, optimization techniques, and future trends, providing a solid foundation for anyone interested in modern graphics and compute acceleration.

Computer Architecture · GPU · Hardware

0 likes · 43 min read

php Courses

Apr 22, 2025 · Fundamentals

Comprehensive Guide to Python Multiprocessing: Basics, IPC, Process Pools, and Best Practices

This article provides an in‑depth overview of Python’s multiprocessing module, covering its fundamentals, process creation, inter‑process communication methods such as Queue, Pipe, shared memory, process pools, synchronization techniques, and practical best‑practice guidelines for effective parallel programming.

IPC · Multiprocessing · Process Pool

0 likes · 10 min read

Open Source Linux

Apr 8, 2025 · Artificial Intelligence

A Turing‑Award Legend on AI, Parallel Computing, and Learning's Future

In this candid interview, 83‑year‑old Turing‑Award winner Jeffrey Ullman reflects on his decades‑long impact on compilers, databases, and algorithms, discusses the unpredictable nature of technological revolutions, explores the rise of large language models, parallel computing, prompt engineering, and the challenges of adapting education and software engineering to rapid AI‑driven change.

Education Technology · Prompt Engineering · artificial-intelligence

0 likes · 23 min read

Architects' Tech Alliance

Apr 3, 2025 · Artificial Intelligence

Why NVLink and NVSwitch Are Essential for Training Massive AI Models

Training today's massive AI foundation models demands extensive GPU resources and sophisticated multi‑GPU communication, making technologies like NVLink and NVSwitch crucial for efficient distributed training, while data‑parallel and model‑parallel strategies together optimize performance across large‑scale hardware clusters.

AI · GPU · NVLink

0 likes · 8 min read

Tencent Technical Engineering

Mar 21, 2025 · Fundamentals

Fundamentals of GPU Architecture and Programming

The article explains GPU fundamentals—from the end of Dennard scaling and why GPUs excel in parallel throughput, through CUDA programming basics like the SAXPY kernel and SIMT versus SIMD execution, to the evolution of the SIMT stack, modern scheduling, and a three‑step core architecture design.

CUDA · GPU · GPU programming

0 likes · 42 min read

AI Product Manager Community

Feb 28, 2025 · Artificial Intelligence

What’s Inside DeepSeek’s Open‑Source Week? DualPipe, EPLB, 3FS and More Explained

DeepSeek’s recent Open‑Source Week unveiled a suite of AI‑focused tools—including the DualPipe pipeline parallelism algorithm, the EPLB expert load balancer, detailed training‑inference framework data, the high‑performance 3FS parallel file system, and the Smallpond data‑processing framework—each with GitHub links and performance highlights.

AI · File System · distributed training

0 likes · 7 min read

AI Cyberspace

Feb 5, 2025 · Fundamentals

From 2D Cards to AI Powerhouses: The Evolution of GPUs

This article traces the GPU's journey from early 2D graphics cards to modern GPGPUs powering AI and HPC, explains core hardware components, compares GPU and CPU architectures, and details the 3D rendering pipeline that underlies graphics and parallel computation.

Computer Architecture · GPU · Graphics Processing Unit

0 likes · 10 min read

Infra Learning Club

Jan 31, 2025 · Fundamentals

Essential CUDA Learning Guide: Basics, Compilation, and Profiling

This article walks through a practical APOD workflow for CUDA development—assessing bottlenecks, parallelizing with cuBLAS/cuFFT/Thrust, optimizing iteratively, and deploying—while covering nvcc compilation flags, PTX virtual ISA, nvprof profiling, core terminology (SP, SM, warp, grid, block, thread), indexing patterns, and unified memory references.

CUDA · CUDA terminology · GPU programming

0 likes · 8 min read

Test Development Learning Exchange

Jan 22, 2025 · Artificial Intelligence

Comprehensive Guide to Python Data Science Libraries with Code Examples

This article presents a concise tutorial on essential Python data science libraries, covering data cleaning with Pandas, numerical analysis with NumPy and SciPy, visualization with Matplotlib and Seaborn, machine learning with scikit‑learn, NLP with NLTK and spaCy, time‑series modeling, image processing, database access, and parallel computing, each illustrated with ready‑to‑run code examples.

Data Visualization · NLP · Python

0 likes · 7 min read

Baidu Tech Salon

Jan 8, 2025 · Artificial Intelligence

Evolution of Video Search Ranking Architecture Toward an End‑to‑End Large‑Model Framework

The paper describes transforming a tightly coupled, multi‑stage video search ranking pipeline into a modular, end‑to‑end large‑model architecture that decouples recall, employs a graph‑engine parallel framework and elastic compute allocation, thereby boosting performance, flexibility, personalization and lowering long‑term operational costs.

End-to-End · System Optimization · elastic resources

0 likes · 10 min read

Baidu Geek Talk

Jan 8, 2025 · Artificial Intelligence

Evolution of Video Search Ranking Architecture Towards an End‑to‑End Large‑Model Framework

The article outlines how video search ranking has shifted from a tightly‑coupled multi‑stage cascade to an extensible, end‑to‑end, model‑centric framework called Rankflow, leveraging large‑model inference, decoupled recall, fine‑grained parallelism, and elastic compute allocation to boost performance, flexibility, and maintainability while paving the way for future retrieval‑augmented generation integration.

AI · elastic resources · large models

0 likes · 11 min read

Baobao Algorithm Notes

Nov 7, 2024 · Artificial Intelligence

Demystifying FlashAttention: A Minimalist Derivation of the Algorithm

This article presents a concise, step‑by‑step derivation of FlashAttention, explaining the prerequisite linear‑algebra concepts, the softmax simplifications, and the parallel computation workflow—including the LSE‑enhanced version—so readers can grasp the algorithm’s elegance without heavy mathematics.

Algorithm Derivation · Attention Mechanism · FlashAttention

0 likes · 8 min read

Top Architect

Nov 5, 2024 · Backend Development

Understanding ForkJoinPool: Principles, Implementation, and Performance Evaluation

This article explains the Fork/Join model and Java's ForkJoinPool, covering divide‑and‑conquer theory, task splitting, core APIs, code examples, common pitfalls, performance testing, and best‑practice recommendations for high‑concurrency computing.

ForkJoinPool · Java concurrency · ThreadPool

0 likes · 25 min read

Tencent Cloud Developer

Nov 1, 2024 · Databases

How TDSQL Dominated Global OLAP & OLTP Benchmarks: Inside the Technical Secrets

Tencent Cloud's TDSQL broke world records in both the TPC-DS (OLAP) and TPC-C (OLTP) benchmarks, achieving a 7260 M QphDS score at a cost of 37.52 CNY/kQphDS. The article explains the three self-developed technologies that made this performance possible: MPP execution, a parallel execution framework, and a columnar, vectorized engine.

Columnar Storage · Database Performance · MPP

0 likes · 7 min read

AsiaInfo Technology: New Tech Exploration

Oct 23, 2024 · Artificial Intelligence

How to Optimize Distributed Training for Massive AI Models: Strategies & Performance Insights

This article examines the challenges of scaling large AI models across multiple GPUs, explores data, pipeline, and tensor parallelism, analyzes collective communication patterns and data‑channel technologies such as PCIe, NVLink and RDMA, and offers concrete optimization recommendations to boost training efficiency.

GPU communication · collective communication · distributed training

0 likes · 21 min read

DaTaobao Tech

Sep 6, 2024 · Backend Development

Go Language Coroutine Principles and GMP Model Implementation

The article examines Go's coroutine architecture and its GMP (Goroutine-Machine-Processor) model, tracing coroutine history, comparing kernel, user, and hybrid thread models, and detailing G, M, and P components, scheduling principles, work-stealing, preemption, and runtime implementation that give Go high concurrency and low latency.

Coroutines · GMP model · Go language

0 likes · 37 min read

Architects' Tech Alliance

Sep 4, 2024 · Fundamentals

Why Bigger Transformers Win: Scaling Laws and Parallel Computing Essentials

The article explains OpenAI's 2020 Scaling Laws that show larger transformer models, more data, and greater compute consistently improve performance, introduces the concept of emergent abilities at critical size thresholds, and outlines the core principles of parallel computing such as multi‑processor usage, task decomposition, concurrent execution, and inter‑processor communication.

Task Decomposition · communication · concurrency

0 likes · 6 min read

Open Source Linux

Jul 2, 2024 · Fundamentals

Why GPUs Power AI and Gaming: A Beginner’s Guide to Their Architecture

This article explains what a GPU is, how it differs from a CPU, its internal architecture, and why its massive parallel processing makes it essential for graphics rendering, scientific computation, and AI inference, illustrated with examples such as the NVIDIA RTX 3090.

AI inference · GPU · Graphics Rendering

0 likes · 8 min read

Architects' Tech Alliance

May 14, 2024 · Artificial Intelligence

Why GPUs Are Essential for Modern Artificial Intelligence and How They Compare with CPUs, ASICs, and FPGAs

This article explains the pivotal role of GPUs in today’s generative AI era, describes their architecture and applications, compares them with CPUs, ASICs, and FPGAs, and offers guidance on selecting the right processor for AI workloads while also noting related reference resources.

GPU · Hardware · Processor Comparison

0 likes · 12 min read

Java Captain

Feb 26, 2024 · Fundamentals

Principles, Advantages, Challenges, and Future of Multithreaded Architecture

This article examines multithreaded architecture, explaining its core principles, highlighting efficiency and resource utilization benefits, discussing synchronization, communication, and management challenges along with mitigation strategies, and exploring its future impact on cloud computing, big data, AI, and overall system performance.

parallel computing · software architecture

0 likes · 5 min read

Test Development Learning Exchange

Dec 6, 2023 · Backend Development

Using Python multiprocessing and Celery for Parallel and Distributed Task Processing

This article introduces Python's multiprocessing module and the Celery task queue, explains their core concepts, and provides practical code examples for multi‑process parallel computation, inter‑process communication, asynchronous execution, scheduled jobs, result callbacks, retries, and distributed task orchestration.

Distributed Tasks · Multiprocessing · Python

0 likes · 7 min read

Test Development Learning Exchange

Aug 2, 2023 · Backend Development

Practical Examples of Python multiprocessing and Celery for Parallel and Distributed Task Processing

This article introduces Python's multiprocessing module and the Celery distributed task queue, explains their core features, and provides ten practical code examples demonstrating multi‑process parallel computation, inter‑process communication, asynchronous tasks, scheduling, retries, and distributed processing for real‑world applications.

Distributed Tasks · Multiprocessing · Python

0 likes · 7 min read

Python Programming Learning Circle

Jun 17, 2023 · Big Data

Accelerating Python Data Preprocessing with Multiprocessing in Three Lines of Code

This article demonstrates how to use Python's concurrent.futures module to parallelize image resizing, turning a single‑process script into a multi‑core solution with just three additional lines of code, achieving up to a six‑fold speed‑up on typical CPUs.

Data preprocessing · Python · concurrent.futures

0 likes · 7 min read

Open Source Linux

May 29, 2023 · Fundamentals

What Is a GPU? Understanding Its Role in Graphics, AI, and Computing

This article explains what a GPU (Graphics Processing Unit) is, how it differs from a CPU, its architecture and performance characteristics, and why it powers everything from real‑time rendering to AI inference, using examples like the NVIDIA RTX 3090.

CPU comparison · GPU · Graphics Processing Unit

0 likes · 8 min read

Baidu Geek Talk

May 10, 2023 · Artificial Intelligence

Baidu's AI Infrastructure for Large-Scale LLM Training: Architecture, Challenges, and Optimization

Baidu’s AI infrastructure combines a massive InfiniBand‑linked GPU cluster, Kunlun chips, the PaddlePaddle framework, and the Wenxin model suite with 4D hybrid parallelism, elastic fault tolerance, and a two‑stage training pipeline to overcome computation, memory, and communication walls, delivering world‑leading MLPerf performance for large‑scale LLMs.

GPU Cluster · InfiniBand · Large Language Model

0 likes · 15 min read

MaGe Linux Operations

Apr 22, 2023 · Fundamentals

Boost Python Performance: Master Multiprocessing with Real Code Examples

This article explains Python's multiprocessing module, shows how to create and manage processes and process pools with clear code samples, and outlines common use cases, best practices, and pitfalls for efficiently handling CPU‑bound tasks.

Multiprocessing · Process Pool · concurrency

0 likes · 7 min read

Architects' Tech Alliance

Apr 17, 2023 · Fundamentals

Overview of High‑Performance Computing (HPC): Architecture, Metrics, Cluster Management, Job Scheduling, and Parallel Programming Models

This article provides a comprehensive overview of high‑performance computing, covering system architectures, hardware components, performance metrics, network topologies, common parallel file systems, cluster management functions, mainstream job‑scheduling systems, and MPI‑based parallel programming models.

HPC · High-performance computing · Job Scheduling

0 likes · 14 min read

DataFunTalk

Feb 18, 2023 · Artificial Intelligence

Building the ATLAS Automated Machine Learning Platform at Du Xiaoman: Architecture, Optimization, and Practical Insights

This article details Du Xiaoman's development of the ATLAS automated machine learning platform, covering business scenarios, AI algorithm deployment challenges, the end‑to‑end production workflow, platform components such as annotation, data, training and deployment, as well as optimization techniques like AutoML, meta‑learning, NAS, and large‑scale parallelism, concluding with lessons learned and future directions.

AI Deployment · AutoML · Data Engineering

0 likes · 20 min read

DeWu Technology

Jan 6, 2023 · Backend Development

Coding Standards and Best Practices for Backend Development

The guide defines backend coding standards and best practices, specifying clear naming conventions for classes, methods, variables and constants, enforcing consistent code style via .editorconfig, recommending unchecked exceptions for simpler error handling, promoting structured asynchronous logging with trace IDs, and outlining performance optimizations such as loop refinement, CompletableFuture concurrency, and proper resource management to prevent memory leaks.

Backend Development · Logging · best practices

0 likes · 16 min read

Tencent Cloud Developer

Sep 30, 2022 · Cloud Computing

Understanding GPU Computing and Cloud-Based GPU Solutions

The article explains why massively parallel pixel calculations demand GPUs and how Tencent Cloud's elastic, virtualized GPU services, including vGPU, qGPU, the TACO abstraction, and spot instances, address GPUs' high cost and inflexibility, delivering up to 16 EFLOPS for AI, scientific, graphics, and video workloads.

GPU computing · Tencent Cloud · cloud GPU

0 likes · 5 min read

Model Perspective

Aug 16, 2022 · Fundamentals

Boost Python Speed Instantly with Numba: A Practical Guide

Numba is a Python just‑in‑time compiler that transforms functions into fast native machine code, enabling near C‑level performance without rewriting code; by adding simple decorators like @jit or @njit, you can accelerate loops, NumPy operations, and even leverage parallel or GPU execution.

JIT Compilation · Performance Optimization · Python

0 likes · 7 min read

Top Architect

Jun 18, 2022 · Big Data

Overview of Data Lakes and the Open SPL Compute Engine

This article explains the concept and challenges of data lakes, describes the “impossible triangle” of storage, compute, and cost, and introduces the open‑source SPL engine that provides multi‑source, file‑based, high‑performance computing to overcome those limitations.

Data Lake · SPL · compute engine

0 likes · 13 min read

Architects' Tech Alliance

May 3, 2022 · Fundamentals

High‑Performance Computing Overview and Resource Guide

This article provides a comprehensive overview of high‑performance computing (HPC), covering its definition, hardware architectures, performance metrics, cluster components, parallel file systems, management and scheduling tools, as well as common MPI implementations and links to further technical resources.

FLOPS · File Systems · HPC

0 likes · 11 min read

IT Services Circle

Apr 4, 2022 · Fundamentals

From Simple Loops to SIMD: The Evolution of Parallel Computation in CPU Design

The article narrates a CPU's journey from a naïve element‑wise increment loop to the adoption of SIMD, MMX, SSE, and AVX instruction sets, illustrating the motivations, challenges, and architectural decisions behind parallelizing integer and floating‑point operations.

CPU architecture · Instruction Set · MMX

0 likes · 8 min read

Python Programming Learning Circle

Jan 27, 2022 · Big Data

Using ipyparallel for Parallel and Distributed Computing in Python

This article explains how to overcome Python's Global Interpreter Lock by installing ipyparallel, configuring parallel profiles, and using engines, DirectView, and LoadBalancedView to run both synchronous and asynchronous tasks, with code examples and performance comparisons.

Distributed Computing · Python · ipyparallel

0 likes · 9 min read

21CTO

Oct 25, 2021 · Fundamentals

Why Understanding Computer System Fundamentals Boosts Your Programming Performance

This article explains how computer systems combine hardware and system software, describes the memory hierarchy, operating‑system abstractions, Amdahl's law, and various forms of parallelism, and shows why mastering these fundamentals can dramatically improve program efficiency and reliability.

Amdahl's Law · Computer Architecture · Performance Optimization

0 likes · 16 min read

Architects' Tech Alliance

Sep 23, 2021 · Fundamentals

Understanding High‑Performance Computing (HPC): Principles, Architecture, and Applications

The article explains high‑performance computing (HPC) concepts, including serial and parallel processing, supercomputer performance measured in FLOPS, real‑world scientific applications such as drug discovery and weather forecasting, and the hardware architectures that enable these massive computational capabilities.

Computer Architecture · FLOPS · GPU

0 likes · 7 min read

Tencent Cloud Developer

Aug 17, 2021 · Backend Development

Design and Implementation of a Calculation DSL and Engine

The article presents a domain‑specific language that mimics Excel formulas, a stack‑based parser and recursive engine for evaluating calculations, and a multi‑layer architecture—including a dynamic priority scheduler—to efficiently resolve field dependencies, improve maintainability, and enable monitoring across large data systems.

Backend Development · Calculation Engine · Monitoring

0 likes · 11 min read

Tech Musings

Jul 8, 2021 · Big Data

Building a Simple Single-Node MapReduce System: From Theory to Code

This article walks through implementing a lightweight single‑machine MapReduce framework inspired by the original MapReduce paper, covering the abstract Map/Reduce model, task scheduling between master and workers, core Go code for map, reduce, worker, and coordinator, and a brief reflection on its limitations.

Big Data · Lab · MapReduce

0 likes · 10 min read

Python Programming Learning Circle

Mar 8, 2021 · Operations

Using IPython and Jupyter for Multi‑language and Parallel Computing

This article explains how IPython and Jupyter notebooks support multi‑language execution, integrate Fortran via F2PY, and enable parallel and distributed computing with ipyparallel, illustrating practical magic commands, cluster setup, and performance considerations for scientific Python workflows.

IPython · Jupyter · Magic Commands

0 likes · 12 min read

JavaEdge

Feb 11, 2021 · Fundamentals

From Vacuum Tubes to Quantum Chips: How Computer Architecture Evolved

This article traces the historical evolution of computers from first‑generation vacuum‑tube machines to modern VLSI systems, explains fundamental performance limits such as the von Neumann bottleneck and power consumption, and introduces parallel optimization concepts like Amdahl’s Law and future computing paradigms.

Amdahl's Law · CPU performance · Computer Architecture

0 likes · 12 min read

Java Architect Essentials

Oct 18, 2020 · Backend Development

Performance Evaluation of Java Stream API: Serial vs Parallel Execution

This article presents a series of Java Stream API performance experiments—comparing serial and parallel streams on primitive, object, and reduction tasks—showing that while serial streams are slower than traditional loops for simple operations, parallel streams can significantly outperform both in multi‑core environments.

Backend Development · Stream API · parallel computing

0 likes · 7 min read

Architects' Tech Alliance

Aug 19, 2020 · Fundamentals

Understanding Supercomputers: Definitions, Evaluation Systems, Research Value, and Technological Layers

This article explains what supercomputers are, outlines major ranking systems such as TOP500 and Green500, describes their wide‑ranging research and application value, and details the three‑layer architecture, parallel computing principles, and emerging trends like exascale, AI integration, quantum and bio‑computing.

Applications · Exascale · TOP500

0 likes · 4 min read

TAL Education Technology

May 14, 2020 · Artificial Intelligence

An Introduction to GPU Computing and CUDA Architecture

This article provides a concise overview of GPU computing fundamentals, covering GPU hardware components, memory hierarchy, parallel execution models, and the CUDA programming framework, illustrating how CPUs and GPUs cooperate in heterogeneous computing environments.

CUDA · CUDA programming · GPU

0 likes · 16 min read

Python Programming Learning Circle

Apr 16, 2020 · Big Data

Getting Started with PySpark: Creating SparkContext, Parallelizing Data, and Basic DataFrame Operations

This tutorial demonstrates how to initialize a SparkContext in PySpark, perform simple parallel computations such as temperature conversion and reduction, create a SparkSession to read CSV data, and apply common DataFrame operations like selecting columns, adding new columns, filtering, grouping, and aggregating.

Big Data · PySpark · Spark

0 likes · 5 min read

Alibaba Cloud Developer

Apr 16, 2020 · Artificial Intelligence

How Mars Supercharges Numpy, Pandas, and Scikit‑Learn with Parallel and GPU Acceleration

This article explains how the Mars framework enables parallel and distributed execution of core Python data‑science libraries (NumPy, pandas, and scikit‑learn) while integrating with RAPIDS for GPU acceleration, and demonstrates its performance advantages through code examples and benchmark results.

GPU Acceleration · Mars · NumPy

0 likes · 16 min read

Architects' Tech Alliance

Feb 6, 2020 · Fundamentals

Parallel Computing vs Distributed Computing: Concepts, Principles, and Differences

The article explains the concepts, principles, advantages, and key differences between parallel computing and distributed computing, highlighting their roles within high‑performance computing and when each approach is most appropriate.

HPC · computing fundamentals · parallel computing

0 likes · 6 min read

dbaplus Community

Oct 13, 2019 · Databases

Why PostgreSQL Stands Out: Community, Innovation, and Cloud Power

The talk explores PostgreSQL’s unique community-driven development, its commercial and innovative capabilities, new version features such as enhanced partitioning and parallel computing, and how its open‑source licensing, extensible architecture, and cloud integrations make it a compelling, enterprise‑ready database solution.

Database Features · Open Source Database · PostgreSQL

0 likes · 17 min read

Architects' Tech Alliance

Oct 12, 2019 · Fundamentals

Understanding GPUs: History, Architecture, and Acceleration Technologies (CUDA & OpenCL)

This article explains the history, architecture, and operation of GPUs, and introduces major acceleration frameworks such as CUDA and OpenCL, highlighting their roles in parallel computing and modern graphics processing for scientific and AI workloads.

CUDA · Computer Architecture · GPU

0 likes · 13 min read

Architects' Tech Alliance

Sep 20, 2019 · Industry Insights

Why Heterogeneous Parallel Computing Is the Future of High‑Performance Computing

The article explains how heterogeneous parallel computing, which distributes tasks across CPUs, GPUs, FPGAs, and other accelerators, became essential once Moore’s law plateaued. It details the approach's principles, hardware and software perspectives, classification of architectures, processing stages, user‑guided versus compiler‑guided methods, and its relevance to AI, cloud, and industry workloads.

CPU · FPGA · GPU

0 likes · 15 min read

Architects' Tech Alliance

Sep 5, 2019 · Fundamentals

GPU Origin, Architecture, and Acceleration Technologies (CUDA & OpenCL)

This article explains the history and origin of GPUs, compares CPU and GPU architectures, describes the GPU processing pipeline, and introduces acceleration technologies such as CUDA and OpenCL, highlighting their programming models, supported languages, and key performance metrics.

CUDA · GPU · Graphics Processing

0 likes · 14 min read

Architects' Tech Alliance

Jul 1, 2019 · Fundamentals

Understanding Supercomputers: Architecture, Performance, and Real‑World Applications

The article explains the latest TOP500 supercomputer rankings, emphasizes that architecture—not just CPU count—is the core technology behind high‑performance computing, describes the challenges of networking, software, and power, and illustrates diverse applications such as nuclear simulation, climate forecasting, and video rendering.

Computer Architecture · HPC Applications · parallel computing

0 likes · 14 min read

Architects' Tech Alliance

Apr 27, 2019 · Fundamentals

Why GPUs Outperform CPUs: Core Parameters and Architecture Explained

This article explains the fundamental differences between CPUs and GPUs, outlines key GPU specifications such as CUDA cores, memory capacity, bandwidth, and floating‑point precision, and reviews NVIDIA's major GPU series and architectural evolution for high‑performance and AI workloads.

CPU · GPU · NVIDIA

0 likes · 11 min read

Architects' Tech Alliance

Apr 18, 2019 · Fundamentals

What Powers Modern Graphics? A Deep Dive into GPU History and Architecture

This article traces the evolution of GPUs from early graphics chips to modern parallel processors, explains their internal pipeline, compares CPU and GPU architectures, and introduces key acceleration frameworks like CUDA and OpenCL for general‑purpose computing.

CUDA · GPU · GPU architecture

0 likes · 13 min read

MaGe Linux Operations

Nov 22, 2018 · Artificial Intelligence

Accelerating TensorFlow Deep Learning: GPU & Distributed Training Techniques

This article explains how to speed up TensorFlow deep‑learning model training using single‑GPU acceleration, multi‑GPU parallelism, and distributed TensorFlow on Kubernetes, covering device placement, session parameters, synchronous vs asynchronous training modes, and practical code examples to improve performance and scalability.

GPU Acceleration · TensorFlow · deep learning

0 likes · 10 min read

Meituan Technology Team

Oct 25, 2018 · Artificial Intelligence

Deep Learning System Design and Parallel Computing Solutions at Meituan

Meituan built a custom deep‑learning platform that combines data‑parallel and hybrid parallelism across multi‑GPU/cluster hardware, uses coarse‑grained scheduling and Kaldi‑derived acoustic algorithms, and supports fast NLU model hot‑updates, achieving near‑linear GPU scaling and 6–7× speedups over traditional solutions.

AI Infrastructure · NLU · acoustic modeling

0 likes · 13 min read

Architects' Tech Alliance

Oct 9, 2018 · Fundamentals

Parallel Computing vs Distributed Computing: Concepts, Principles, and Differences

This article explains the concepts, principles, and key distinctions between parallel computing and distributed computing, describing their objectives, basic conditions, advantages, and typical use cases within high‑performance computing, and highlights how they differ from grid and cloud computing.

Distributed Computing · HPC · computing fundamentals

0 likes · 6 min read

Meituan Technology Team

Aug 2, 2018 · Big Data

R for Fine‑Grained Data Operations: Engineering Practices and Performance at Meituan

Meituan’s in‑store dining team demonstrates how R’s open‑source packages, powerful data manipulation, rich visualization libraries, and reproducible reporting can be engineered into scalable, parallelized workflows that turn secondary data processing into fast, interactive dashboards and analytics, proving R’s enterprise‑grade performance and adoption.

Big Data · Data Visualization · R

0 likes · 18 min read

dbaplus Community

May 23, 2018 · Big Data

Understanding MapReduce: A Simple Analogy to Master Big Data Distributed Computing

This article uses a human‑computer analogy and a playing‑card counting example to explain the fundamentals of distributed computing, why single machines cannot handle massive data, and how the MapReduce model’s four steps—split, transform, shuffle, and merge—solve big‑data problems.

Big Data · Distributed Computing · MapReduce

0 likes · 15 min read

Efficient Ops

Feb 26, 2018 · Fundamentals

Boost Your Python Speed: 20 Proven Tricks to Slash Execution Time

Learn how to dramatically improve Python performance by choosing optimal data structures, minimizing redundant data, using copy wisely, leveraging dict/set lookups, generators, efficient loops, string joining, proper formatting, fast variable swapping, concise comparisons, C extensions, multiprocessing, PyPy, and profiling tools, all backed by real benchmarks.

Benchmarking · C extensions · Code Profiling

0 likes · 16 min read

Meituan Technology Team

Dec 1, 2017 · Big Data

Metric Logic Tree: Automated Anomaly Analysis for Business Metrics

The Metric Logic Tree automates business metric anomaly analysis by integrating heterogeneous data sources (Kylin, MySQL, Elasticsearch, Druid) with a three‑layer architecture—metric calculation, algorithmic analysis (waterfall and Gini‑coefficient methods), and a master‑worker computation service—that parallelizes queries, delivers immediate conclusions, and shortens decision cycles, as demonstrated in Meituan‑Dianping’s hotel‑travel operations.

Anomaly Detection · Big Data · algorithm

0 likes · 7 min read

Alibaba Cloud Developer

Jul 13, 2017 · Artificial Intelligence

How STARK VRP Cuts Chinese Logistics Costs with AI‑Powered Routing

This article explains how Alibaba's Cainiao network built the STARK VRP engine—an AI‑driven, distributed vehicle‑routing solver that supports dozens of VRP variants, leverages metaheuristics, parallel island models, and deep reinforcement learning to dramatically reduce fleet size and travel distance in Chinese logistics.

AI · Logistics Optimization · Metaheuristics

0 likes · 8 min read

GF Securities FinTech

Sep 14, 2016 · Big Data

Scaling Real-Time Stock Market Data with Redis, Lua, and Go Goroutines

Exploring how a securities firm processes billions of daily stock‑market indicators in real time, this article compares an in‑process Redis + Lua solution with an out‑of‑process Goroutine‑based architecture, detailing data flow, performance trade‑offs, and scalability considerations for high‑frequency time‑series workloads.

Lua · Real-time Data · Redis

0 likes · 12 min read

21CTO

Apr 20, 2016 · Fundamentals

Why Algorithms Matter More Than Learning Every New Programming Language

The article argues that, despite the hype around ever‑changing programming languages, mastering core algorithms and computer science theory remains essential for building efficient, scalable solutions across fields—from search engines and parallel computing to scientific research—because algorithms are the enduring foundation of technology.

Data Structures · MapReduce · computer science fundamentals

0 likes · 11 min read

Java High-Performance Architecture

Jan 24, 2016 · Big Data

MapReduce Explained: From Library Book Counting to Word Count in Big Data

This article introduces the MapReduce parallel processing model, illustrates its core map and reduce operations with a library‑shelf analogy and a classic word‑count example, and walks through each processing stage using clear diagrams to show how massive data is aggregated efficiently.

Big Data · Hadoop · MapReduce

0 likes · 5 min read

21CTO

Dec 7, 2015 · Fundamentals

How D.E. Shaw’s Anton Supercomputer Revolutionized Computational Chemistry

Former professor and hedge‑fund founder D.E. Shaw leveraged his expertise in massively parallel computing to create the Anton supercomputer, a purpose‑built machine that runs molecular dynamics simulations thousands of times faster than traditional supercomputers, reshaping computational chemistry and high‑frequency trading.

Anton · D.E. Shaw · High-performance computing

0 likes · 9 min read
