Tagged articles
77 articles
IT Services Circle
May 12, 2026 · Fundamentals

Every Line of Code Echoes von Neumann’s 80‑Year‑Old Shortcut

The article explains how John von Neumann’s 1945 decision to store programs in memory created the universal von Neumann architecture, why this simple design outlasted more optimal alternatives, and how his ideas also spawned parallel computing and game theory, shaping modern computers, AI, and distributed systems.

Game Theory · computer architecture · parallel computing
ITPUB
Apr 10, 2026 · Backend Development

How a Simple Refactor and Parallelism Cut Java Loop Time from 26s to 0.7s

A new team member sped up a painfully slow Java data‑processing routine from 26,856 ms to 748 ms by refactoring nested loops, extracting repeated calculations, and introducing a thread pool for parallel execution. The article walks through the before‑and‑after code and the key techniques.

Java · Performance Optimization · parallel computing
Machine Heart
Apr 1, 2026 · Artificial Intelligence

SSD Framework Doubles Inference Speed Over Top Engines, Breaking the Serial Bottleneck

The SSD framework and its SAGUARO optimization, developed by researchers from Stanford, Princeton, and Together AI, parallelize drafting and verification in speculative decoding, eliminating serial dependencies and achieving up to 2× faster inference than the world’s strongest engines and up to 5× speedup over standard autoregressive generation, while addressing challenges such as prediction accuracy, acceptance‑rate trade‑offs, and fallback strategies.

Inference Acceleration · SAGUARO · SSD
AI Cyberspace
Nov 19, 2025 · Artificial Intelligence

Why MPI and NCCL Are Critical for Scaling AI Models Across Thousands of GPUs

This article explains how AI model training has evolved from single‑GPU workloads to massive distributed training using MPI for CPU‑centric communication and NCCL for GPU‑centric communication, covering their histories, core concepts, programming interfaces, topology discovery, protocol choices, and performance testing on multi‑GPU clusters.

AI distributed training · GPU communication · High‑performance computing
IT Services Circle
Nov 9, 2025 · Fundamentals

Why Nvidia’s GPUs Are the Secret Key to the Quantum Computing Era

Nvidia leverages its GPUs to solve quantum computers' fragile error‑correction problem, introducing ultra‑fast NVQLink interconnect and the CUDA‑Q programming platform, creating a feedback loop that secures its dominance in both traditional and emerging quantum markets.

CUDA-Q · GPU · NVQLink
Data STUDIO
Nov 6, 2025 · Big Data

Ditch Multithreading: 11 Python Libraries That Deliver Lightning‑Fast Performance

This article reviews high‑performance Python libraries and runtimes (Polars, Numba, orjson, PyO3, Blosc, Awkward Array, Dask, Vaex, Modin, scikit‑learn‑intelex, uvloop, and PyPy), showing how they achieve multi‑fold speedups through Rust, JIT compilation, SIMD, lazy evaluation, and parallel execution, and offers guidance on when to choose each tool.

Python · Rust · dask
Tencent Cloud Developer
Sep 26, 2025 · Fundamentals

Why GPUs Really Matter: From Architecture Basics to CUDA Programming

This article explains why GPUs have become the preferred platform for high‑performance computing, covering Dennard scaling, GPU speed advantages, theoretical FLOPS calculations, CUDA programming examples like SAXPY, the SIMT execution model, instruction pipelines, and modern techniques for handling branch divergence and register bank conflicts.
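
The theoretical FLOPS calculation mentioned in this summary is commonly a product of core count, clock rate, and operations per cycle. A minimal sketch, assuming each core retires one fused multiply-add (counted as 2 floating-point operations) per cycle; the function name and the sample numbers are illustrative and not taken from the article:

```python
def peak_flops(cores: int, clock_hz: float, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in FLOPS.

    Assumption: every core completes `flops_per_core_per_cycle` operations
    each cycle (2 models one fused multiply-add per cycle).
    """
    if cores < 1 or clock_hz <= 0 or flops_per_core_per_cycle < 1:
        raise ValueError("cores, clock_hz and flops_per_core_per_cycle must be positive")
    return cores * clock_hz * flops_per_core_per_cycle


if __name__ == "__main__":
    # Hypothetical device: 10,000 cores at 1.5 GHz.
    tflops = peak_flops(10_000, 1.5e9) / 1e12
    print(f"{tflops:.1f} TFLOPS")
```

Real workloads rarely reach this figure, since memory bandwidth and branch divergence limit sustained throughput.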

CUDA programming · GPU architecture · GPU performance
Refining Core Development Skills
Aug 7, 2025 · Fundamentals

Why NVIDIA’s First Data‑Center GPU Revolutionized Computing: Inside the Tesla G80 Architecture

This article explains how NVIDIA transitioned from gaming graphics cards to general‑purpose GPUs with the first data‑center Tesla GPU, detailing the unified shader architecture, the internal components of TPCs and SMs, CUDA 1.0 programming basics, and performance calculations that illustrate the massive computational advantage over contemporary CPUs.

CUDA · GPGPU · GPU architecture
Code Mala Tang
Jul 13, 2025 · Backend Development

Boost Python Computation Speed 30× with Rust and pyo3

This tutorial demonstrates how to accelerate Python's compute‑intensive tasks by rewriting critical functions in Rust, using the pyo3 library to create Python extension modules, and compares single‑thread, multithreaded, and multiprocessing performance on Linux.

Rust · extension module · parallel computing
Tencent Technical Engineering
Jul 8, 2025 · Artificial Intelligence

Why GPUs Power Large‑Model Inference: From Graphics to GPGPU

This article explains how modern GPUs evolved from graphics rendering to general‑purpose computing, details the CPU‑GPU heterogeneous architecture, walks through a CUDA demo that adds two billion‑element arrays, compares CPU and GPU performance, and describes the compilation, loading, and execution pipeline of CUDA kernels.

AI inference · CUDA · GPGPU
Architects' Tech Alliance
Jun 19, 2025 · Fundamentals

Unlock the Secrets of GPUs: 100 Essential Fundamentals Explained

This comprehensive guide covers 100 essential GPU fundamentals, from basic definitions and architecture to core technologies, performance optimization, emerging trends, and industry developments, providing a complete technical foundation for graphics, AI, and high‑performance computing applications.

Deep Learning · GPU · Graphics Processing Unit
Architects' Tech Alliance
Jun 15, 2025 · Fundamentals

Master GPU Fundamentals: Architecture, Performance, and Programming Insights

This comprehensive guide covers GPU definitions, evolution, core components, architectural designs, performance metrics, programming models, deep‑learning applications, comparisons with other processors, practical use cases, optimization techniques, and future trends, providing a solid foundation for anyone interested in modern graphics and compute acceleration.

Deep Learning · GPU · Hardware
Open Source Linux
Apr 8, 2025 · Artificial Intelligence

A Turing‑Award Legend on AI, Parallel Computing, and Learning's Future

In this candid interview, 83‑year‑old Turing‑Award winner Jeffrey Ullman reflects on his decades‑long impact on compilers, databases, and algorithms, discusses the unpredictable nature of technological revolutions, explores the rise of large language models, parallel computing, prompt engineering, and the challenges of adapting education and software engineering to rapid AI‑driven change.

Education Technology · Prompt engineering · artificial intelligence
Architects' Tech Alliance
Apr 3, 2025 · Artificial Intelligence

Why NVLink and NVSwitch Are Essential for Training Massive AI Models

Training today's massive AI foundation models demands extensive GPU resources and sophisticated multi‑GPU communication, making technologies like NVLink and NVSwitch crucial for efficient distributed training, while data‑parallel and model‑parallel strategies together optimize performance across large‑scale hardware clusters.

AI · Distributed Training · GPU
Tencent Technical Engineering
Mar 21, 2025 · Fundamentals

Fundamentals of GPU Architecture and Programming

The article explains GPU fundamentals—from the end of Dennard scaling and why GPUs excel in parallel throughput, through CUDA programming basics like the SAXPY kernel and SIMT versus SIMD execution, to the evolution of the SIMT stack, modern scheduling, and a three‑step core architecture design.
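
SAXPY computes y = a*x + y element by element. The sketch below emulates in plain Python how a CUDA-style launch maps one logical thread to one array element; the serial loops stand in for hardware parallelism, warps and the SIMT stack are not modelled, and the function names are hypothetical rather than the article's code:

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """Body executed by one logical thread."""
    # Global index, analogous to blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    # Bounds guard: the last block may contain threads past the end of the array.
    if i < n:
        y[i] = a * x[i] + y[i]


def launch_saxpy(n, a, x, y, block_dim=256):
    """Emulate a 1-D grid launch by visiting every (block, thread) pair in turn."""
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [10.0] * 5
    launch_saxpy(len(x), 2.0, x, y, block_dim=4)  # 2 blocks of 4 threads, 3 idle
    print(y)
```

Because each thread touches only its own index, the iterations are independent, which is what lets a GPU run them concurrently.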

CUDA · GPU · GPU programming
AI Product Manager Community
Feb 28, 2025 · Artificial Intelligence

What’s Inside DeepSeek’s Open‑Source Week? DualPipe, EPLB, 3FS and More Explained

DeepSeek’s recent Open‑Source Week unveiled a suite of AI‑focused tools—including the DualPipe pipeline parallelism algorithm, the EPLB expert load balancer, detailed training‑inference framework data, the high‑performance 3FS parallel file system, and the Smallpond data‑processing framework—each with GitHub links and performance highlights.

AI · Distributed Training · file system
AI Cyberspace
Feb 5, 2025 · Fundamentals

From 2D Cards to AI Powerhouses: The Evolution of GPUs

This article traces the GPU's journey from early 2D graphics cards to modern GPGPUs powering AI and HPC, explains core hardware components, compares GPU and CPU architectures, and details the 3D rendering pipeline that underlies graphics and parallel computation.

GPU · Graphics Processing Unit · Rendering Pipeline
Infra Learning Club
Jan 31, 2025 · Fundamentals

Essential CUDA Learning Guide: Basics, Compilation, and Profiling

This article walks through a practical APOD workflow for CUDA development—assessing bottlenecks, parallelizing with cuBLAS/cuFFT/Thrust, optimizing iteratively, and deploying—while covering nvcc compilation flags, PTX virtual ISA, nvprof profiling, core terminology (SP, SM, warp, grid, block, thread), indexing patterns, and unified memory references.

CUDA · CUDA terminology · GPU programming
Test Development Learning Exchange
Jan 22, 2025 · Artificial Intelligence

Comprehensive Guide to Python Data Science Libraries with Code Examples

This article presents a concise tutorial on essential Python data science libraries, covering data cleaning with Pandas, numerical analysis with NumPy and SciPy, visualization with Matplotlib and Seaborn, machine learning with scikit‑learn, NLP with NLTK and spaCy, time‑series modeling, image processing, database access, and parallel computing, each illustrated with ready‑to‑run code examples.

Data Science · Data visualization · NLP
Baidu Tech Salon
Jan 8, 2025 · Artificial Intelligence

Evolution of Video Search Ranking Architecture Toward an End‑to‑End Large‑Model Framework

The paper describes transforming a tightly coupled, multi‑stage video search ranking pipeline into a modular, end‑to‑end large‑model architecture that decouples recall, employs a graph‑engine parallel framework and elastic compute allocation, thereby boosting performance, flexibility, personalization and lowering long‑term operational costs.

End-to-End · System optimization · elastic resources
Baidu Geek Talk
Jan 8, 2025 · Artificial Intelligence

Evolution of Video Search Ranking Architecture Towards an End‑to‑End Large‑Model Framework

The article outlines how video search ranking has shifted from a tightly‑coupled multi‑stage cascade to an extensible, end‑to‑end, model‑centric framework called Rankflow, leveraging large‑model inference, decoupled recall, fine‑grained parallelism, and elastic compute allocation to boost performance, flexibility, and maintainability while paving the way for future retrieval‑augmented generation integration.

AI · elastic resources · large models
Baobao Algorithm Notes
Nov 7, 2024 · Artificial Intelligence

Demystifying FlashAttention: A Minimalist Derivation of the Algorithm

This article presents a concise, step‑by‑step derivation of FlashAttention, explaining the prerequisite linear‑algebra concepts, the softmax simplifications, and the parallel computation workflow—including the LSE‑enhanced version—so readers can grasp the algorithm’s elegance without heavy mathematics.
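
The block-wise softmax at the heart of this kind of derivation can be sketched for a single query row. The code below is an illustration of the general online-softmax idea (a running maximum, a rescaled running sum, and the log-sum-exp), not necessarily the article's exact steps; scalar values replace value vectors for brevity, the scores list is assumed non-empty, and all names are hypothetical:

```python
import math


def attention_row_reference(scores, values):
    """Softmax-weighted sum computed in one pass over the whole row."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def attention_row_blockwise(scores, values, block_size=2):
    """Same result, but visiting the row one block at a time.

    Returns (output, lse) where lse = log(sum(exp(scores))).
    """
    running_max = float("-inf")
    running_sum = 0.0  # sum of exp(score - running_max) seen so far
    running_out = 0.0  # sum of exp(score - running_max) * value seen so far
    for start in range(0, len(scores), block_size):
        s_blk = scores[start:start + block_size]
        v_blk = values[start:start + block_size]
        new_max = max(running_max, max(s_blk))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        blk_weights = [math.exp(s - new_max) for s in s_blk]
        running_sum = running_sum * scale + sum(blk_weights)
        running_out = running_out * scale + sum(w * v for w, v in zip(blk_weights, v_blk))
        running_max = new_max
    lse = running_max + math.log(running_sum)
    return running_out / running_sum, lse


if __name__ == "__main__":
    scores = [0.5, 2.0, -1.0, 3.0, 0.1]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(attention_row_reference(scores, values))
    print(attention_row_blockwise(scores, values))
```

Keeping only the running maximum, sum, and output means no block ever needs the full row of scores in memory at once.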

Algorithm Derivation · Attention Mechanism · FlashAttention
Tencent Cloud Developer
Nov 1, 2024 · Databases

How TDSQL Dominated Global OLAP & OLTP Benchmarks: Inside the Technical Secrets

Tencent Cloud's TDSQL shattered world records in both TPC‑DS (OLAP) and TPC‑C (OLTP) benchmarks, achieving a 7260 M QphDS score at a cost of 37.52 CNY/kQphDS, and the article explains the three self‑developed technologies—MPP execution, parallel execution framework, and columnar‑vectorized engine—that made this performance possible.

Columnar Storage · Database Performance · MPP
AsiaInfo Technology: New Tech Exploration
Oct 23, 2024 · Artificial Intelligence

How to Optimize Distributed Training for Massive AI Models: Strategies & Performance Insights

This article examines the challenges of scaling large AI models across multiple GPUs, explores data, pipeline, and tensor parallelism, analyzes collective communication patterns and data‑channel technologies such as PCIe, NVLink and RDMA, and offers concrete optimization recommendations to boost training efficiency.
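
All-reduce is one collective operation commonly used in data-parallel training to sum gradients across workers. The summary does not say which algorithms the article covers, so the ring variant below is an assumption chosen for illustration. It is a single-process simulation with hypothetical names, where each worker's buffer holds exactly one number per chunk and one chunk per worker:

```python
def ring_allreduce(buffers):
    """Simulate ring all-reduce (sum) over n workers arranged in a ring.

    Assumption: len(buffers) == n and each buffer has exactly n entries.
    Returns new per-worker buffers; each equals the element-wise sum.
    """
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("each buffer must have one chunk per worker")
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: after n-1 steps worker r owns the full sum
    # of chunk (r + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so the sends within a step are simultaneous.
        sends = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (chunk, value) in enumerate(sends):
            data[(r + 1) % n][chunk] += value

    # Phase 2, all-gather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, (chunk, value) in enumerate(sends):
            data[(r + 1) % n][chunk] = value
    return data


if __name__ == "__main__":
    grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_allreduce(grads))
```

In each step every worker sends a single chunk to its neighbour, so per-link traffic stays roughly constant as the ring grows.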

Distributed Training · GPU communication · collective communication
DaTaobao Tech
Sep 6, 2024 · Backend Development

Go Language Coroutine Principles and GMP Model Implementation

The article examines Go's coroutine architecture and its GMP (Goroutine-Machine-Processor) model, tracing coroutine history, comparing kernel, user, and hybrid thread models, and detailing G, M, and P components, scheduling principles, work-stealing, preemption, and runtime implementation that give Go high concurrency and low latency.

Coroutines · GMP model · Go language
Architects' Tech Alliance
Sep 4, 2024 · Fundamentals

Why Bigger Transformers Win: Scaling Laws and Parallel Computing Essentials

The article explains OpenAI's 2020 Scaling Laws that show larger transformer models, more data, and greater compute consistently improve performance, introduces the concept of emergent abilities at critical size thresholds, and outlines the core principles of parallel computing such as multi‑processor usage, task decomposition, concurrent execution, and inter‑processor communication.

communication · concurrency · emergent abilities
Architects' Tech Alliance
May 14, 2024 · Artificial Intelligence

Why GPUs Are Essential for Modern Artificial Intelligence and How They Compare with CPUs, ASICs, and FPGAs

This article explains the pivotal role of GPUs in today’s generative AI era, describes their architecture and applications, compares them with CPUs, ASICs, and FPGAs, and offers guidance on selecting the right processor for AI workloads while also noting related reference resources.

Deep Learning · GPU · Hardware
Java Captain
Feb 26, 2024 · Fundamentals

Principles, Advantages, Challenges, and Future of Multithreaded Architecture

This article examines multithreaded architecture, explaining its core principles, highlighting efficiency and resource utilization benefits, discussing synchronization, communication, and management challenges along with mitigation strategies, and exploring its future impact on cloud computing, big data, AI, and overall system performance.

Software Architecture · parallel computing
Test Development Learning Exchange
Dec 6, 2023 · Backend Development

Using Python multiprocessing and Celery for Parallel and Distributed Task Processing

This article introduces Python's multiprocessing module and the Celery task queue, explains their core concepts, and provides practical code examples for multi‑process parallel computation, inter‑process communication, asynchronous execution, scheduled jobs, result callbacks, retries, and distributed task orchestration.

Distributed Tasks · Python · Task Queue
Test Development Learning Exchange
Aug 2, 2023 · Backend Development

Practical Examples of Python multiprocessing and Celery for Parallel and Distributed Task Processing

This article introduces Python's multiprocessing module and the Celery distributed task queue, explains their core features, and provides ten practical code examples demonstrating multi‑process parallel computation, inter‑process communication, asynchronous tasks, scheduling, retries, and distributed processing for real‑world applications.

Distributed Tasks · Python · celery
Open Source Linux
May 29, 2023 · Fundamentals

What Is a GPU? Understanding Its Role in Graphics, AI, and Computing

This article explains what a GPU (Graphics Processing Unit) is, how it differs from a CPU, its architecture and performance characteristics, and why it powers everything from real‑time rendering to AI inference, using examples like the NVIDIA RTX 3090.

CPU comparison · GPU · Graphics Processing Unit
Baidu Geek Talk
May 10, 2023 · Artificial Intelligence

Baidu's AI Infrastructure for Large-Scale LLM Training: Architecture, Challenges, and Optimization

Baidu’s AI infrastructure combines a massive InfiniBand‑linked GPU cluster, Kunlun chips, the PaddlePaddle framework, and the Wenxin model suite with 4D hybrid parallelism, elastic fault tolerance, and a two‑stage training pipeline to overcome computation, memory, and communication walls, delivering world‑leading MLPerf performance for large‑scale LLMs.

GPU cluster · InfiniBand · Model Training Optimization
Architects' Tech Alliance
Apr 17, 2023 · Fundamentals

Overview of High‑Performance Computing (HPC): Architecture, Metrics, Cluster Management, Job Scheduling, and Parallel Programming Models

This article provides a comprehensive overview of high‑performance computing, covering system architectures, hardware components, performance metrics, network topologies, common parallel file systems, cluster management functions, mainstream job‑scheduling systems, and MPI‑based parallel programming models.

Cluster · HPC · High‑performance computing
DataFunTalk
Feb 18, 2023 · Artificial Intelligence

Building the ATLAS Automated Machine Learning Platform at Du Xiaoman: Architecture, Optimization, and Practical Insights

This article details Du Xiaoman's development of the ATLAS automated machine learning platform, covering business scenarios, AI algorithm deployment challenges, the end‑to‑end production workflow, platform components such as annotation, data, training and deployment, as well as optimization techniques like AutoML, meta‑learning, NAS, and large‑scale parallelism, concluding with lessons learned and future directions.

AI deployment · AutoML · Machine Learning Platform
DeWu Technology
Jan 6, 2023 · Backend Development

Coding Standards and Best Practices for Backend Development

The guide defines backend coding standards and best practices, specifying clear naming conventions for classes, methods, variables and constants, enforcing consistent code style via .editorconfig, recommending unchecked exceptions for simpler error handling, promoting structured asynchronous logging with trace IDs, and outlining performance optimizations such as loop refinement, CompletableFuture concurrency, and proper resource management to prevent memory leaks.

Backend Development · Exception Handling · best practices
Tencent Cloud Developer
Sep 30, 2022 · Cloud Computing

Understanding GPU Computing and Cloud-Based GPU Solutions

The article explains how massive parallel pixel calculations demand GPUs, whose high cost and inflexibility are solved by Tencent Cloud’s elastic, virtualized GPU services—including vGPU, qGPU, TACO abstraction, and spot instances—delivering up to 16 EFLOPS for AI, scientific, graphics, and video workloads.

GPU computing · Tencent Cloud · cloud GPU
Model Perspective
Aug 16, 2022 · Fundamentals

Boost Python Speed Instantly with Numba: A Practical Guide

Numba is a Python just‑in‑time compiler that transforms functions into fast native machine code, enabling near C‑level performance without rewriting code; by adding simple decorators like @jit or @njit, you can accelerate loops, NumPy operations, and even leverage parallel or GPU execution.

JIT Compilation · Performance Optimization · Python
Top Architect
Jun 18, 2022 · Big Data

Overview of Data Lakes and the Open SPL Compute Engine

This article explains the concept and challenges of data lakes, describes the “impossible triangle” of storage, compute, and cost, and introduces the open‑source SPL engine that provides multi‑source, file‑based, high‑performance computing to overcome those limitations.

Data Lake · SPL · compute engine
Architects' Tech Alliance
May 3, 2022 · Fundamentals

High‑Performance Computing Overview and Resource Guide

This article provides a comprehensive overview of high‑performance computing (HPC), covering its definition, hardware architectures, performance metrics, cluster components, parallel file systems, management and scheduling tools, as well as common MPI implementations and links to further technical resources.

Cluster · FLOPS · File Systems
21CTO
Oct 25, 2021 · Fundamentals

Why Understanding Computer System Fundamentals Boosts Your Programming Performance

This article explains how computer systems combine hardware and system software, describes the memory hierarchy, operating‑system abstractions, Amdahl's law, and various forms of parallelism, and shows why mastering these fundamentals can dramatically improve program efficiency and reliability.
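
Amdahl's law bounds the speedup of a fixed-size workload: if a fraction p of the running time can be parallelized across n workers, the speedup is 1 / ((1 - p) + p / n). A minimal sketch of that formula, with a hypothetical function name and sample values chosen only for illustration:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size workload."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 90% of the work parallelizable, the speedup can never exceed 10x.
    for n in (1, 2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction dominates quickly, which is why shrinking it often pays off more than adding workers.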

Amdahl's Law · Performance Optimization · computer architecture
Tencent Cloud Developer
Aug 17, 2021 · Backend Development

Design and Implementation of a Calculation DSL and Engine

The article presents a domain‑specific language that mimics Excel formulas, a stack‑based parser and recursive engine for evaluating calculations, and a multi‑layer architecture—including a dynamic priority scheduler—to efficiently resolve field dependencies, improve maintainability, and enable monitoring across large data systems.

Backend Development · Calculation Engine · DSL
Tech Musings
Jul 8, 2021 · Big Data

Building a Simple Single-Node MapReduce System: From Theory to Code

This article walks through implementing a lightweight single‑machine MapReduce framework inspired by the original MapReduce paper, covering the abstract Map/Reduce model, task scheduling between master and workers, core Go code for map, reduce, worker, and coordinator, and a brief reflection on its limitations.
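
The article's implementation is in Go. As a separate illustration, the map, shuffle, and reduce flow can be sketched in a few lines of single-process Python; there is no master, worker, or coordinator here, and every function name is hypothetical:

```python
from collections import defaultdict


def map_words(document: str):
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: collapse all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(documents, map_fn, reduce_fn):
    """Run map, group intermediate pairs by key (shuffle), then reduce."""
    intermediate = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            intermediate[key].append(value)
    # Keys are processed in sorted order so the output is deterministic.
    return dict(reduce_fn(key, values) for key, values in sorted(intermediate.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_mapreduce(docs, map_words, reduce_counts))
```

A real framework would distribute the map and reduce calls across workers and persist the intermediate pairs; the grouping step is the part that stays conceptually the same.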

Big Data · Distributed Systems · Lab
JavaEdge
Feb 11, 2021 · Fundamentals

From Vacuum Tubes to Quantum Chips: How Computer Architecture Evolved

This article traces the historical evolution of computers from first‑generation vacuum‑tube machines to modern VLSI systems, explains fundamental performance limits such as the von Neumann bottleneck and power consumption, and introduces parallel optimization concepts like Amdahl’s Law and future computing paradigms.

Amdahl's Law · CPU performance · Von Neumann Bottleneck
Java Architect Essentials
Oct 18, 2020 · Backend Development

Performance Evaluation of Java Stream API: Serial vs Parallel Execution

This article presents a series of Java Stream API performance experiments—comparing serial and parallel streams on primitive, object, and reduction tasks—showing that while serial streams are slower than traditional loops for simple operations, parallel streams can significantly outperform both in multi‑core environments.

Backend Development · Stream API · parallel computing
Architects' Tech Alliance
Aug 19, 2020 · Fundamentals

Understanding Supercomputers: Definitions, Evaluation Systems, Research Value, and Technological Layers

This article explains what supercomputers are, outlines major ranking systems such as TOP500 and Green500, describes their wide‑ranging research and application value, and details the three‑layer architecture, parallel computing principles, and emerging trends like exascale, AI integration, quantum and bio‑computing.

Applications · Exascale · Supercomputing
TAL Education Technology
May 14, 2020 · Artificial Intelligence

An Introduction to GPU Computing and CUDA Architecture

This article provides a concise overview of GPU computing fundamentals, covering GPU hardware components, memory hierarchy, parallel execution models, and the CUDA programming framework, illustrating how CPUs and GPUs cooperate in heterogeneous computing environments.

CUDA · CUDA programming · GPU
Python Programming Learning Circle
Apr 16, 2020 · Big Data

Getting Started with PySpark: Creating SparkContext, Parallelizing Data, and Basic DataFrame Operations

This tutorial demonstrates how to initialize a SparkContext in PySpark, perform simple parallel computations such as temperature conversion and reduction, create a SparkSession to read CSV data, and apply common DataFrame operations like selecting columns, adding new columns, filtering, grouping, and aggregating.

Big Data · PySpark · Spark
Alibaba Cloud Developer
Apr 16, 2020 · Artificial Intelligence

How Mars Supercharges Numpy, Pandas, and Scikit‑Learn with Parallel and GPU Acceleration

This article explains how the Mars framework enables parallel and distributed execution of core Python data‑science libraries—Numpy, Pandas, and Scikit‑Learn—while integrating with RAPIDS for GPU acceleration, and demonstrates its performance advantages through code examples and benchmark results.

GPU Acceleration · Mars · NumPy
dbaplus Community
Oct 13, 2019 · Databases

Why PostgreSQL Stands Out: Community, Innovation, and Cloud Power

The talk explores PostgreSQL’s unique community-driven development, its commercial and innovative capabilities, new version features such as enhanced partitioning and parallel computing, and how its open‑source licensing, extensible architecture, and cloud integrations make it a compelling, enterprise‑ready database solution.

Database Features · Open Source Database · PostgreSQL
Architects' Tech Alliance
Sep 20, 2019 · Industry Insights

Why Heterogeneous Parallel Computing Is the Future of High‑Performance Computing

The article explains how heterogeneous parallel computing, which distributes tasks across CPUs, GPUs, FPGAs, and other accelerators, has become essential as Moore’s law has plateaued. It details the principles, hardware and software perspectives, classification of architectures, processing stages, user‑guided versus compiler‑guided methods, and the relevance of this approach to AI, cloud, and industry workloads.

CPU · FPGA · GPU
Architects' Tech Alliance
Jul 1, 2019 · Fundamentals

Understanding Supercomputers: Architecture, Performance, and Real‑World Applications

The article explains the latest TOP500 supercomputer rankings, emphasizes that architecture—not just CPU count—is the core technology behind high‑performance computing, describes the challenges of networking, software, and power, and illustrates diverse applications such as nuclear simulation, climate forecasting, and video rendering.

HPC Applications · Supercomputing · computer architecture
MaGe Linux Operations
Nov 22, 2018 · Artificial Intelligence

Accelerating TensorFlow Deep Learning: GPU & Distributed Training Techniques

This article explains how to speed up TensorFlow deep‑learning model training using single‑GPU acceleration, multi‑GPU parallelism, and distributed TensorFlow on Kubernetes, covering device placement, session parameters, synchronous vs asynchronous training modes, and practical code examples to improve performance and scalability.

Deep Learning · Distributed Training · GPU Acceleration
0 likes · 10 min read
Meituan Technology Team
Oct 25, 2018 · Artificial Intelligence

Deep Learning System Design and Parallel Computing Solutions at Meituan

Meituan built a custom deep‑learning platform that combines data‑parallel and hybrid parallelism across multi‑GPU/cluster hardware, uses coarse‑grained scheduling and Kaldi‑derived acoustic algorithms, and supports fast NLU model hot‑updates, achieving near‑linear GPU scaling and 6–7× speedups over traditional solutions.

AI Infrastructure · NLU · System Architecture
0 likes · 13 min read
Architects' Tech Alliance
Oct 9, 2018 · Fundamentals

Parallel Computing vs Distributed Computing: Concepts, Principles, and Differences

This article explains the concepts, principles, and key distinctions between parallel computing and distributed computing, describing their objectives, basic conditions, advantages, and typical use cases within high‑performance computing, and highlights how they differ from grid and cloud computing.

HPC · computing fundamentals · distributed computing
0 likes · 6 min read
Meituan Technology Team
Aug 2, 2018 · Big Data

R for Fine‑Grained Data Operations: Engineering Practices and Performance at Meituan

Meituan’s in‑store dining team demonstrates how R’s open‑source packages, powerful data manipulation, rich visualization libraries, and reproducible reporting can be engineered into scalable, parallelized workflows that turn secondary data processing into fast, interactive dashboards and analytics, proving R’s enterprise‑grade performance and adoption.

Big Data · Data visualization · R
0 likes · 18 min read
Efficient Ops
Feb 26, 2018 · Fundamentals

Boost Your Python Speed: 20 Proven Tricks to Slash Execution Time

Learn how to dramatically improve Python performance by choosing optimal data structures, minimizing redundant data, using copy wisely, leveraging dict/set lookups, generators, efficient loops, string joining, proper formatting, fast variable swapping, concise comparisons, C extensions, multiprocessing, PyPy, and profiling tools, all backed by real benchmarks.

Benchmarking · C extensions · Code Profiling
0 likes · 16 min read
Meituan Technology Team
Dec 1, 2017 · Big Data

Metric Logic Tree: Automated Anomaly Analysis for Business Metrics

The Metric Logic Tree automates business metric anomaly analysis by integrating heterogeneous data sources (Kylin, MySQL, Elasticsearch, Druid) with a three‑layer architecture—metric calculation, algorithmic analysis (waterfall and Gini‑coefficient methods), and a master‑worker computation service—that parallelizes queries, delivers immediate conclusions, and shortens decision cycles, as demonstrated in Meituan‑Dianping’s hotel‑travel operations.

Big Data · algorithm · anomaly detection
0 likes · 7 min read
Alibaba Cloud Developer
Jul 13, 2017 · Artificial Intelligence

How STARK VRP Cuts Chinese Logistics Costs with AI‑Powered Routing

This article explains how Alibaba's Cainiao network built the STARK VRP engine—an AI‑driven, distributed vehicle‑routing solver that supports dozens of VRP variants, leverages metaheuristics, parallel island models, and deep reinforcement learning to dramatically reduce fleet size and travel distance in Chinese logistics.

AI · Logistics Optimization · Metaheuristics
0 likes · 8 min read
GF Securities FinTech
Sep 14, 2016 · Big Data

Scaling Real-Time Stock Market Data with Redis, Lua, and Go Goroutines

Exploring how a securities firm processes billions of daily stock‑market indicators in real time, this article compares an in‑process Redis + Lua solution with an out‑of‑process Goroutine‑based architecture, detailing data flow, performance trade‑offs, and scalability considerations for high‑frequency time‑series workloads.

Lua · parallel computing · real-time data
0 likes · 12 min read
21CTO
Apr 20, 2016 · Fundamentals

Why Algorithms Matter More Than Learning Every New Programming Language

The article argues that, despite the hype around ever‑changing programming languages, mastering core algorithms and computer science theory remains essential for building efficient, scalable solutions across fields—from search engines and parallel computing to scientific research—because algorithms are the enduring foundation of technology.

Data Structures · MapReduce · computer science fundamentals
0 likes · 11 min read
21CTO
Dec 7, 2015 · Fundamentals

How D.E. Shaw’s Anton Supercomputer Revolutionized Computational Chemistry

Former professor and hedge‑fund founder D.E. Shaw leveraged his expertise in massively parallel computing to create the Anton supercomputer, a purpose‑built machine that runs molecular dynamics simulations thousands of times faster than traditional supercomputers, reshaping computational chemistry and high‑frequency trading.

Anton · D.E. Shaw · High‑performance computing
0 likes · 9 min read