Tagged articles
156 articles
Coder Trainee
Feb 28, 2026 · Frontend Development

Automating Front‑End Deployment with Jenkins and Yarn

This guide walks through installing Node plugins in Jenkins, configuring a NodeJS tool, creating a freestyle project, discarding old builds, setting up Git source, defining the build environment, running Yarn commands to compile the front‑end, and deploying the artifacts via SSH with a custom script.

Front-end Automation · Jenkins · YARN
0 likes · 4 min read
Raymond Ops
Jan 30, 2026 · Big Data

Build an Enterprise‑Grade HDFS HA and YARN Scheduler from Scratch

This guide walks you through designing and deploying a highly available HDFS architecture with dual NameNodes, ZooKeeper‑based failover, and a tuned YARN resource scheduler, covering detailed configuration files, failover testing, performance tuning, monitoring, automated health checks, capacity planning, and best‑practice checklists for production‑grade big‑data platforms.

Big Data · HA · HDFS
0 likes · 28 min read
MaGe Linux Operations
Sep 8, 2025 · Big Data

Build Enterprise‑Grade HDFS HA and Optimize YARN Scheduling from Scratch

This comprehensive guide walks you through constructing a fault‑tolerant HDFS high‑availability architecture, configuring dual NameNodes with ZooKeeper and JournalNode clusters, fine‑tuning YARN resource schedulers, implementing monitoring, automated failover testing, and performance optimization, all backed by real‑world production experiences and code examples.

Big Data Operations · HDFS · YARN
0 likes · 24 min read
DataFunTalk
Jul 23, 2025 · Artificial Intelligence

Qwen3‑Coder: Open‑Source AI Programming Agent That Beats the Competition

Alibaba’s Tongyi team unveiled the open‑source Qwen3‑Coder, a massive 480‑billion‑parameter programming model that outperforms leading closed‑source solutions, supports up to 1 M token context, offers a free CLI tool, and demonstrates impressive code generation capabilities across animations, games, and real‑world tasks.

AI programming · Qwen3-Coder · Reinforcement Learning
0 likes · 5 min read
Big Data Tech Team
Jun 8, 2025 · Big Data

Master Hadoop: A Step-by-Step Learning Roadmap for Big Data Professionals

This guide outlines a comprehensive Hadoop learning roadmap, covering essential prerequisites, core concepts such as HDFS, MapReduce, and YARN, hands‑on projects, advanced ecosystem tools like Hive, Pig, HBase and Spark, plus curated resources and community channels for aspiring big‑data engineers.

HDFS · Hadoop · MapReduce
0 likes · 7 min read
iQIYI Technical Product Team
May 15, 2025 · Big Data

Introducing AMD and ARM Bare‑Metal Instances for iQIYI Big Data Computing: Cloud Selection, Performance Evaluation, and Heterogeneous Scheduling

To reduce costs and boost compute density, iQIYI's big data team migrated from aging private‑cloud Intel servers to public‑cloud AMD and ARM bare‑metal instances, establishing a systematic machine‑selection process, performance testing framework, and YARN‑based heterogeneous scheduling to fully leverage the new hardware.

AMD · ARM · YARN
0 likes · 16 min read
Rare Earth Juejin Tech Community
Feb 21, 2025 · Frontend Development

Understanding pnpm: Solving Dependency Management Issues in Modern Frontend Development

This article explains the evolution of JavaScript package managers, the shortcomings of npm and Yarn such as duplicated installations, phantom dependencies and unpredictable dependency trees, and demonstrates how pnpm’s content‑addressable store, hard‑link and symlink strategy provides faster installs, reduced disk usage, and more reliable dependency isolation for frontend projects.

YARN · dependency management · frontend development
0 likes · 22 min read
360 Zhihui Cloud Developer
Feb 17, 2025 · Cloud Native

Optimizing Offline Pod Scheduling with Koordinator and Yarn-Operator

To reduce resource contention and improve offline task reliability, this article examines the challenges of using Koordinator with Hadoop Yarn pods on Kubernetes, proposes real‑time resource reporting and task‑level eviction strategies, details community and custom solutions, and outlines future enhancements with Volcano.

Big Data · Cloud Native · Koordinator
0 likes · 9 min read
Rare Earth Juejin Tech Community
Jan 14, 2025 · Backend Development

Understanding npm, Yarn, and pnpm: Dependency Management, Flat Dependencies, and pnpm's Store Mechanism

This article examines the evolution of JavaScript package managers—from npm's nested node_modules structure to Yarn's flat dependencies and finally pnpm's global store with hard‑ and soft‑link mechanisms—highlighting how each approach addresses path length, disk‑space waste, installation speed, and ghost‑dependency issues.

Hard Link · YARN · dependency management
0 likes · 8 min read
Full-Stack Cultivation Path
Dec 6, 2024 · Frontend Development

Corepack: The Next‑Generation Node.js Package Manager

The article reviews the evolution of JavaScript package managers, compares npm, Yarn, and pnpm, introduces Corepack as Node.js 16.9.0's experimental tool for consistent manager versions, explains its features and usage steps, and discusses remaining challenges such as version conflicts and limited advanced capabilities.

Corepack · Node.js · YARN
0 likes · 8 min read
360 Smart Cloud
Jul 9, 2024 · Big Data

Understanding Shuffle in Spark: From Native Shuffle to External and Remote Shuffle Services (Uniffle)

This article examines the critical role of shuffle in big‑data processing, compares Spark's native shuffle with the External Shuffle Service (ESS) and Remote Shuffle Service (RSS) solutions, introduces Uniffle's architecture and configuration, and shares practical deployment experiences and performance results within a 360 internal environment.

Big Data · External Shuffle Service · Remote Shuffle Service
0 likes · 15 min read
Goodme Frontend Team
May 6, 2024 · Frontend Development

npm vs Yarn vs pnpm: Which JavaScript Package Manager Wins in Speed and Space?

This article traces the evolution of JavaScript package managers—from early manual inclusion methods to npm, Yarn, and pnpm—detailing their architectures, performance characteristics, version‑locking mechanisms, and trade‑offs, helping developers choose the most suitable tool for modern frontend projects.

Node.js · YARN · frontend development
0 likes · 12 min read
Efficient Ops
Apr 23, 2024 · Big Data

How to Plan, Configure, and Launch a Hadoop 3.3.5 Cluster on Three Nodes

This guide walks through planning a three‑node Hadoop 3.3.5 cluster, explains default and custom configuration files, details core‑site, hdfs‑site, yarn‑site, and mapred‑site settings, shows how to distribute configs, start HDFS and YARN, and perform basic file‑system tests.

Big Data · Cluster Setup · Configuration
0 likes · 11 min read
Open Source Linux
Mar 11, 2024 · Big Data

Step‑by‑Step Guide to Deploying Flink on Standalone, Yarn, and Kubernetes

This tutorial explains how to install and configure Apache Flink in three deployment modes—Standalone, Hadoop YARN, and Kubernetes—covering node preparation, configuration files, package distribution, job submission, and monitoring through the Flink Web UI, with full command‑line examples and code snippets.

Big Data · Flink · Kubernetes
0 likes · 12 min read
Huolala Tech
Jan 4, 2024 · Big Data

How HuoLala Cut Costs by Switching Big Data Workloads to ARM CPUs

This article details HuoLala's exploration of replacing x86 compute nodes with ARM servers in its big‑data platform, covering performance benchmarks, component adaptations for YARN, Tez/MR, security tools, a critical JDK de‑optimization issue, and the resulting production outcomes and future roadmap.

ARM · Big Data · JDK
0 likes · 14 min read
Alibaba Cloud Native
Nov 24, 2023 · Cloud Native

How Koordinator Boosts CPU Utilization and Cuts Costs in Large‑Scale Mixed Workloads

Koordinator, an open‑source cloud‑native mixed‑workload scheduler born from Alibaba’s internal container orchestration experience, enables Xiaohongshu to reclaim idle resources, improve CPU utilization beyond 45%, reduce resource costs by millions of core‑hours, and seamlessly integrate Kubernetes with YARN for batch and AI workloads.

Cloud Native · Resource Optimization · YARN
0 likes · 18 min read
iQIYI Technical Product Team
Nov 17, 2023 · Big Data

Mixed Workload Co-location of Big Data and Online Services at iQIYI: Design, Implementation, and Results

iQIYI’s mixed‑workload system colocates Spark/Hive big‑data jobs with online video services by running YARN NodeManagers inside Kubernetes, using an Elastic YARN Operator, Koordinator‑driven CPU oversubscription, and remote shuffle, boosting online CPU utilization from ~9 % to over 40 % and saving tens of millions of RMB annually.

Big Data · Cloud Native · Kubernetes
0 likes · 19 min read
DevOps
Jun 7, 2023 · Big Data

Deploying Apache Spark on YARN vs Kubernetes: Architecture, Benefits, and Comparison

This article explains how Apache Spark can be deployed using the traditional Hadoop YARN resource manager and the newer Kubernetes approach, detailing configuration steps, submission methods, and a comprehensive comparison of isolation, scalability, learning curve, logging, performance, and cost considerations.

Big Data · Kubernetes · Spark
0 likes · 10 min read
High Availability Architecture
May 26, 2023 · Big Data

Amiya: Dynamic Overcommit Component for Bilibili Offline Big Data Cluster Resource Scheduling

This article introduces Amiya, a self‑developed overcommit component that dynamically increases Yarn memory and vCore capacity on Bilibili's offline big‑data clusters, details its architecture, key implementation of overcommit, eviction and mixed‑deployment strategies, and evaluates its resource‑utilization impact.

Cluster Management · Overcommit · Resource Optimization
0 likes · 22 min read
Bilibili Tech
May 23, 2023 · Big Data

Amiya: Dynamic Overcommit Component for Bilibili Offline Big Data Cluster

Amiya, a self‑developed dynamic over‑commit component for Bilibili’s offline big‑data cluster, inflates reported resources on under‑utilized nodes and adjusts them when load rises, adding roughly 683 TB of memory and 137 k vCores, boosting per‑node memory by 15 % and CPU usage by over 20 % while keeping eviction rates below 3 %.

Amiya · Bilibili · Cluster Management
0 likes · 22 min read
Rare Earth Juejin Tech Community
May 6, 2023 · Backend Development

Monorepo Overview, Evolution, Pros & Cons, Pitfalls, and Tool Selection

This article explains what a monorepo is, traces its evolution from single‑repo monoliths to multi‑repo and back to a single repository with many modules, compares its advantages and disadvantages, lists common pitfalls, and evaluates major tooling options such as Turborepo, Rush, Nx, Lerna, Yarn and pnpm for different project sizes.

Lerna · Monorepo · Nx
0 likes · 21 min read
政采云技术
Apr 18, 2023 · Big Data

Implementing Data Cost Governance: Quantifying Storage and Compute Expenses with Hive, Spark, and HDFS FsImage

This article explains how to perform task‑level data cost governance by collecting storage and compute metrics from Hive tables, Spark jobs, and HDFS FsImage files, then estimating monthly expenses using replication factors and resource‑usage rates, while providing practical SQL and shell examples.

Data Cost Governance · HDFS · Spark
0 likes · 18 min read
ByteFE
Mar 6, 2023 · Frontend Development

Deep Dive into npm, Yarn, and pnpm Dependency Management

This article explains how npm, Yarn, and pnpm manage JavaScript dependencies, detailing installation processes, flat vs nested node_modules structures, lock files, and the hard-link mechanism that improves speed and saves disk space.

YARN · dependency management · npm
0 likes · 16 min read
Big Data Technology & Architecture
Feb 24, 2023 · Big Data

Common Flink Task Submission Issues and Solutions on YARN

This article compiles frequent Flink job submission problems on YARN—including WordCount jar errors, HBase dependency conflicts, MySQL timeout, checkpoint restoration failures, parallelism limits, and unexpected container termination—provides root‑cause analysis and step‑by‑step remediation instructions.

Big Data · Checkpoint · Flink
0 likes · 21 min read
StarRing Big Data Open Lab
Feb 15, 2023 · Operations

How YARN and Kubernetes Solve Distributed Resource Management Challenges

This article explains how Apache YARN and Google Kubernetes address the three core problems of resource utilization, task responsiveness, and flexible scheduling in distributed environments, detailing their architectures, scheduling models, and practical implications for modern big‑data and cloud workloads.

Kubernetes · Resource Management · Scheduling
0 likes · 8 min read
ByteFE
Nov 14, 2022 · Frontend Development

Evolution and Innovations of npm, Yarn, and pnpm Package Managers

This article examines the evolution of the three major JavaScript package managers (npm, Yarn, and pnpm), detailing their original designs, the problems they introduced such as nested node_modules, phantom dependencies and doppelgangers, and the innovative solutions like flattening, lock files, symbolic and hard links, and PnP mode that each tool brought to improve dependency management.

YARN · node_modules · npm
0 likes · 18 min read
ITPUB
Oct 21, 2022 · Big Data

Hadoop Explained: Architecture, Core Components, and Real-World Applications

This article provides a comprehensive overview of Hadoop, covering its historical development, key characteristics, the HDFS storage framework, the MapReduce processing engine, YARN resource manager, and a wide range of real-world application scenarios, as well as the broader Hadoop ecosystem and its major components.

Big Data · Ecosystem · HDFS
0 likes · 20 min read
DataFunSummit
Sep 25, 2022 · Big Data

Practical Optimizations and Resource Management of Hadoop YARN at Xiaomi

This article shares Xiaomi's internal practices of Hadoop YARN, covering scheduling and resource optimization, elastic scheduling, node overcommit handling, federation architecture, metadata warehouse construction, and future plans to improve cluster utilization and cost efficiency.

Big Data · Hadoop · YARN
0 likes · 20 min read
Bilibili Tech
Jul 5, 2022 · Big Data

Multi‑Datacenter Architecture for Offline Big Data Processing at Bilibili

To overcome rapid data growth and on‑premise capacity limits, Bilibili adopted a scale‑out, unit‑based multi‑datacenter architecture that isolates failures, intelligently places jobs, replicates data via an enhanced DistCp service, routes reads with an IP‑aware HDFS router, and throttles cross‑site traffic, enabling stable offline big‑data processing of hundreds of petabytes while preserving throughput.

HDFS · YARN · bandwidth optimization
0 likes · 28 min read
DataFunSummit
Jul 1, 2022 · Big Data

Exploring and Implementing Elastic Scheduling for Xiaomi Hadoop YARN

Shilong Fei from Xiaomi Data Platform presents an in‑depth exploration of elastic scheduling for Hadoop YARN, covering background, design of resource pools, auto‑scaling architecture, challenges such as job stability and user transparency, achieved cost reductions, and future plans for further optimization.

Auto Scaling · Big Data · Hadoop
0 likes · 20 min read
DataFunTalk
Jun 12, 2022 · Big Data

Huya Offline Job Scheduling System: Design, Baseline Scheduling, and Cost Optimization

This article introduces Huya's offline job scheduling platform, covering its positioning, evolution, system architecture, baseline scheduling techniques, cost‑optimization strategies, resource‑balancing methods, and future intelligent data‑warehouse directions, illustrating how data‑driven automation improves YARN utilization and SLA compliance.

Cost Optimization · DAG · YARN
0 likes · 12 min read
DataFunTalk
May 21, 2022 · Big Data

Exploring and Implementing Elastic Scheduling for Xiaomi Hadoop YARN

This talk presents Xiaomi's design and deployment of an elastic scheduling system for Hadoop YARN, covering background analysis, resource‑pool strategy, auto‑scaling architecture, stability challenges, label‑based resource isolation, Spark shuffle handling, cost‑saving results and future plans.

Big Data · Hadoop · Resource Management
0 likes · 16 min read
DataFunSummit
May 4, 2022 · Big Data

NetEase Big Data Platform: HDFS Optimization and Practices

NetEase’s senior big‑data engineer shares how the company’s large‑scale data platform leverages Hadoop, HDFS, YARN and related technologies, detailing multi‑layer architecture, cross‑cloud deployment, storage optimizations, NameNode performance enhancements, RPC prioritization, and practical lessons from operating petabyte‑scale clusters.

Cluster Optimization · HDFS · Storage Management
0 likes · 23 min read
DataFunTalk
Mar 30, 2022 · Big Data

NetEase Big Data Platform: HDFS Optimization and Practice

This article presents NetEase's big data platform architecture, detailing multi‑layer storage and compute design, HDFS deployment challenges, NameNode and NameSpace performance optimizations, cluster scaling strategies, data tiering, hardware upgrades, and real‑world business use cases, illustrating practical large‑scale big data engineering.

Big Data · Cluster Optimization · Data Management
0 likes · 23 min read
Bilibili Tech
Mar 25, 2022 · Big Data

Bilibili's YARN Scheduling Optimization Practice: From Heartbeat-Driven to Global Scheduling

Bilibili transformed its YARN CapacityScheduler from a heartbeat‑driven design to a multi‑threaded global scheduler by separating lock handling, adopting Weighted Round‑Robin with DRF, adding batch node selection, fixing proposal inconsistencies, tuning GC and logging, and thereby reduced application allocation time by about 38 % on clusters of up to 8,000 nodes.

Big Data · CapacityScheduler · Hadoop
0 likes · 15 min read
DaTaobao Tech
Mar 23, 2022 · Frontend Development

Why npm, Yarn, pnpm and Deno Manage Dependencies Differently – A Deep Dive

This article analyses the evolution of front‑end package managers—from npm's early nested modules to Yarn's lockfile and Plug'n'Play, pnpm's hard‑link strategy, cnpm/tnpm adaptations, and Deno's URL‑based imports—highlighting their dependency resolution mechanisms, trade‑offs, and remaining challenges.

Deno · YARN · dependency management
0 likes · 19 min read
Tencent Cloud Developer
Feb 17, 2022 · Frontend Development

Exploring Monorepo Strategies and Practices for Front‑end Development

The article explains how a monorepo, which houses multiple independent front‑end packages in a single Git repository, simplifies code sharing, tooling, and documentation for Vue 3 component collections. It compares the approach with monolith and multi‑repo setups, outlines essential tools such as pnpm, Changesets, Turborepo, ESLint, and VitePress, and provides step‑by‑step setup guidance, concluding that monorepos are effective for moderately sized front‑end projects despite potential scaling and permission challenges.

Lerna · Monorepo · YARN
0 likes · 27 min read
Dada Group Technology
Jan 14, 2022 · Frontend Development

Optimizing Build and Dependency Installation for Dada's Large-Scale Frontend System

This article analyzes the slow build process of Dada's massive frontend platform, identifies bottlenecks in dependency installation and webpack compilation, and presents practical optimizations such as node_modules caching, cp command adjustments, Babel loader caching, and other webpack tweaks that reduced average build time from 600 seconds to around 100 seconds.

Build Optimization · YARN · babel
0 likes · 8 min read
TAL Education Technology
Jan 13, 2022 · Cloud Native

Offline Mixed Deployment with Kubernetes: Architecture, Implementation, and Performance Evaluation for Big Data Workloads

This article describes a cloud‑native offline mixed‑deployment solution that leverages Kubernetes to share resources between big‑data clusters and business services, outlines its implementation steps, presents detailed performance comparisons between Yarn and Kubernetes using TPC‑DS, Spark, and Terasort workloads, and discusses production experience and future plans.

Big Data · Cloud Native · Kubernetes
0 likes · 8 min read
ELab Team
Dec 31, 2021 · Fundamentals

Mastering Inodes, Hard & Soft Links: From Linux to Frontend Tooling

This article explains the fundamentals of inodes, sectors, and blocks, demonstrates how to retrieve file information with Node.js and Linux commands, compares hard and soft links, and shows practical applications of these links in frontend workflows such as yarn link and pnpm installation.

Filesystem · Frontend tooling · Hard Link
0 likes · 14 min read
21CTO
Oct 14, 2021 · Big Data

How LinkedIn Scaled Hadoop to 11,000 Nodes and Solved YARN Delays

LinkedIn’s engineers detail how they repeatedly doubled their Hadoop cluster to over 11,000 nodes, tackled YARN scheduling delays caused by workload imbalances, and created the DynoYARN simulation tool to predict performance impacts of massive scaling.

Big Data · DynoYARN · Hadoop
0 likes · 7 min read
Big Data Technology Architecture
Sep 28, 2021 · Big Data

Integrating Apache Kyuubi with CDH 6 and Spark 3: Deployment, Configuration, and Performance Tuning

This guide explains how to deploy Apache Kyuubi on a CDH 6 cluster, replace HiveServer2 with Kyuubi, integrate Spark 3, apply necessary patches, configure environment and Spark settings, and optimize engine sharing for various workloads, providing complete code snippets and step‑by‑step instructions.

CDH · HiveServer2 · Kyuubi
0 likes · 19 min read
Java Architect Essentials
Sep 21, 2021 · Big Data

Interview on Kuaishou's Billion‑Scale Big Data Architecture Evolution and Practices

The interview with Kuaishou senior architect Zhao Jianbo details the three‑phase evolution of its trillion‑scale big data platform, covering foundational Hadoop services, real‑time and OLAP extensions, deep customizations, Spring Festival Gala challenges, scheduling innovations, Hadoop usage, and the relationship between big data and cloud architectures.

Big Data · Flink · Hadoop
0 likes · 19 min read
Big Data Technology & Architecture
Sep 17, 2021 · Big Data

Key Reliability Mechanisms of HDFS, YARN Failover Strategies, and Hadoop Shuffle Process

This article explains HDFS reliability features such as replica policies, rack awareness, heartbeat, safe mode, checksums, trash, metadata protection and snapshots, then details YARN failover handling for ApplicationMaster, NodeManager and ResourceManager, and finally describes the Hadoop MapReduce shuffle workflow and tuning tips.

HDFS · MapReduce · Reliability
0 likes · 13 min read
ByteFE
Aug 16, 2021 · Backend Development

Understanding yarn.lock: Why It Changes and How to Manage It

This article explains the purpose and structure of yarn.lock, why it may show unexpected diffs after dependency updates, and provides practical strategies—including using resolutions, frozen lockfiles, and preventive workflows—to keep package.json and yarn.lock in sync and avoid build issues.

YARN · dependency-management · lockfile
0 likes · 12 min read
The Dominant Programmer
Aug 2, 2021 · Big Data

How to Build a Beginner Hadoop Cluster on CentOS 7

This article introduces Apache Hadoop’s open‑source framework, explains its core components such as HDFS, MapReduce, ZooKeeper, HBase, Hive, Pig, Mahout, Sqoop, Flume, Chukwa, Oozie, Ambari and YARN, and outlines the steps to set up a beginner‑level Hadoop cluster on CentOS 7.

Big Data · CentOS 7 · HBase
0 likes · 11 min read
ELab Team
Jun 10, 2021 · Fundamentals

Why Your Monorepo Is Slowing Down and How pnpm & Rush Can Fix It

This article examines the scalability and reliability problems of a Yarn‑workspace based monorepo—such as command inconsistency, slow publishing, phantom dependencies, duplicate packages, and lockfile conflicts—and presents pnpm and Rush as comprehensive solutions with practical guidelines for package referencing and workspace protocols.

Monorepo · YARN · dependency-issues
0 likes · 23 min read
Big Data Technology & Architecture
Jun 4, 2021 · Big Data

Comprehensive Spark Interview Questions and Answers

This article provides a detailed collection of Spark interview questions covering deployment modes, performance advantages over MapReduce, shuffle mechanisms, RDD characteristics, optimization techniques, resource management, and various practical aspects of Spark on YARN, Mesos, and Kubernetes.

RDD · Shuffle · Spark
0 likes · 21 min read
58 Tech
May 28, 2021 · Big Data

Practical Upgrade Experience of Hadoop 3.2.1 in 58.com Data Platform: HDFS, YARN, and MR3

This article details the end‑to‑end upgrade of a 5000‑node Hadoop 2.6.0 cluster to Hadoop 3.2.1 at 58.com, covering HDFS migration, RBF and EC adoption, Yarn federation and rolling upgrades, MR3 integration, extensive compatibility testing, and operational lessons learned for large‑scale big‑data platforms.

Big Data · Cluster Upgrade · HDFS
0 likes · 19 min read
DataFunTalk
May 14, 2021 · Big Data

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

This article presents a technical deep‑dive into Bilibili’s evolution from offline to real‑time data processing, describing the challenges of timeliness, ETL, AI feature engineering, and the design of a Flink‑on‑YARN incremental pipeline that supports trillion‑scale message throughput and AI‑driven real‑time applications.

AI · Big Data · Flink
0 likes · 27 min read
dbaplus Community
Mar 16, 2021 · Big Data

How Kuaishou Scales YARN to Tens of Thousands of Nodes with the Kwai Scheduler

This article explains how Kuaishou’s massive offline compute clusters—tens of thousands of machines processing hundreds of petabytes daily—are managed by a heavily customized YARN stack and the home‑grown Kwai Scheduler, detailing architecture, scheduler evolution, multi‑scenario optimizations, and future scaling plans.

Big Data · Cluster Optimization · Kwai Scheduler
0 likes · 14 min read
Taobao Frontend Technology
Mar 11, 2021 · Frontend Development

Mastering Monorepo: Boost Code Reuse and Collaboration in JavaScript Projects

This article explains the monorepo strategy, its advantages and drawbacks, and provides step‑by‑step guidance on setting up a project‑level monorepo using tools like Volta, Yarn workspaces, Lerna, scripty, and commitlint, helping developers streamline code reuse, dependency management, and version synchronization across multiple JavaScript packages.

Lerna · Monorepo · Workspaces
0 likes · 27 min read
DataFunTalk
Mar 3, 2021 · Big Data

Kwai Scheduler: Scaling YARN for Ultra‑Large Clusters at Kuaishou

This article presents Kuaishou's large‑scale offline computing challenges and describes how the team customized YARN and built the Kwai scheduler to achieve multi‑threaded, pluggable resource scheduling for clusters of tens of thousands of nodes, supporting diverse workloads such as ETL, ad‑hoc queries, machine‑learning training, and real‑time Flink jobs.

Cluster Optimization · Kwai Scheduler · YARN
0 likes · 15 min read
ELab Team
Feb 9, 2021 · Frontend Development

Why Yarn Beats npm: Deep Dive into Its Architecture and Workflow

This article explores Yarn’s architecture and workflow, comparing it with npm, cnpm, and pnpm, detailing multi‑threaded installation, caching, dependency resolution, lockfile handling, and step‑by‑step processes from package fetching to linking, optimization, and common Q&A, illustrated with code snippets.

YARN · dependency resolution · npm
0 likes · 22 min read
Big Data Technology & Architecture
Jan 22, 2021 · Big Data

Key New Features and Improvements in Hadoop 3.x

Hadoop 3.x raises the minimum supported Java version to JDK 1.8 and introduces a range of enhancements across common components, HDFS, YARN, and MapReduce, including erasure coding, high availability with more than two NameNodes, cgroup‑based resource isolation, native map‑output collectors, and split client libraries, while also adding support for Azure and Aliyun distributed file systems.

HDFS · Hadoop · MapReduce
0 likes · 7 min read
Practical DevOps Architecture
Nov 27, 2020 · Big Data

Step-by-Step Guide to Install and Configure a Hadoop 2.8.2 Cluster

This tutorial provides a complete walkthrough for downloading Hadoop 2.8.2, setting up a three‑node master‑slave cluster, configuring core, HDFS, MapReduce and YARN settings, creating required directories, distributing the installation, starting the services, verifying the cluster status, and finally shutting it down.

Big Data · Cluster Setup · HDFS
0 likes · 5 min read
Tencent Cloud Developer
Nov 13, 2020 · Big Data

Apache Spark Core: Architecture, Components, and Execution Flow

Apache Spark Core is a high‑performance, fault‑tolerant engine that abstracts distributed computation through SparkContext, DAG and Task schedulers, supports in‑memory and disk storage, runs on various cluster managers (YARN, Kubernetes, etc.), and unifies batch, streaming, ML and graph processing via its rich ecosystem.

Apache Spark · Big Data · DAG scheduler
0 likes · 17 min read
DataFunTalk
Jul 5, 2020 · Big Data

ByteDance’s Optimizations to Hadoop YARN: Enhancing Utilization, Multi‑Load Scenarios, Stability, and Multi‑Region Active‑Active

This article describes ByteDance’s four‑year series of customizations to Hadoop YARN—covering utilization improvements, multi‑load scenario optimizations, stability enhancements, and multi‑region active‑active deployment—along with practical production experiences, architectural details, and future work directions.

ByteDance · Cluster Optimization · Hadoop
0 likes · 12 min read
Big Data Technology & Architecture
Jun 18, 2020 · Big Data

CPU Resource Isolation in YARN with Linux cgroups

This article introduces Linux cgroups, explains their CPU subsystem files and parameters, demonstrates how to create and configure cgroups, and details how YARN leverages cgroups for CPU resource isolation through configuration settings and code implementations, comparing soft and hard limit approaches.

Hadoop · Linux · YARN
0 likes · 10 min read
Big Data Technology Architecture
Jun 4, 2020 · Big Data

58.com Big Data Offline Computing Platform: Architecture, Scaling, Optimization, and Cross‑Data‑Center Migration

This article presents a comprehensive case study of 58.com’s massive Hadoop‑based offline computing platform, detailing its architecture, scaling challenges, performance‑tuning measures, YARN and SparkSQL upgrades, and the systematic cross‑data‑center migration of thousands of nodes and petabytes of data.

Big Data · Data Migration · Hadoop
0 likes · 23 min read
Big Data Technology Architecture
May 15, 2020 · Big Data

Performance Tuning of Hive on Spark in YARN Mode

This article explains how to optimize Hive on Spark running on YARN, covering YARN node resource configuration, Spark executor and driver memory settings, dynamic allocation, parallelism, and key Hive parameters, with the goal of outperforming Hive on MapReduce.

Cluster Configuration · Spark · YARN
0 likes · 11 min read
Big Data Technology & Architecture
May 6, 2020 · Big Data

Step-by-Step Guide to Installing and Configuring a Hadoop Cluster on Three Virtual Machines

This article provides a comprehensive, hands‑on tutorial for preparing three VMs, installing JDK and Hadoop, configuring core‑site.xml, hdfs‑site.xml, mapred‑site.xml, yarn‑site.xml, setting environment variables, distributing the package, starting HDFS and YARN, and verifying the cluster via web UI and jps commands.

Big Data · Cluster Setup · HDFS
0 likes · 14 min read
dbaplus Community
Apr 15, 2020 · Big Data

How Ctrip Scaled Hadoop Across Data Centers: Architecture and Lessons

This article details Ctrip's Hadoop evolution, the challenges of expanding across multiple data centers, the evaluation of multi‑cluster versus single‑cluster designs, and the concrete architectural changes, migration tools, bandwidth monitoring, and future plans that enabled a stable cross‑datacenter big‑data platform.

Big Data · Cross-DataCenter · HDFS
0 likes · 19 min read
Big Data Technology & Architecture
Apr 8, 2020 · Big Data

Common Apache Flink Exceptions and How to Resolve Them

This article enumerates typical Apache Flink deployment, job, and checkpoint errors—such as JDK version issues, resource shortages, task manager timeouts, and state migration problems—and provides practical troubleshooting steps and configuration tips to help engineers quickly diagnose and fix these failures.

Big Data · Checkpoint · Exception
0 likes · 8 min read
Open Source Linux
Mar 12, 2020 · Big Data

Step-by-Step Guide to Build a Hadoop 2.9.2 Cluster on CentOS 7.5

This tutorial walks you through setting up a three‑node Hadoop 2.9.2 cluster on CentOS 7.5, covering environment preparation, password‑less SSH, user creation, JDK installation, Hadoop extraction, configuration file edits, directory setup, ownership changes, service startup, and verification via web UIs.

Big Data · CentOS · Cluster Setup
0 likes · 13 min read
Ctrip Technology
Feb 27, 2020 · Big Data

Ctrip's Cross‑Datacenter Hadoop Architecture: Design, Implementation, and Lessons Learned

This article details Ctrip's cross‑datacenter Hadoop architecture, covering the evolution of its Hadoop platform, the challenges of multi‑site bandwidth and latency, design choices between multi‑cluster and single‑cluster solutions, and the concrete HDFS, YARN, balancer, migration, monitoring, and throttling implementations that enable transparent, consistent, and efficient multi‑datacenter operations.

Cross-DataCenter · Data Migration · HDFS
0 likes · 15 min read