Tagged articles

YARN

158 articles · Page 2 of 2

Mar 8, 2020 · Big Data

Hive on Spark Tuning Parameters and Best Practices

This article explains how to tune Hive on Spark by adjusting driver, executor, and Hive configuration parameters—including CPU cores, memory allocations, dynamic allocation, and join thresholds—to achieve optimal performance when running on YARN.

Big Data · Hive · Performance Tuning

0 likes · 7 min read

Ctrip Technology

Feb 27, 2020 · Big Data

Ctrip's Cross‑Datacenter Hadoop Architecture: Design, Implementation, and Lessons Learned

This article details Ctrip's cross‑datacenter Hadoop architecture, covering the evolution of its Hadoop platform, the challenges of multi‑site bandwidth and latency, design choices between multi‑cluster and single‑cluster solutions, and the concrete HDFS, YARN, balancer, migration, monitoring, and throttling implementations that enable transparent, consistent, and efficient multi‑datacenter operations.

Cross-DataCenter · Data Migration · HDFS

0 likes · 15 min read

Big Data Technology & Architecture

Feb 5, 2020 · Big Data

Resolving Oozie Shell Scheduling Issues for Flink Jobs on CDH 6.3 with Kerberos Authentication

The article describes how to troubleshoot and fix Oozie shell‑action failures when submitting Flink jobs on a CDH 6.3 cluster with Kerberos, detailing environment‑variable conflicts, error messages, and the final solution using a clean environment and custom FLINK_CONF_DIR settings.

Big Data · CDH · Flink

0 likes · 7 min read

Big Data Technology & Architecture

Dec 22, 2019 · Big Data

Dynamic Resource Allocation in Spark Streaming: Problems, Mechanisms, and Practical Guidelines

The article explains Spark's default static resource allocation, analyzes the limitations of its Dynamic Resource Allocation (DRA) for streaming workloads, describes the internal Spark components and code paths involved, and proposes concrete design and configuration recommendations for implementing more responsive executor scaling.

Big Data · Dynamic Resource Allocation · Executor Management

0 likes · 11 min read

Big Data Technology & Architecture

Dec 20, 2019 · Big Data

Understanding Hadoop YARN Schedulers: FIFO, Capacity, and Fair Scheduler

This article explains the role of YARN's Scheduler, compares the FIFO, Capacity, and Fair schedulers, details their configuration (including XML snippets for the Capacity and Fair schedulers, queue hierarchy, and preemption settings), and provides practical guidance for resource allocation in Hadoop clusters.

Big Data · Capacity Scheduler · Fair Scheduler

0 likes · 13 min read

DataFunTalk

Nov 21, 2019 · Big Data

Evolution of 58.com Real-Time Computing Platform and the One-Stop Streaming Data Processing System Wstream

The article details the technical evolution of 58.com’s real-time computing platform—from Storm and Spark Streaming to a Flink‑based one‑stop solution called Wstream—covering use cases, architecture, stability measures, migration from Storm, operational diagnostics, and future development plans.

Big Data · Flink · Real-time Streaming

0 likes · 11 min read

Big Data Technology & Architecture

Sep 19, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article presents a comprehensive analysis of Meituan's Hadoop YARN fair scheduler, detailing its architecture, resource abstractions, scheduling workflow, performance bottlenecks, fine‑grained metrics, and a series of optimization techniques—including sorting improvements, job‑skip reduction, parallel queue sorting, and robust rollout strategies—to achieve high‑throughput, low‑latency scheduling for large‑scale offline, streaming, and machine‑learning workloads.

Big Data · Fair Scheduler · Performance Optimization

0 likes · 24 min read

Tencent Cloud Developer

Sep 11, 2019 · Big Data

YARN Practice and Technical Evolution at Kuaishou

Jiaoxiao Fang’s talk details Kuaishou’s YARN deployment, covering its architecture, support for offline, real‑time and ML workloads, and recent enhancements such as event‑handling stability, refined preemption, high‑throughput parallel scheduling, shuffle‑caching for small I/O, plus plans for job protection and multi‑cluster resource utilization.

Big Data · Cluster Optimization · Hadoop

0 likes · 16 min read

Tencent Cloud Developer

Aug 30, 2019 · Big Data

How Tencent Cloud Leverages Spark, ElasticSearch, and Flink for PB‑Scale Data Warehousing

The cloud+ community and Kuaishou hosted a big‑data technology salon where experts detailed the evolution, architecture, and practical deployments of Spark‑based cloud data warehouses, ElasticSearch, YARN, and Flink, highlighting trends, optimization techniques, and future directions for enterprise data analytics.

Big Data · Cloud Computing · Data Warehouse

0 likes · 22 min read

Qunar Tech Salon

Aug 22, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article details Meituan's experience optimizing the Hadoop YARN fair scheduler, covering background challenges, architectural components, resource abstractions, scheduling flow, performance metrics, a series of code‑level optimizations, stability strategies for production rollout, and future directions for large‑scale cluster scheduling.

Big Data · Fair Scheduler · Load Simulation

0 likes · 23 min read

58 Tech

Aug 1, 2019 · Big Data

Optimizing Flink‑Storm for Large‑Scale Storm Task Migration on the 58 Real‑Time Computing Platform

This article describes how the 58 real‑time computing platform optimized the Flink‑Storm beta tool and implemented large‑scale, smooth migration of Storm jobs to Flink, covering background, architecture, platform‑level enhancements, YARN runtime support, deployment, and user‑side integration.

Flink · Storm · YARN

0 likes · 9 min read

Meituan Technology Team

Aug 1, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

Meituan improved its custom Hadoop YARN Fair Scheduler by pre‑computing resource usage, filtering zero‑demand jobs, and parallelizing queue sorting, which reduced sorting time from 30 s to 5 s per minute, boosted container‑per‑second throughput to 50 k, enabled live roll‑backs, and prepared the system for clusters up to 10 k nodes and future scaling to hundreds of thousands.

Big Data · Fair Scheduler · Hadoop

0 likes · 24 min read

21CTO

Jun 28, 2019 · Big Data

Master Hadoop High Availability: A Complete Step‑by‑Step HA HDFS & YARN Guide

This article provides a comprehensive, language‑agnostic tutorial on building a highly available Hadoop cluster, covering HDFS and YARN HA architectures, QJM shared storage, required components, configuration files, installation commands, startup procedures, verification steps, and troubleshooting references.

Cluster Setup · HDFS · Hadoop

0 likes · 20 min read

Architecture Digest

Jun 26, 2019 · Big Data

Guide to Setting Up Hadoop High Availability (HA) Cluster with HDFS and YARN

This article provides a step‑by‑step tutorial on configuring Hadoop high availability, covering HDFS HA architecture, Quorum Journal Manager synchronization, NameNode failover, YARN HA, required pre‑conditions, cluster planning, configuration files, service startup, and verification procedures.

Big Data · Cluster Setup · HDFS

0 likes · 16 min read

DataFunTalk

Jun 17, 2019 · Big Data

Understanding Hadoop’s Core Competitiveness in the Trillion‑Scale Data Era

This article explores Hadoop’s role in the big‑data era, detailing its architecture, core components such as HDFS, YARN, MapReduce, Ozone and Submarine, the challenges of trillion‑scale data, and why its scalability, cost efficiency, and mature ecosystem give it a competitive edge.

Data Lake · Hadoop · MapReduce

0 likes · 11 min read

Big Data Technology & Architecture

Apr 20, 2019 · Big Data

Weekly Hadoop Knowledge Points: Compression Formats, MapReduce Join, Hive Setup, and YARN Capacity Scheduler

This weekly bulletin summarizes four Hadoop knowledge points—compression formats, MapReduce join techniques, Hive installation, and YARN Capacity Scheduler—while also sharing personal updates about a PhD graduation, the upcoming May Day holiday, and a request for likes and shares.

Big Data · Hadoop · Hive

0 likes · 2 min read

Big Data Technology & Architecture

Apr 16, 2019 · Big Data

Features, Configuration Parameters, and Implementation Details of Hadoop Capacity Scheduler

The article provides a comprehensive overview of Hadoop's Capacity Scheduler, describing its resource‑allocation features, configurable XML parameters, queue access controls, dynamic configuration updates, and the internal workflow of application initialization and resource scheduling within YARN.

CapacityScheduler · Hadoop · ResourceManagement

0 likes · 13 min read

Big Data Technology & Architecture

Apr 7, 2019 · Big Data

Understanding YARN: Background, Architecture, and Execution Process

This article explains why YARN was created to overcome the limitations of MapReduce 1.x, describes its architecture—including ResourceManager, NodeManager, ApplicationMaster, Container, and Client—and outlines the step‑by‑step execution flow that enables multiple computation frameworks to run on Hadoop.

Big Data · Distributed Computing · Hadoop

0 likes · 11 min read

Big Data Technology & Architecture

Feb 26, 2019 · Big Data

Deploying Apache Flink Clusters: Standalone and YARN Modes

This guide explains how to set up an Apache Flink cluster on CentOS 7 using three deployment methods—Local, Standalone, and Flink on YARN/Kubernetes—including host configuration, SSH setup, package distribution, configuration file editing, cluster start/stop commands, YARN resource manager concepts, session commands, job submission, fault‑tolerance settings, and log inspection.

Big Data · Cluster Deployment · Configuration

0 likes · 11 min read

Youzan Coder

Feb 1, 2019 · Big Data

Design and Implementation of Log Parsing for a Big Data Offline Task Platform

The article describes a log‑parsing feature for Youzan’s big‑data offline platform that captures runtime logs from Hive, Spark, DataX, MapReduce and HBase jobs, categorizes scheduling types, extracts metrics such as read/write bytes, shuffle volume and GC time, and processes them in real time via a Filebeat‑Logstash‑Kafka‑Spark‑Streaming pipeline storing results in Redis for monitoring, optimization and resource‑usage ranking.

Big Data · Resource Monitoring · YARN

0 likes · 7 min read

Youzan Coder

Jan 16, 2019 · Big Data

How Youzan Scaled Real‑Time Analytics with Flink: Architecture, Pitfalls, and Lessons

This article walks through Youzan's real‑time platform architecture, explains why Flink was chosen over Spark Structured Streaming, details practical challenges such as container over‑provisioning and monitoring overhead, shares solutions for Spring integration and async caching, and outlines future directions for SQL‑based streaming and scheduler improvements.

Big Data · Flink · Real-time Streaming

0 likes · 19 min read

Big Data Technology & Architecture

Jan 3, 2019 · Big Data

Deploying Apache Flink on YARN and Running Flink Jobs

This tutorial explains how to deploy Apache Flink on a Hadoop YARN cluster, covering both YARN session mode and direct job submission, and demonstrates running the built‑in WordCount example with command‑line options for input, output, and resource configuration.

Apache Flink · Big Data · Flink Deployment

0 likes · 8 min read

JD Tech

Jul 9, 2018 · Big Data

JD's Large‑Scale Hadoop Cluster Resource Management and Scheduling Architecture

This article describes how JD built a multi‑regional, ten‑thousand‑node Hadoop ecosystem, unified resource management with YARN, introduced a three‑level Router scheduling layer, optimized performance, and integrated deep‑learning frameworks to achieve high availability, cost efficiency, and scalable big‑data processing.

Distributed Scheduling · Hadoop · JD.com

0 likes · 12 min read

ITPUB

Jun 10, 2018 · Big Data

13 Must‑Know Open‑Source Tools in the Hadoop Ecosystem

This article introduces Hadoop’s origins and core challenges, then presents thirteen essential open‑source tools spanning resource scheduling, real‑time query engines, and additional processing frameworks, detailing each project's purpose, key features, and repository locations to help practitioners choose the right component for big‑data workloads.

Hadoop · Impala · Open-source

0 likes · 12 min read

ITPUB

Jun 4, 2018 · Big Data

Is Hadoop Really Declining? Expert Insights Show Why the Ecosystem Stays Strong

Despite Gartner's 2017 claim that Hadoop is nearing the end of its production maturity, a series of interviews with Chinese big‑data experts reveals that Hadoop's ecosystem remains robust, with core components like HDFS, YARN, Spark, and HBase continuing to dominate the market.

Big Data · Gartner · Hadoop

0 likes · 9 min read

ITPUB

May 31, 2018 · Big Data

Mastering Spark on DataMagic: Fast‑Track Your Big Data Skills

This article explains Spark's role in the DataMagic platform, outlines four practical steps to quickly master Spark, details key configuration and parallelism settings, shows how to modify Spark code, and provides operational tips for cluster management and job troubleshooting.

Big Data · Configuration · DataMagic

0 likes · 10 min read

21CTO

May 17, 2018 · Big Data

Understanding Hadoop MapReduce and YARN: Architecture, Shuffle, and Scaling

This article explains Hadoop's core components, the MapReduce programming model, the detailed shuffle and merge processes, and how YARN replaces the classic JobTracker/TaskTracker architecture to improve scalability and resource utilization in large‑scale data processing clusters.

Distributed Computing · Hadoop · Shuffle

0 likes · 12 min read

Architects' Tech Alliance

May 14, 2018 · Big Data

Understanding Hadoop MapReduce Architecture and YARN: Components, Workflow, and Optimization

This article explains Hadoop's distributed storage and processing framework, details the MapReduce programming model, describes the classic JobTracker/TaskTracker architecture, outlines the shuffle and combine phases, and introduces YARN as a scalable replacement with its ResourceManager, ApplicationMaster, and NodeManager components.

Big Data · Hadoop · MapReduce

0 likes · 13 min read

Tencent Cloud Developer

Apr 12, 2018 · Big Data

Spark Usage in DataMagic Platform: A Practical Guide

This guide explains how DataMagic leverages Spark on YARN for fast, scalable offline analytics—covering Spark’s core role, four steps to master its terminology, configurations, parallelism, and code modification, plus practical deployment scripts, dynamic resource tuning, MongoDB export, job troubleshooting, and cluster upkeep for trillion‑record workloads.

DataMagic · Spark · Spark optimization

0 likes · 11 min read

dbaplus Community

Apr 7, 2018 · Cloud Native

What Makes Distributed Schedulers Tick? Patterns from YARN to Kubernetes

This article surveys the architecture of cluster resource managers and task schedulers—covering definitions, design principles, and three main categories (centralized, two‑level, and shared‑state) with concrete examples such as Hadoop YARN, Mesos, Spark Drizzle, Borg and Kubernetes—while highlighting their trade‑offs in scalability, fault‑tolerance, and flexibility.

Kubernetes · Mesos · Omega

0 likes · 27 min read

ITPUB

Mar 29, 2018 · Big Data

Demystifying Hadoop: MapReduce, Shuffle, and YARN Architecture

This article explains Hadoop’s core components, the MapReduce programming model, the detailed shuffle and merge processes, and how YARN replaces the classic JobTracker/TaskTracker design to improve scalability and resource utilization in large‑scale data processing clusters.

Big Data · Hadoop · MapReduce

0 likes · 15 min read

Beike Product & Technology

Mar 9, 2018 · Big Data

How Lianjia Built a Low‑Latency Real‑Time Data Platform with Spark Streaming

This article details Lianjia's journey of designing and implementing a low‑latency, stable real‑time computing platform using Spark Streaming on YARN, covering technical selection, architecture components, version compatibility challenges, exactly‑once semantics, graceful shutdown, Kafka tuning, and future enhancements.

Big Data · Exactly-once · Kafka

0 likes · 11 min read

58 Tech

Feb 7, 2018 · Frontend Development

ArthurCI: Accelerating Frontend Continuous Integration with Stable Infrastructure

The article introduces ArthurCI, a front‑end continuous‑integration platform developed by 58, detailing its design, performance optimizations such as yarn caching and parallel webpack compression, ease‑of‑use integration steps, stability features, and future data‑driven enhancements, while comparing it with tools like TravisCI.

CI · Continuous Integration · YARN

0 likes · 9 min read

dbaplus Community

Jan 17, 2018 · Big Data

Mastering Hadoop YARN: CPU & Memory Management Strategies for Large‑Scale Clusters

This article explores Hadoop YARN’s evolution, multi‑tenant design, queue and node‑label scheduling, real‑world resource allocation challenges, and data‑driven tools that automate diagnostics and visualizations to optimize CPU and memory usage across massive clusters.

CPU · Hadoop · Node Labels

0 likes · 16 min read

Full-Stack DevOps & Kubernetes

Oct 21, 2017 · Big Data

Deploy Hadoop CDH5.4 on CentOS 6: Install HDFS, YARN, and WebHDFS

This guide walks through preparing three CentOS 6.9 nodes, configuring hostnames, time sync, password‑less SSH, disabling IPv6, installing JDK, downloading CDH 5.4, setting up core‑site and hdfs‑site XML files, formatting the NameNode, starting HDFS services, configuring YARN and MapReduce, and verifying the installations via the Web UI.

Big Data · CDH · CentOS

0 likes · 18 min read

dbaplus Community

Sep 26, 2017 · Big Data

How to Avoid Common Spark SQL Pitfalls and Boost Performance

This article shares a comprehensive set of practical tips and solutions for common Spark SQL issues—including out‑of‑memory errors, UDF‑induced GC, thread blocking, system‑property initialization, speculation side‑effects, accumulator traps, concurrent job scheduling, and excessive logging—helping engineers improve stability and efficiency of their Spark‑based financial systems.

Accumulator · Memory Management · Performance Tuning

0 likes · 15 min read

Ctrip Technology

Sep 20, 2017 · Big Data

Building a Real‑Time Computing Platform with Spark Streaming at Ctrip: Design, Implementation, and Lessons Learned

This article describes how Ctrip migrated its large‑scale real‑time platform from JStorm to Spark Streaming, detailing the architectural design, the Muise Spark Core encapsulation, operational metrics, encountered pitfalls, and future plans to adopt Flink and Beam for streaming workloads.

Big Data · Exactly-once · Spark Streaming

0 likes · 22 min read

Architecture Digest

Jul 21, 2017 · Big Data

Step-by-Step Guide to Building a High-Availability Hadoop HDFS and YARN Cluster

This article provides a comprehensive, step-by-step tutorial for setting up a high‑availability Hadoop cluster, covering user creation, JDK installation, host configuration, SSH setup, firewall and SELinux adjustments, Zookeeper deployment, HDFS and YARN HA configuration, essential XML files, and failover testing.

Cluster Setup · HDFS · Hadoop

0 likes · 20 min read

Node Underground

Jul 20, 2017 · Frontend Development

How to Build a Minimal Package Manager from Scratch

This article explains why package managers are essential, showcases Yarn's step‑by‑step tutorial for creating a simple package manager, and highlights how the resulting tool handles classic challenges like circular dependencies and file‑structure optimization.

Software Development · YARN · dependency resolution

0 likes · 2 min read

Tencent Music Tech Team

Jun 23, 2017 · Backend Development

New Features and Changes in npm@5: Detailed Overview and Comparison with Yarn

npm 5 introduces automatic package‑lock generation, default --save, enhanced Git and file‑dependency handling, new prepack/postpack scripts, stronger integrity checks, a fully managed cache and registry tweaks, while narrowing Yarn’s speed advantage despite early bugs, making it a compelling alternative for npm‑centric workflows.

YARN · dependency management · npm

0 likes · 15 min read

Node Underground

Jun 22, 2017 · Backend Development

8 Essential Node.js Practices Every Backend Developer Should Follow

This article presents eight practical recommendations for Node.js developers, covering dependency locking, lifecycle scripts, modern JavaScript, promises with async/await, code formatting with Prettier, continuous integration testing, security headers via Helmet, and serving over HTTPS.

HTTPS · Node.js · Prettier

0 likes · 4 min read

Qunar Tech Salon

Mar 14, 2017 · Backend Development

Node.js 2016 Review, Applications, and 2017 Outlook

This article reviews the major Node.js events of 2016—including version updates, the left‑pad controversy, Yarn, Chrome DevTools debugging, and ecosystem tools—describes common application scenarios and framework selection criteria, and offers predictions for Node.js development in 2017.

Node.js · TypeScript · YARN

0 likes · 17 min read

Efficient Ops

Mar 8, 2017 · Big Data

Inside iQIYI’s Massive Hadoop Platform: Architecture, Ops, and the Gear Workflow Engine

This article describes iQIYI’s Hadoop platform, built since 2010 and now spanning over a thousand nodes and 60 PB of storage, detailing its architectural evolution, operational management practices, the challenges encountered, and the custom Gear workflow system that streamlines job scheduling, dependencies, and alerts for large‑scale data processing.

Gear · Hadoop · YARN

0 likes · 19 min read

StarRing Big Data Open Lab

Feb 24, 2017 · Big Data

Mastering Inceptor Server HA: Configuration, Failover, and Best Practices

This article provides a comprehensive guide to Inceptor Server HA, covering its high‑availability architecture, configuration steps, required parameters, connection strings, failover behavior, and the HA Tools utility for monitoring and managing master‑standby switches.

HA · High Availability · HiveServer2

0 likes · 11 min read

Java High-Performance Architecture

Nov 23, 2016 · Big Data

How to Achieve YARN ResourceManager High Availability with Zookeeper

YARN’s ResourceManager is a single point of failure, so this guide explains how to configure active/standby mode using Zookeeper, covering leader election, automatic failover, handling false‑dead scenarios, and the essential Zookeeper features such as temporary nodes, watches, and ACLs.

ResourceManager · YARN · Zookeeper

0 likes · 5 min read

CSS Magic

Oct 13, 2016 · Frontend Development

Yarn Explained: Facebook’s Faster, Safer JavaScript Package Manager

The article details how Facebook built Yarn to overcome npm’s consistency, security, and speed limitations, describing the evolution of their package‑management workflow, Yarn’s lockfile architecture, parallel installation process, additional features, production adoption, and simple commands to get started.

JavaScript · YARN · frontend

0 likes · 13 min read

Node Underground

Oct 12, 2016 · Frontend Development

Why Yarn Is Changing JavaScript Package Management: Speed, Versioning, and the npm Rivalry

The article examines Yarn, Facebook's new JavaScript package manager, highlighting its focus on speed and version control, its compatibility with npm and Bower, the mixed reactions from developers, and the uncertainty about its long‑term impact on the ecosystem.

YARN · dependency management · npm

0 likes · 2 min read

MaGe Linux Operations

Aug 23, 2016 · Big Data

Step-by-Step Guide to Building a Hadoop Cluster on CentOS 6.5

This article provides a comprehensive, hands‑on tutorial for setting up a Hadoop 2.6.4 cluster on a CentOS 6.5 development server, covering SSH password‑less login, user/group creation, DNS configuration, JDK installation, environment variables, Hadoop installation, HDFS and YARN configuration, and troubleshooting native library warnings.

Big Data · CentOS · Cluster Setup

0 likes · 12 min read

MaGe Linux Operations

Aug 4, 2016 · Big Data

How Hadoop 2.0 Collects and Manages Job Logs with YARN

This article explains Hadoop 2.0's built‑in MRv2 log collection mechanism, detailing job‑run and task‑run logs, their generation steps, log aggregation, and the role of the JobHistory Server for centralized analysis.

Big Data · Hadoop · JobHistory

0 likes · 8 min read

Hulu Beijing

May 31, 2016 · Big Data

What’s New in Hadoop 3.0? Key Features and Improvements Explained

Hadoop 3.0, built on JDK 1.8, adds erasure‑coded HDFS, multi‑NameNode support, native MapReduce task optimizations, cgroup‑based YARN memory and disk isolation, and container resizing, with an alpha slated for summer and a GA release expected in November or December.

Big Data · HDFS · Hadoop

0 likes · 5 min read

Architecture Digest

May 4, 2016 · Big Data

Upgrading Spark from 1.4.1 to 1.6.1: Memory, Storage, and Operational Challenges

The article details the author’s experience upgrading a production Spark cluster from version 1.4.1 to 1.6.1, exposing memory‑spill, unified memory, BlockManager deadlock, Yarn‑kill, UI quirks, and Spark‑SQL compatibility issues, and proposes concrete code‑level fixes for each problem.

Big Data · Distributed Computing · Memory Management

0 likes · 14 min read

21CTO

Apr 18, 2016 · Big Data

How Spark Runs on YARN: From Client Submission to Executor Execution

This article explains the end‑to‑end workflow of Spark on YARN, covering client initialization, ApplicationMaster actions, driver and executor roles, RDD fundamentals, SparkSQL processing, and practical code examples for building and tuning distributed Spark jobs.

Distributed Computing · RDD · Spark

0 likes · 17 min read

Architecture Digest

Apr 18, 2016 · Big Data

Introduction to Apache Spark: Architecture, RDD, Spark on YARN, and SparkSQL

This article introduces Apache Spark’s core architecture, explains how Spark runs on YARN, details driver and executor roles, describes RDD concepts and dependencies, and outlines SparkSQL’s schema‑based query processing, providing code examples for HiveContext and JDBC integration.

Big Data · Distributed Computing · RDD

0 likes · 14 min read

21CTO

Mar 30, 2016 · Big Data

Unveiling Spark on YARN: From RDD Basics to Cluster Execution

This article explains Apache Spark’s core concepts, the RDD programming model, how Spark runs on YARN with driver and executor nodes, the distinction between transformations and actions, partitioning strategies, and an overview of SparkSQL processing.

Apache Spark · RDD · SparkSQL

0 likes · 18 min read

ITPUB

Feb 24, 2016 · Big Data

How Pepperdata Optimizes Hadoop Cluster Resources and Improves Performance

The article explains how Hadoop clusters suffer from resource contention among multiple users, why YARN alone often fails to prioritize workloads, and how Pepperdata provides deeper visibility and automatic adjustments that reduce low‑priority usage, cut node count, and lower cloud costs.

Big Data · Hadoop · Pepperdata

0 likes · 7 min read

Art of Distributed System Architecture Design

Oct 29, 2015 · Big Data

TalkingData’s Journey to Building a Mobile Big Data Platform with Spark and YARN

This article recounts how TalkingData progressively introduced Spark into its Hadoop‑YARN based mobile big‑data platform, detailing early architectures, migration challenges, performance gains, the fully Spark‑centric redesign with Kafka and Spark Streaming, encountered pitfalls, and future plans for further optimization.

Data Platform · Hadoop · Spark

0 likes · 16 min read

Hulu Beijing

Aug 14, 2015 · Big Data

How Voidbox Bridges Docker and YARN for Scalable Big Data Workloads

Voidbox integrates Docker containers with YARN to simplify distributed application development, improve deployment, boost cluster efficiency, and provide fault‑tolerant, DAG‑based execution modes, enabling seamless resource management for Hadoop‑based big data jobs.

Big Data · Cluster Computing · DAG

0 likes · 17 min read

MaGe Linux Operations

Nov 5, 2014 · Big Data

Quickly Get Hadoop 2.0 Up and Running: A Minimal Configuration Guide

This article walks through the essential steps to install and configure Hadoop 2.0 on a two‑node Linux cluster, covering version selection, directory setup, core XML files, YARN settings, service startup, verification commands, and basic troubleshooting tips.

Big Data · Cluster Setup · HDFS

0 likes · 9 min read
