Tag

Hadoop

Rare Earth Juejin Tech Community
Dec 26, 2024 · Big Data

Understanding Hadoop HDFS and MapReduce: Principles, Architecture, and Sample Code

This article explains the origins of big‑data technologies, details the architecture and read/write mechanisms of Hadoop's HDFS, describes the MapReduce programming model, and provides complete Java code examples for a simple distributed file‑processing job using Maven dependencies.

Big Data · HDFS · Hadoop
0 likes · 15 min read
Rare Earth Juejin Tech Community
Nov 23, 2024 · Big Data

Implementing a Basic Hadoop MapReduce Word Count with Extensible Design and Performance Tuning

This article explains Hadoop’s core concepts using a library analogy, details HDFS storage and MapReduce processing, provides complete Java implementations for a word‑count job with support for text, CSV, and JSON inputs, and discusses extensibility and performance optimizations such as combiners and custom partitioners.

Big Data · Hadoop · Java
0 likes · 20 min read
DataFunSummit
Aug 13, 2024 · Big Data

Data Cost Reduction and Efficiency: Qichacha's Data Architecture and Multi‑Cloud Unified Design

This article presents Qichacha's comprehensive data‑cost‑reduction strategy, detailing its Hadoop‑based three‑pillar architecture, layered data warehouse, Hive upgrades, unified metadata across multi‑cloud clusters, middleware choices such as Alluxio and JuiceFS, version‑compatible hybrid clouds, and Kubernetes‑driven resource orchestration to achieve scalable, low‑cost data processing.

Big Data · Data Warehouse · Hadoop
0 likes · 16 min read
360 Smart Cloud
May 28, 2024 · Big Data

HDFS Upgrade from 2.6.0‑cdh to 3.1.2 with DataNode Federation and Mixed Deployment

This article details the background, planning, step‑by‑step procedures, encountered issues, and rollback strategies for upgrading a Hadoop HDFS cluster from version 2.6.0‑cdh to 3.1.2, including the mixed deployment of DataNodes across different federations and the configuration changes it requires.

BigData · Cluster · DataNode
0 likes · 16 min read
DataFunTalk
May 19, 2024 · Big Data

Tencent's Multi-Engine Unified Metadata and Permission Management for Big Data

This article introduces Tencent's Big Data Processing Suite (TBDS), discusses challenges of data silos, and presents Gravitino's open‑source unified metadata service and permission model, detailing how it integrates Hadoop, MPP, and various catalog plugins to provide consistent access control across heterogeneous data platforms.

Access Control · Big Data · Gravitino
0 likes · 12 min read
DataFunSummit
Apr 26, 2024 · Big Data

Didi's Big Data Cost Governance Practices

This article details Didi's comprehensive big data cost governance framework, covering its data architecture, asset management scoring, Hadoop and Elasticsearch cost optimization methods, and practical insights on organizational processes and incentives for effective cost control.

Big Data · Elasticsearch · Hadoop
0 likes · 17 min read
vivo Internet Technology
Apr 24, 2024 · Big Data

Analysis and Resolution of a FileSystem‑Induced Memory Leak Causing OOM in Production

The article details how repeatedly calling FileSystem.get(uri, conf, user) created distinct UserGroupInformation objects, inflating the static FileSystem cache and causing a heap‑memory leak that triggered an Out‑Of‑Memory error, and explains that using the two‑argument get method or explicitly closing instances resolves the issue.

Filesystem · Hadoop · Java
0 likes · 13 min read
Efficient Ops
Apr 23, 2024 · Big Data

How to Plan, Configure, and Launch a Hadoop 3.3.5 Cluster on Three Nodes

This guide walks through planning a three‑node Hadoop 3.3.5 cluster, explains default and custom configuration files, details core‑site, hdfs‑site, yarn‑site, and mapred‑site settings, shows how to distribute configs, start HDFS and YARN, and perform basic file‑system tests.

Big Data · Cluster Setup · HDFS
0 likes · 11 min read
DataFunTalk
Mar 24, 2024 · Big Data

Didi's Big Data Asset Governance Practices: Hadoop and Elasticsearch Governance

This article details Didi's comprehensive big‑data asset governance platform, covering its architectural layers, Hadoop and Elasticsearch governance practices, health‑score models, lifecycle recommendations, and future plans for automated and intelligent data governance to reduce cost and manual effort.

Asset Management · Big Data · Elasticsearch
0 likes · 17 min read
360 Smart Cloud
Jan 15, 2024 · Big Data

Design and Optimization of the Ozone Distributed Object Storage System

This article presents a comprehensive overview of Ozone, a Hadoop‑based distributed object storage system, detailing its architecture, metadata management, scalability enhancements, small‑file handling, erasure coding, lifecycle policies, and future improvements aimed at boosting performance and reliability for large‑scale unstructured data workloads.

Big Data · Hadoop · Object Storage
0 likes · 15 min read
DataFunTalk
Jan 8, 2024 · Big Data

Didi's Big Data Cost Governance Practices and Framework

This article presents Didi's comprehensive big data cost governance approach, detailing the overall framework, data system architecture, asset management platform, Hadoop and Elasticsearch cost‑control practices, metadata‑driven optimization, and organizational insights for effective resource and budget management.

Big Data · Elasticsearch · Hadoop
0 likes · 19 min read
Architects Research Society
Jan 2, 2024 · Big Data

Understanding Data Lakes: Concepts, Benefits, Challenges, and Comparison with Data Warehouses

This article explains what a data lake is, its origins, key characteristics such as collecting all data, enabling diverse user access, and flexible processing, compares it with traditional data warehouses, discusses cost advantages, potential pitfalls like data swamps, and outlines best‑practice considerations for enterprise adoption.

Big Data · Data Warehouse · Hadoop
0 likes · 10 min read
Architects Research Society
Nov 26, 2023 · Big Data

Data Lake vs Data Warehouse: Key Differences and How to Choose

Data lakes and data warehouses serve different purposes in big‑data architectures; this article explains their definitions, core attributes, five major distinctions—including data retention, type support, user coverage, adaptability, and insight speed—and offers guidance on selecting or combining the two approaches.

Big Data · Data Warehouse · Hadoop
0 likes · 12 min read
WeiLi Technology Team
Nov 1, 2023 · Big Data

How to Diagnose and Resolve HDFS Safe Mode Issues

This guide explains why HDFS enters safe mode after a DataNode failure, describes the safe‑mode state and its exit conditions, and provides step‑by‑step commands and troubleshooting procedures to analyze, fix, and recover from safe‑mode incidents in Hadoop clusters.

Big Data · Cluster Management · HDFS
0 likes · 10 min read
DevOps
Oct 25, 2023 · Big Data

An Introduction to Big Data: Origins, Definitions, 5V Characteristics, Applications, Hadoop Architecture, and Testing Strategies

This article provides a comprehensive overview of big data, covering its origins, definitions, 5V characteristics, data formats, real‑world applications, Hadoop architecture, testing challenges, functional and performance testing strategies, and the skills required for effective big data testing.

5V Characteristics · Big Data · Data Formats
0 likes · 35 min read
政采云技术
Aug 23, 2023 · Big Data

Step-by-Step Guide to Building a Hadoop Big Data Cluster on ARM Architecture

This comprehensive tutorial details the process of deploying a complete Hadoop-based big data ecosystem on ARM architecture, covering the installation and configuration of essential components including Java, Zookeeper, Hadoop, MySQL, Hive, and Spark with practical code examples.

ARM architecture · Big Data · Cluster Deployment
0 likes · 19 min read
DataFunTalk
Jun 9, 2023 · Big Data

Cloud Music Data Governance Practice

This article presents a comprehensive case study of NetEase Cloud Music's data governance practice, covering data background, governance philosophy, detailed solutions across metadata, storage, compute, and model design, practical implementations, measurable cost savings, and future planning for sustainable data management.

Big Data · Cost Optimization · Data Warehouse
0 likes · 15 min read
Big Data Technology Architecture
Mar 15, 2023 · Big Data

Ensuring Secure Write Paths in Hadoop S3A: Experiments, Benchmarks, and Best Practices

This article analyses the security of Hadoop S3A write paths in data lakes, explains fast upload mechanisms, demonstrates disk‑IO and network‑error simulations, compares checksum algorithms, and presents Alibaba Cloud EMR JindoSDK best‑practice results with performance and reliability evaluations.

Big Data · Hadoop · S3A
0 likes · 24 min read
DataFunSummit
Feb 6, 2023 · Product Management

Key Capabilities and Knowledge for Platform Data Product Managers in the Big Data Era

This article outlines the evolution of big data, defines the role of platform data product managers, details their core competencies—including general, professional thinking, and technical skills—covers the Hadoop ecosystem, and explains the end‑to‑end offline data‑warehouse construction process with practical examples and Q&A.

Big Data · Data Product Management · Hadoop
0 likes · 12 min read
DataFunSummit
Dec 31, 2022 · Big Data

The Evolution of Data Platforms: From Early Computing to the Modern Big Data Stack

This article reviews the history of data platforms—from the first general‑purpose computers and early relational databases through traditional BI, agile BI, and big‑data technologies like Hadoop, Spark, and Flink, up to today’s cloud‑native modern data stack and its future outlook.

Big Data · Data Warehouse · Flink
0 likes · 26 min read