Tagged articles
23 articles
DataFunTalk
Jan 14, 2024 · Big Data

Optimizing Object Storage and Impala Engine in NetEase NDH: Performance Enhancements and Feature Additions

This presentation covers NetEase's NDH big‑data platform: its background, object‑storage upload and rename optimizations, Impala engine adaptations (file‑handle caching, transparent URI handling, and getFileBlockLocations improvements), and operational enhancements such as dynamic proxy user configuration and audit‑log extensions.

Alluxio · Big Data · Impala
14 min read
DataFunSummit
Dec 5, 2022 · Big Data

Impala Cluster Performance Optimization Based on Historical Queries: Practices and Solutions

This article presents a comprehensive overview of Impala cluster performance optimization using historical query analysis, covering background, high‑performance data‑warehouse construction principles, identified pain points, HBO (history‑based optimization) implementation details, optimization techniques, and future development plans for the Impala ecosystem.

Big Data · HBO · Historical Queries
16 min read
DataFunSummit
Sep 24, 2022 · Big Data

Evolution of 37 Mobile Games' Multi-Dimensional Analysis Platform: From MySQL to StarRocks

The article details how 37 Mobile Games built and evolved a multi-dimensional analytics platform, covering business background, data challenges, the migration from MySQL through Druid, Impala, and ClickHouse to StarRocks, self‑service data tools, monitoring, and the future roadmap, and highlights technical decisions and lessons learned.

ClickHouse · Data Warehouse · Impala
20 min read
DataFunSummit
Apr 9, 2022 · Big Data

Impala Deployment and Optimization: Practical Experience with the Sensors Data Multi‑dimensional Analysis Platform

This article presents a comprehensive technical walkthrough of Sensors Data's multi‑dimensional analysis platform, covering product architecture, an Impala‑based real‑time query engine, query performance tuning, resource‑estimation strategies, and future plans, with concrete diagrams, test results, and community contributions.

Big Data · Data Architecture · Impala
19 min read
DataFunTalk
Apr 4, 2022 · Big Data

Impala Deployment and Optimization in Sensors Data's Multi-Dimensional Analytics Platform

This article details the architecture of Sensors Data's analytics platform, the implementation of a real‑time Impala query engine, multiple query‑performance optimizations (storage redesign, user‑behavior sequence tuning, join elimination, and expression push‑down), and a resource‑estimation framework that dramatically reduces query failures and latency.

Big Data · Data Platform · Impala
16 min read
DataFunTalk
Oct 7, 2021 · Big Data

Impala Architecture, Concurrency, CBO Join Optimization, and Storage Layer in Tencent Financial Big Data Scenarios

This article introduces Impala's overall architecture, storage options, key features, concurrency mechanisms, CBO‑based join optimization techniques, storage‑layer principles and data‑filtering strategies, and summarizes practical performance‑tuning experiences from Tencent's financial big‑data platform.

Big Data · CBO · Impala
12 min read
Tencent Tech
Sep 10, 2021 · Big Data

How Sohu Changyou Migrated 1 PB of Game Data to the Cloud Without Downtime

This article details how Sohu Changyou’s data team, together with Tencent Cloud engineers, planned and executed a seamless migration of over one petabyte of game data to Elastic MapReduce, Elasticsearch Service and Oceanus, achieving zero service impact and dramatically improving analytics performance.

Big Data · EMR · Game Analytics
9 min read
DataFunTalk
Feb 14, 2021 · Big Data

Impala at NetEase: Architecture, Iceberg Integration, Management System, Optimizations and Future Roadmap

This talk presents NetEase's practical experience with Impala, covering its core architecture, new features in version 3.x, integration with Apache Iceberg, a custom management platform, profiling and statistics enhancements, as well as future plans involving Kubernetes, Alluxio caching and pre‑computation strategies.

Apache Iceberg · Big Data · Cluster Management
13 min read
DataFunTalk
Oct 19, 2020 · Big Data

Impala Optimization and Practices at NetEase Big Data Platform

This article presents a comprehensive overview of NetEase's use of Impala as an OLAP query engine, detailing its architectural advantages, performance benefits, enhancements such as management servers, metadata synchronization, high availability via ZooKeeper, and expanded storage support, and real‑world deployment cases in the "Mammoth" platform and NetEase Cloud Music.

Impala · Metadata Sync · OLAP
11 min read
DataFunTalk
Sep 17, 2020 · Big Data

Design and Implementation of a Scalable User Tag Production Platform

The article explains how a flexible, high‑performance user‑tagging system is built on a batch‑stream integrated architecture using big‑data technologies such as Impala, HDFS, and Flink to support both offline and real‑time label generation for precise marketing, product improvement, and operational analytics.

Big Data · Flink · Impala
15 min read
Big Data Technology Architecture
Feb 3, 2020 · Big Data

NetEase Data Foundation Platform Construction – Technical Sharing

This article, originally shared by NetEase’s data expert Jiang Hongxiang on DataFun, outlines the construction of NetEase’s data foundation platform, covering database kernel insights and the implementation of the ad‑hoc query engine Impala with the distributed storage system Kudu, offering valuable big‑data engineering practices.

Data Platform · Impala · Kudu
4 min read
DataFunTalk
Feb 18, 2019 · Big Data

Hulu’s Big Data Architecture and Sophon OLAP Cache Layer Overview

This article presents an in‑depth overview of Hulu’s big‑data platform, detailing its multi‑layer architecture, the design and functionality of the Sophon OLAP cache layer, and how Impala is employed for high‑performance query processing and integration with cloud‑native engines.

Data Architecture · Hulu · Impala
16 min read
DataFunTalk
Jan 16, 2019 · Big Data

NetEase Data Infrastructure: Database Technologies and Big Data Platform Overview

This article presents NetEase Hangzhou Research Institute's experience in building a data infrastructure, covering database innovations such as InnoSQL, NTSDB, and InnoRocks, as well as the integration of big‑data components like HDFS, Spark, Impala, and Kudu to enable efficient storage, processing, and real‑time analytics.

Data Platform · Impala · InnoSQL
12 min read
ITPUB
Jun 10, 2018 · Big Data

13 Must‑Know Open‑Source Tools in the Hadoop Ecosystem

This article introduces Hadoop’s origins and core challenges, then presents thirteen essential open‑source tools spanning resource scheduling, real‑time query engines, and additional processing frameworks, detailing each project's purpose, key features, and repository locations to help practitioners choose the right component for big‑data workloads.

Hadoop · Impala · Spark
12 min read
Meituan Technology Team
Aug 5, 2016 · Big Data

Design and Implementation of a Large-Scale User Behavior Analytics Platform

The article outlines Meituan‑Dianping’s “Sensors Analytics” platform, a privately deployed, open PaaS solution that collects full‑stack user events from iOS, Android, Web and WeChat and maps IDs in near real time. It stores detailed records in Kudu (real‑time) and Parquet (offline) and serves low‑latency queries via Impala, addressing the architectural and operational challenges of high‑throughput ingestion and data‑security requirements.

Impala · Kafka · Kudu
8 min read