ByteDance Data Platform
Author

The ByteDance Data Platform team supports all ByteDance business lines by lowering the barriers to building data applications. It aims to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products to enterprise customers under the Volcano Engine brand.

78 articles · 0 likes · 187 views · 0 comments

Recent Articles

May 16, 2022 · Operations

How ByteDance’s SLA Assurance Platform Guarantees Data Reliability at Scale

This article explains how ByteDance’s self‑built SLA assurance platform addresses data pipeline communication costs, unclear responsibilities, and operational pressure by introducing roles, a streamlined signing workflow, checkpoint and recommendation calculations, and real‑time monitoring to achieve a 99.1% SLA compliance rate.

Monitoring · SLA · operations
0 likes · 9 min read
May 11, 2022 · Big Data

How to Build a High‑Performance SparkSQL Server with Hive JDBC Compatibility

This article explains how to design and implement a SparkSQL server that lowers usage barriers and boosts efficiency by supporting the standard JDBC interface, integrating the HiveServer2 protocol, handling multi‑tenant authentication, managing Spark job lifecycles, and providing high availability through ZooKeeper coordination.

Hive · JDBC · Server Architecture
0 likes · 15 min read
Apr 27, 2022 · Big Data

How ByteDance Built a Scalable Data Catalog: Key Technologies and Future Plans

This article details ByteDance’s Data Catalog system: its unified metadata model, standardized ingestion connectors, search optimization techniques, lineage capabilities, and storage layer enhancements. It highlights the key technical designs, the performance improvements, and the future work planned to advance data governance and asset utilization.

Data Catalog · Search Optimization · data lineage
0 likes · 12 min read
Apr 15, 2022 · Cloud Native

How ByteHouse Evolved From ClickHouse Into a Next‑Gen Cloud‑Native Data Warehouse

ByteHouse, born from ByteDance’s extensive use of ClickHouse, transformed a high‑performance OLAP engine into a cloud‑native, scalable data warehouse by addressing scalability, elasticity, high availability, and multi‑tenant challenges through architectural redesign, custom storage layers, and advanced metadata management.

ByteHouse · ClickHouse · Data Warehouse
0 likes · 19 min read
Feb 25, 2022 · Big Data

Optimizing SparkSQL: ByteDance EMR’s Data Lake Integration and Multi‑Tenant Server

ByteDance’s EMR team details how they integrated data‑lake engines such as Hudi and Iceberg into SparkSQL, streamlined jar management, and built a custom SparkSQL server with Hive compatibility, multi‑tenant support, engine pre‑warming, and transaction capabilities, dramatically improving performance and resource efficiency for enterprise workloads.

EMR · Hudi · Iceberg
0 likes · 11 min read
Feb 21, 2022 · Big Data

Choosing the Right Components for Enterprise Data Warehouses: Hive vs SparkSQL

This article examines how to design enterprise‑grade data warehouses by evaluating development convenience, ecosystem, decoupling, performance and security, compares Hive and SparkSQL along with other engines such as Presto, Doris and ClickHouse, and outlines best‑practice component selections for long‑running batch and interactive analytics.

Architecture · Data Warehouse · ETL
0 likes · 19 min read