Tag

data warehouse


Sohu Tech Products
Jun 11, 2025 · Big Data

How We Transformed a Microservice Finance System into a Scalable Big Data Warehouse

This article details the evolution of a fast‑growing finance reporting system from a monolithic microservice architecture plagued by data inconsistency, low efficiency, and scalability limits to a robust, high‑performance big‑data warehouse built with layered data models, SparkSQL processing, and unified scheduling, highlighting design decisions, technical trade‑offs, and measurable performance gains.

Big Data · Microservices · architecture evolution
0 likes · 23 min read
Didi Tech
Mar 20, 2025 · Big Data

Key Questions and Value Assessment in Data Warehouse Modeling and Development

The article explores nine fundamental questions about data‑warehouse modeling—why and when to model, how to evaluate and compare models, the warehouse’s unique role versus business systems, modern architectural shifts, a quantitative value‑proof scoring framework, industry‑standard versus custom approaches, demonstrating business impact, and career insights—concluding that true value lies in enabling informed decisions rather than technology hype.

AI · Big Data · Data Modeling
0 likes · 12 min read
JD Tech Talk
Dec 26, 2024 · Databases

Using ClickHouse for Efficient Tag Bitmap Storage and Group Computation in a CDP

This article explains how ClickHouse’s columnar storage, bitmap functions, and distributed architecture can be leveraged to store billions of tag bitmaps, combine them efficiently, and support fast group calculations for customer data platforms, while addressing data‑warehouse integration, storage format, and performance challenges.

Bitmap · CDP · OLAP
0 likes · 10 min read
DataFunSummit
Nov 29, 2024 · Big Data

Standardizing Metric Management in Didi’s Data Platform

The article outlines Didi’s end‑to‑end metric lifecycle—from background, requirements and current pain points to a multi‑stage solution that introduces a unified metric dictionary, management tool, logical modeling, and consumption layer—to achieve accurate, timely, consistent, and efficiently managed indicators across the data warehouse ecosystem.

Big Data · Data Modeling · Standardization
0 likes · 20 min read
Zhuanzhuan Tech
Nov 25, 2024 · Big Data

Design and Evolution of an R&D Measurement Platform: Architecture, Data Governance, and Interactive Analytics

This article details the purpose, technical evolution, architecture, data‑source unification, dimensional modeling, data‑warehouse layering, SQL‑as‑metric approach, and interactive design of a measurement platform built to improve R&D efficiency through systematic data collection and visualization.

BI · Data Governance · Platform Architecture
0 likes · 32 min read
转转QA
Nov 22, 2024 · R&D Management

Design and Evolution of Zhuanzhuan's R&D Measurement Platform

This article details the purpose, technical evolution, data governance, data modeling, warehouse layering, SQL‑based metric definition, and interactive design of Zhuanzhuan's measurement platform, illustrating how systematic data collection and visualization improve R&D efficiency, accuracy, and decision‑making.

BI · Efficiency Measurement · Platform Architecture
0 likes · 30 min read
Architecture & Thinking
Nov 15, 2024 · Databases

How Baidu’s TDE‑ClickHouse Delivers Sub‑Second Analytics on Billion‑Row Datasets

This article explains how Baidu’s TDE‑ClickHouse, as a core engine of the Turing 3.0 ecosystem, overcomes platform fragmentation, quality issues, and usability challenges through the OneData+ development paradigm, multi‑level aggregation, projection, query‑caching, bulk‑load ingestion, and a cloud‑native architecture to achieve sub‑second query response for massive data volumes.

Big Data · Cloud Native · Distributed Systems
0 likes · 22 min read
DataFunSummit
Nov 12, 2024 · Big Data

Data Lake and Data Warehouse Architectures: Expert Insights from Industry Leaders

The article summarizes a roundtable discussion where experts compare four lake‑warehouse architectural patterns, explain their suitability for different business scenarios, contrast them with traditional data warehouses, and highlight practical considerations for choosing and evolving data platforms.

Big Data · Data Lake · Hudi
0 likes · 6 min read
37 Interactive Technology Team
Nov 4, 2024 · Artificial Intelligence

Developing RAG and Agent Applications with LangChain: A Case Study of an AI Assistant for Activity Components

The article outlines a step‑by‑step methodology for creating Retrieval‑Augmented Generation and custom Agent applications with LangChain, illustrated by an AI assistant for activity components that evolves from a rapid Dify prototype to a LangChain‑based RAG system and finally a hand‑crafted ReAct‑style agent, detailing LCEL chain composition, vector‑search integration, model performance trade‑offs, and a unified routing layer.

AI Assistant · Cloud-native · LLM
0 likes · 6 min read
Baidu Geek Talk
Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350 k CPU cores, 10 PB storage, and sub‑3‑second responses for 150 k daily BI queries.

Baidu MEG · Bulkload · Cloud Native
0 likes · 19 min read
360 Tech Engineering
Oct 17, 2024 · Databases

Introducing DataFusion: A High‑Performance Rust‑Based Query Engine Powered by Apache Arrow

This article explains DataFusion, a Rust‑written, Arrow‑based query engine that offers high performance, extensibility, and seamless integration with various data sources, detailing its architecture, execution model, Rust advantages, and practical usage examples for building modern data‑warehouse solutions.

Apache Arrow · DataFusion · Query Engine
0 likes · 15 min read
Baidu Tech Salon
Oct 16, 2024 · Big Data

Design and Implementation of an Online/Offline Integrated Task Scheduling System for Baidu's Mobile Operations Promotion Platform (OPS)

The paper presents Baidu’s Mobile Operations Promotion Platform redesign, introducing an online‑offline integrated task‑scheduling architecture that partitions settlement fields to the data‑warehouse, records all jobs in a unified MySQL operation table, orchestrates them via Turing Data Studio, and manages dependencies to achieve consistent, auditable, billion‑scale settlement processing.

Baidu · Offline Processing · Ops
0 likes · 14 min read
DataFunSummit
Oct 11, 2024 · Big Data

Kuaishou’s Data Lake Technical Maturity Curve: Challenges and Solutions with Apache Hudi

Kuaishou’s data‑lake initiative tackled exploding offline warehouse costs, redundant model proliferation, and data‑consistency complexities by adopting Apache Hudi’s schema‑evolution capabilities and real‑time lake ingestion, improving cross‑team collaboration and narrowing the real‑time‑offline data gap.

Apache Hudi · Big Data · Data Lake
0 likes · 6 min read
DataFunTalk
Sep 28, 2024 · Big Data

Metric Management and Standardization in Didi's Data Platform

This article outlines Didi's approach to metric management, covering background, data product overview, and challenges in traditional and agile BI models, and presents a comprehensive solution for metric standardization, logical modeling, quality assurance, unified consumption, and future roadmap to improve data warehouse efficiency and consistency.

BI · Data Governance · Data Modeling
0 likes · 21 min read
DataFunTalk
Sep 20, 2024 · Databases

Technical Paper Summaries on Graph Databases, Vector Databases, and Real-Time Data Warehousing

This article compiles concise English summaries of several technical papers covering Xiaohongshu's REDgraph graph database, DingoDB vector database, Tianqiong autonomous data platform, Douyin's real‑time data warehouse, financial‑grade data warehousing, Alibaba Cloud ClickHouse Serverless offering, best practices in financial data governance, and 58.com user‑profile data warehouse construction.

Big Data · Databases · data warehouse
0 likes · 5 min read
DataFunTalk
Sep 17, 2024 · Databases

Overview of Recent Advances in Graph, Vector, and Real-Time Data Warehouse Technologies

This article presents a collection of technical abstracts covering graph database parallel query optimization, next‑generation vector databases, real‑time data warehouse architectures, and cloud‑native analytics solutions, while also providing instructions for obtaining the full e‑book via a WeChat public account.

Big Data · Cloud Native · data warehouse
0 likes · 5 min read
DataFunSummit
Sep 16, 2024 · Databases

DataFun Summit: Technical Papers on Graph Databases, Vector Databases, Real‑Time Data Warehouses and Industry Data Practices

The DataFun Summit page presents a collection of technical papers covering graph database parallel queries, next‑generation vector databases, real‑time data warehouse architectures, and best practices in finance and e‑commerce, while also providing instructions for obtaining the e‑book via a public account.

Big Data · Databases · data warehouse
0 likes · 5 min read
DevOps
Sep 12, 2024 · Fundamentals

Advantages, Disadvantages, and Principles of Layered Architecture

This article examines the common benefits, drawbacks, and design principles of layered architecture across micro‑service, data‑warehouse, and protocol designs, illustrating each point with real‑world examples and offering practical guidance on when and how to apply layering effectively.

Layered Architecture · Microservices · data warehouse
0 likes · 11 min read
Tencent Cloud Developer
Sep 11, 2024 · Fundamentals

Advantages, Disadvantages, and Principles of Layered Architecture in Software Systems

Layered architecture offers abstract stability, functional reuse, cohesion, hidden complexity, and scalability, but can introduce extra complexity, performance overhead, and dependency risk, so designers should retain essential layers, enforce one‑way cross‑layer calls, depend only on lower layers, keep lower layers stable, and ensure each layer has a clear purpose.

DDD · Layered Architecture · Microservices
0 likes · 11 min read