Tagged articles

Data Engineering

291 articles · Page 1 of 3

Jun 28, 2026 · Operations

Why Tencent Music Rejects AI Hype: Building an OpenClaw‑Powered Intelligent Ops Ecosystem

The article details Tencent Music's step‑by‑step evolution from manual alert handling to a three‑layer cloud‑native AIOps platform, describing data pipelines, dynamic 3‑sigma alerts, full‑link observability, and the OpenClaw sandbox with multi‑agent architecture that prioritises scenario‑driven, safe AI integration.

AI · AIOps · Cloud Native

0 likes · 17 min read

DataFunSummit

Jun 14, 2026 · Artificial Intelligence

How cz-cli Empowers Data Engineers by Giving AI Real Understanding of Data Warehouses

The article analyzes how data engineers lose focus to repetitive tasks, describes the design journey from generic LLM usage to the specialized cz-cli agent, details its 37 skills and typical scenarios such as lineage analysis and incremental pipelines, and shows how the tool returns attention control to engineers while also enabling business users to self‑serve data.

AI Agents · Automation · Data Engineering

0 likes · 13 min read

Frontend AI Walk

Jun 14, 2026 · Industry Insights

Redefining the Career Track: The Forward Deployed Engineer Blueprint

The article defines the Forward Deployed Engineer (FDE) role as a bridge between software engineers and customers, outlines its core duties, compares it with Sales Engineer and Solutions Architect, presents market data, a detailed skill framework, a four‑pillar self‑assessment, and a step‑by‑step transition roadmap for aspiring engineers.

Data Engineering · Forward Deployed Engineer · Full‑stack development

0 likes · 19 min read

Machine Heart

Jun 2, 2026 · Artificial Intelligence

When AI Becomes Its Own Data Engineer: Inside DataMaster

DataMaster introduces an autonomous AI data engineer that automatically searches, cleans, combines, and reuses data, enabling fixed models and training pipelines to achieve substantial performance gains across benchmarks such as MLE‑Bench Lite and PostTrainBench, including a 31.0% GPQA score.

AI research · Autonomous Agents · Data Engineering

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

May 27, 2026 · Artificial Intelligence

DataWorks Data Agent Powers AI‑Driven Data Development: 2‑3× Faster and 80% Automation with SuperETL

The article details how DataWorks Data Agent integrates logistics industry standards and a skill‑based orchestration to overhaul the data development workflow, delivering 2‑3× efficiency gains and up to 80% AI‑automated task completion through SuperETL, hooks, and CLI tools.

AI · Automation · Data Engineering

0 likes · 10 min read

Wu Shixiong's Large Model Academy

May 13, 2026 · Artificial Intelligence

How to Explain a Jump from 71% to 94% Tool‑Calling Accuracy in a JD Interview

The article walks through a JD interview scenario where a candidate explains how a tool‑calling accuracy metric rose from 71% to 94% by detailing the full SFT data‑engineering pipeline, teacher‑model trajectory generation, quality validation, evaluation methodology, and interview‑ready talking points.

Data Engineering · Evaluation · Function Calling

0 likes · 19 min read

DataFunTalk

May 2, 2026 · Big Data

Building a One-Person Data Team: Core Skills of a Full‑Stack Data Engineer

The article examines why a single data engineer can run an end‑to‑end data team, outlines the essential abilities—semantic ownership, building an agentic data stack, and leveraging historical context—while discussing ChatBI’s limits, validation loops, and the open‑source Datus 0.3 harness for practical implementation.

Agentic AI · ChatBI · Data Engineering

0 likes · 14 min read

Woodpecker Software Testing

Apr 29, 2026 · R&D Management

Test Data Generation Teams Must Evolve: From Data Movers to Data Engineering Experts

With CI/CD pipelines maturing, automated test coverage is no longer the bottleneck; the real constraint has shifted to producing accurate, fast, and secure test data, prompting teams to upgrade from simple data mocking to full‑stack data engineering, AI‑driven synthesis, and verifiable data contracts.

AI · CI/CD · Data Engineering

0 likes · 8 min read

Big Data Tech Team

Apr 9, 2026 · Industry Insights

Why Data Engineers Are the New AI Powerhouses: 4 Core Reasons & Actionable Tips

The article analyzes why data development engineers are becoming more valuable in the AI era, outlining four core reasons—including data‑driven AI limits, the rise of RAG architectures, heightened data compliance, and a talent shortage—while offering concrete advice on mastering real‑time pipelines, unstructured data, and AI infrastructure.

AI Infrastructure · Big Data · Data Engineering

0 likes · 8 min read

Big Data Technology & Architecture

Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads by offering a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution, complete with benchmark comparisons and practical deployment guidance.

Daft · Data Engineering · Lance

0 likes · 21 min read

Big Data Tech Team

Apr 1, 2026 · Big Data

Why Your 2026 Big Data Resume Is Being Ignored and How to Fix It

In the 2026 spring hiring season, many big‑data job seekers see their resumes disappear because they still focus on offline batch processing, while employers now demand real‑time streaming, AI‑driven data pipelines, and cloud‑native deployment skills such as Flink, vector databases, and Kubernetes.

AI integration · Big Data · Cloud Native

0 likes · 7 min read

dbaplus Community

Mar 22, 2026 · Industry Insights

Will Data Engineers Vanish by 2030? A Bold Forecast for the Future of Data Stacks

The article predicts that by 2030 the traditional data‑engineer role and modern data‑stack components will collapse into a few unified, HTAP‑capable databases, semantic layers, and AI agents, reshaping pipelines, warehouses, and even edge computing while urging engineers to pivot toward semantic modeling and AI orchestration.

AI · Data Engineering · Databases

0 likes · 19 min read

Alibaba Cloud Developer

Mar 6, 2026 · Big Data

How DataWorks Turns Data Quality Rules into Code with Data Contracts

This article explains how DataWorks integrates data quality specifications directly into the SQL development workflow using Data Contracts, addressing governance lag, versioning gaps, and trust issues while providing a unified, version‑controlled, and automated quality assurance process for offline data pipelines.

Data Engineering · Data Quality · DataWorks

0 likes · 12 min read

Big Data Technology & Architecture

Mar 6, 2026 · Big Data

What’s New in Big Data Frameworks? ClickHouse, Fluss, Delta Lake, StarRocks & More (Mar 2026)

This roundup compiles the latest releases across major data platforms—including ClickHouse, Apache Fluss, Delta Lake, StarRocks, Apache Pulsar and DolphinScheduler—highlighting version numbers, key feature additions, security fixes, and emerging trends shaping the big‑data ecosystem.

Apache Fluss · Big Data · ClickHouse

0 likes · 19 min read

SuanNi

Feb 23, 2026 · Artificial Intelligence

How FireRed-Image-Edit Sets New Standards for AI-Powered Image Editing

FireRed-Image-Edit, an open‑source instruction‑driven diffusion model, combines massive high‑quality data, a dual‑stream multimodal architecture, progressive training, and a comprehensive multi‑dimensional benchmark to achieve unprecedented pixel‑level control and human‑like editing performance across diverse visual tasks.

AI · Data Engineering · Diffusion Models

0 likes · 12 min read

DataFunSummit

Feb 1, 2026 · Artificial Intelligence

How AI Agents Are Redefining Data Engineering: Expert Insights and Real‑World Practices

In a deep‑dive roundtable, three data‑engineering veterans discuss the rise of AI agents, the importance of data context, memory mechanisms, workflow versus agent trade‑offs, and the future of database intelligence, offering practical strategies and architectural philosophies for building smarter data pipelines.

Data Engineering · Database Intelligence · Immersive Analytics

0 likes · 24 min read

Fun with Large Models

Jan 12, 2026 · Artificial Intelligence

Why You Should Master Large‑Model Training: A Full‑Process Practical Guide

The article explains why mastering large‑model training is crucial for professionals, researchers, and enterprises, outlines the end‑to‑end pipeline—from data preparation and pre‑training to instruction fine‑tuning and RLHF alignment—compares training with RAG, and presents a structured learning roadmap.

AI Agents · Data Engineering · PyTorch

0 likes · 14 min read

Big Data Tech Team

Dec 29, 2025 · Big Data

Master Big Data Development: A Complete Roadmap from Beginner to Expert

This guide presents a comprehensive big‑data development roadmap, detailing industry opportunities, a six‑module technology stack, four progressive learning stages, hands‑on project ideas, interview question strategies, common pitfalls, and curated resources to help aspiring engineers become proficient and interview‑ready.

Big Data · Data Engineering · Roadmap

0 likes · 11 min read

Big Data Tech Team

Dec 26, 2025 · Interview Experience

How to Nail a 2‑Minute Data Engineer Self‑Introduction

This guide outlines a concise, 1.5‑2‑minute self‑introduction for data engineering interviews, highlighting essential personal details, technical stack, project achievements, business impact, and common pitfalls to avoid, with a concrete example and actionable tips.

Big Data · Career Advice · Data Engineering

0 likes · 5 min read

Alibaba Cloud Developer

Dec 16, 2025 · Artificial Intelligence

How We Built an AI‑Powered Data Agent to Automate Data Retrieval at Scale

This article details the design and implementation of Matra, an AI‑driven data assistant for a large e‑commerce platform, covering the challenges of legacy data assets, knowledge‑base construction, GraphRAG integration, multi‑stage agent frameworks, practical results, and future plans for continuous improvement.

AI · Data Engineering · Data Retrieval

0 likes · 22 min read

StarRocks

Dec 11, 2025 · Databases

How StarRocks Redesigns Bulk Import to Cut Small Files and Boost Throughput

This article explains how StarRocks mitigates the hidden risks of massive one‑time data imports in a storage‑compute separated architecture by redesigning the write path to spill to local disk, merge centrally, and write to object storage, resulting in fewer small files, higher write throughput, and more stable query performance.

Bulk Import · Compaction · Data Engineering

0 likes · 12 min read

Wu Shixiong's Large Model Academy

Dec 10, 2025 · Artificial Intelligence

Why RLHF Success Relies on Data Engineering, Not Just Model Tricks

The article explains that the real difficulty of RLHF lies in designing and curating high‑quality preference data, building robust reward models through bad‑case rewriting, human‑in‑the‑loop labeling, and inference‑based reward modeling, while algorithmic details like PPO are secondary concerns.

Data Engineering · GRPO · RLHF

0 likes · 9 min read

DataFunSummit

Nov 27, 2025 · Big Data

How BMW Turned Data Into Growth: A Sensors Data Case Study

This article details BMW's digital transformation journey using Sensors Data, covering the background of rapid app growth, the cross‑regional data collection challenges, the systematic solution architecture—including mapping, preprocessing, and historical data migration—and the resulting business impact and future AI‑driven roadmap.

Analytics · Big Data · Data Engineering

0 likes · 13 min read

Data STUDIO

Nov 25, 2025 · Big Data

Why Parquet Is the Faster, Lighter, Safer Alternative to CSV in Python

The article explains why CSV becomes a bottleneck for large‑scale data, demonstrates how Parquet’s columnar, typed, and compressed format dramatically reduces storage, speeds up reads, and improves data safety, and provides step‑by‑step Python code for migrating and benchmarking the switch.

CSV · Data Engineering · DuckDB

0 likes · 18 min read

PMTalk Product Manager Community

Nov 23, 2025 · Artificial Intelligence

Essential Strategies for Building Successful AI Products

This guide outlines a step‑by‑step framework for creating AI products, covering problem discovery, user‑centric motivation analysis, compliance and ethics, defining a Minimum Viable Intelligent Product, assembling multidisciplinary teams, leveraging data and model selection, designing trustworthy UX, go‑to‑market tactics, moat building, and continuous monitoring for improvement.

AI · Data Engineering · Growth

0 likes · 17 min read

Ctrip Technology

Nov 20, 2025 · Big Data

How Ctrip Achieved Minute‑Level Real‑Time Analytics with Flink CDC & Apache Paimon

Ctrip transformed its traditional T+1 offline warehouse into a near‑real‑time lakehouse by integrating Flink CDC with Apache Paimon, designing a two‑stage CDC ingestion, optimizing performance, implementing dynamic updates, and deploying the solution across multiple business scenarios, achieving minute‑level latency, reduced costs, and faster data‑driven decisions.

CDC · Data Engineering · Flink

0 likes · 27 min read

Past Memory Big Data

Nov 12, 2025 · Big Data

How Uber Upgraded Over 2 Million Spark Jobs from 2.4 to 3.3

Uber migrated more than two million daily Spark applications from version 2.4 to 3.3, detailing the motivations, architecture, four-step migration process, custom tools like Polyglot Piranha and Iron Dome, and the resulting performance, cost, and productivity gains.

Apache Spark · Data Engineering · Iron Dome

0 likes · 11 min read

JD Cloud Developers

Nov 10, 2025 · Artificial Intelligence

How an AI‑Powered Experiment Analysis Agent Transforms Data Insights

This document outlines the background, design, architecture, workflow, and large‑model integration of an AI‑driven Experiment Analysis Agent, detailing how it consolidates data, automates analysis via modular pipelines, leverages DeepSeek models, and enhances user experience through unified front‑end forms and intelligent messaging.

Data Engineering · Workflow Automation

0 likes · 15 min read

Alimama Tech

Oct 15, 2025 · Artificial Intelligence

How Alibaba’s Taobao Starry Model Delivers Precise, Consistent E‑commerce Image Edits

Alibaba’s Taobao Starry Image Editing model tackles the e‑commerce challenge of maintaining visual consistency by introducing a high‑fidelity, plug‑in architecture, a million‑scale consistency dataset, and multi‑stage multilingual training, enabling precise, controllable edits without altering product layout or background.

Data Engineering · consistency · e‑commerce AI

0 likes · 10 min read

AI2ML AI to Machine Learning

Sep 28, 2025 · Artificial Intelligence

Core Metrics for Enterprise Large‑Model Engineering

The article outlines the five essential engineering domains—application, model, compute, knowledge, and data—in the era of large models, and details concrete scale, efficiency, service, value, quality, and security metrics that enterprises should track to drive intelligent outcomes.

AI Engineering · Data Engineering · Knowledge Management

0 likes · 7 min read

Huolala Tech

Sep 19, 2025 · Big Data

How We Migrated 40PB of Offline Big Data Across Clouds with Zero Downtime

Over a year after completing a five‑month, cross‑cloud migration of Huolala’s 40 PB offline big‑data platform—spanning storage, compute, services, and infrastructure—the team details the architecture, verification methods, high‑throughput migration tools, network isolation strategies, and lessons learned to guide similar large‑scale data migrations.

Automation · Cloud Migration · Data Engineering

0 likes · 16 min read

DataFunTalk

Sep 15, 2025 · Artificial Intelligence

How AI+Data Agents Are Transforming the Automotive Industry’s Digital Leap

In an interview, Di Xingxing of Autohome details their AI+Data framework—unified lake‑warehouse, intelligent engine, and agent services—that breaks data silos, blends traditional models with LLMs, leverages causal inference and RAG knowledge bases, and uses continuous feedback to build explainable, evolving data agents for accurate sales forecasting, competitive analysis, and end‑to‑end business automation in the automotive industry.

AI · Automotive · Data Engineering

0 likes · 10 min read

Data Party THU

Sep 6, 2025 · Big Data

From Data Chaos to Predictive Insight: My Solo Journey in the 2025 Big Data Competition

A solo participant recounts the 2025 China University Computer Competition Big Data Challenge, covering data cleaning, feature engineering, and model building on historical prices for 300 stocks, along with the challenges, lessons learned, and future directions in financial AI.

Big Data · Data Engineering · competition

0 likes · 4 min read

Huolala Tech

Aug 14, 2025 · Artificial Intelligence

How LLMs Are Revolutionizing Natural Language to SQL for Intelligent Data Queries

This article explores how large language models break the natural‑language‑to‑SQL barrier, outlines the challenges of NLP‑driven data retrieval, compares Text2SQL and Text2DSL approaches, and proposes a unified data service and metric platform to power enterprise‑grade ChatBI solutions.

AI · ChatBI · Data Engineering

0 likes · 22 min read

JD Retail Technology

Aug 8, 2025 · Big Data

How JD.com Transformed Its Traffic Data Pipeline from Lambda to a Lakehouse Architecture

This article examines JD.com's migration of its massive traffic data processing from a dual Lambda architecture to an integrated lakehouse solution, detailing the challenges, innovative optimizations with Flink and Hudi, performance gains, cost reductions, and future directions for real‑time data handling.

Big Data · Data Engineering · Flink

0 likes · 10 min read

DataFunSummit

Jul 20, 2025 · Big Data

Why Incremental Computing Is Replacing Lambda Architecture in Modern Big Data Platforms

This interview with Yunqi Technology CTO Guan Tao explains how the traditional Lambda architecture’s triple‑system complexity drives costs and operational pain, and why the company’s General Incremental Computing (GIC) approach offers a unified, cost‑effective Kappa‑style solution for real‑time, batch, and interactive analytics.

Data Engineering · Kappa architecture · Lambda architecture

0 likes · 13 min read

DataFunTalk

Jul 18, 2025 · Artificial Intelligence

How Alibaba Tackles Low-Resource Language Data for Multilingual LLMs

Alibaba International’s senior data science expert explains a systematic five‑strategy solution—data acquisition, augmentation, quality optimization, engineering pipeline, and evaluation loop—to overcome data scarcity, high annotation cost, and processing challenges for low‑resource languages in multilingual large language models.

AI · Data Engineering · low-resource languages

0 likes · 13 min read

DataFunSummit

Jul 11, 2025 · Artificial Intelligence

How DataOps + Large Language Models Are Transforming Text2SQL and Data Engineering

This article examines how Hainan Shuzhao Technology leverages ChatGPT‑4 and other large language models to enhance DataOps, address traditional data management challenges, improve Text2SQL accuracy, and outline future directions for agile, AI‑driven data pipelines.

AI · Data Engineering · DataOps

0 likes · 23 min read

Big Data Technology & Architecture

Jul 4, 2025 · Big Data

Spark 4.0: New Features, Performance Gains, and Why It Still Leads Big Data

Despite the hype around Flink and AI models, Spark 4.0’s release brings a lightweight Python client, Spark Connect GA, enhanced SQL optimization, vectorized execution, and AI integration, reaffirming its leading position in the big‑data ecosystem while hinting at future challenges and innovations.

Big Data · Data Engineering · Performance Optimization

0 likes · 6 min read

Big Data Technology & Architecture

Jun 20, 2025 · Big Data

2025 Mid-Year Data Job Market Trends: Skills, Hiring Preferences, and Career Advice

This semi‑annual report analyzes over 120 interview experiences to reveal current hiring trends, required skills, and strategic advice for data professionals targeting mid‑to‑senior positions in large‑scale companies during the first half of 2025.

2025 trends · Career Advice · Data Engineering

0 likes · 9 min read

JD Retail Technology

Jun 18, 2025 · Artificial Intelligence

How JD’s Tech Teams Power 618: AI, Logistics, and Voice Innovations

The article explores how JD’s engineers across retail, logistics, and AI divisions use model distillation, data selection, intelligent routing, and advanced voice recognition to improve the 618 shopping festival experience, highlighting real‑world technical challenges, solutions, and the company’s talent development programs.

AI · Data Engineering · logistics

0 likes · 16 min read

Alibaba Cloud Infrastructure

May 26, 2025 · Big Data

Comparative Guide to Apache Airflow and Argo Workflows for Distributed Task Scheduling

This article provides a comprehensive comparison of Apache Airflow and Argo Workflows, covering their core features, architectures, use cases, code examples, and recommendations for selecting the appropriate distributed workflow engine in data engineering, big‑data, and AI pipelines.

Apache Airflow · Argo Workflows · Big Data

0 likes · 23 min read

Full-Stack Internet Architecture

May 20, 2025 · Big Data

Why Learn Kafka? Core Benefits, Use Cases, and a Summary

This article explains why Kafka is widely adopted by top companies, outlines its high throughput, scalability, and durability, and describes key real‑time data pipeline, stream processing, and big‑data integration scenarios, concluding that mastering Kafka is essential for modern backend and data engineering roles.

Data Engineering · Real-time Processing · kafka

0 likes · 4 min read

Alibaba Cloud Native

May 18, 2025 · Cloud Native

Airflow vs Argo Workflows: Which Cloud‑Native Scheduler Wins for Data Engineering?

This comprehensive guide compares Apache Airflow and Argo Workflows—two leading cloud‑native distributed task schedulers—by examining their core features, architectures, DAG handling, performance, language support, big‑data and AI integrations, and provides practical selection advice for data engineers and DevOps teams.

Airflow · Argo Workflows · Data Engineering

0 likes · 23 min read

Fighter's World

May 17, 2025 · Industry Insights

Hidden Roadblocks That Sabotage B2B Large Model Products

The article dissects why many B2B GenAI projects fail to scale despite heavy investment, highlighting overlooked challenges in data preparation, model specialization, product integration, user experience, and organizational culture, and proposes concrete ways to bridge these gaps.

B2B · Data Engineering · GenAI

0 likes · 21 min read

Big Data Tech Team

Apr 26, 2025 · Big Data

Mastering the Data Development Roadmap: From Infrastructure to AI Integration

This guide outlines a comprehensive data development roadmap, covering infrastructure setup, governance frameworks, automated pipelines, BI and analytics tools, AI/ML integration, cultural adoption, and continuous performance monitoring to enable intelligent business transformation.

AI integration · Analytics · Big Data

0 likes · 5 min read

DevOps Engineer

Apr 25, 2025 · Big Data

Reflections on PyCon LT 2025 Data Day: Sessions on Static Code Analysis, Data Warehouses, Pipelines, and Data Science Tools

The author recounts attending PyCon LT 2025 Data Day, summarizing talks on building a simple static code analyzer with AST, challenges of data warehouses versus data lakes, cloud cost‑scraping pipelines, A/B testing libraries, privacy‑enhancing data processing, and tools like Panel and Dagster, while noting the inspiring presence of female speakers.

Dagster · Data Engineering · Panel

0 likes · 7 min read

Big Data Tech Team

Apr 20, 2025 · Industry Insights

Essential Skills & Tech Stacks for Every Data Team Role

This guide breaks down the main positions in a data team— from data development and analysis engineers to product managers and operations specialists—detailing each role’s key responsibilities, essential skill sets, and the typical technology stack they rely on.

Big Data · Data Engineering · data analytics

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Apr 15, 2025 · Big Data

Boosting Game Data Engineering with Alibaba Cloud EMR Serverless Spark

Yingjiao Network transformed its game data platform by adopting Alibaba Cloud EMR Serverless Spark, addressing previous architecture pain points, enhancing data collection, offline scheduling, and online analytics, which led to higher development speed, 50% faster compute, and improved stability for global game operations.

Cloud Computing · Data Engineering · gaming analytics

0 likes · 9 min read

Kuaishou Tech

Apr 2, 2025 · Big Data

Apache Hudi Asia Summit Successfully Held

The first Apache Hudi Asia Summit in Beijing attracted over 230 attendees, featuring technical discussions on data lake optimization and case studies from companies like Fastly and Meituan.

Apache Hudi · Big Data · Data Engineering

0 likes · 12 min read

Baidu Geek Talk

Mar 24, 2025 · Big Data

How Turing Data Finder Transforms Growth Analysis with a Unified Data Platform

The article provides a detailed technical overview of the Turing Data Finder (TDF) platform, describing its background, core components, data schema, ingestion workflow, and a suite of growth‑analysis features such as event, retention, funnel, path, component, distribution, and attribution analysis, while also outlining performance‑optimisation techniques and future development directions.

Big Data · Data Engineering · Data Platform

0 likes · 17 min read

Big Data Technology & Architecture

Mar 3, 2025 · Big Data

The Turning Point for Data Development: From Traditional Data Engineering to AI Data Engineering

The article analyzes how the rapid rise of open‑source large‑model AI in 2025 is reshaping the data development profession, urging developers to transition from specialized data‑engineer roles to full‑stack AI data engineering skills such as distributed computing, lake‑house architectures, and model tuning.

AI · Big Data · Data Engineering

0 likes · 7 min read

ITPUB

Feb 11, 2025 · Operations

Why Your Monitoring Fails and How to Build Effective Observability Data

Many companies deploy fragmented monitoring and observability tools yet still struggle to pinpoint incidents; this article analyzes the root causes—under‑utilized tools and scenario‑agnostic data—and offers practical steps to organize metrics, build layered insights, and improve fault‑resolution efficiency.

Data Engineering · Monitoring · Observability

0 likes · 12 min read

Big Data Technology & Architecture

Feb 8, 2025 · Big Data

How AI Can Accelerate Data Engineering: Practical DeepSeek Use Cases and Tips

This article shows how AI tools like DeepSeek can dramatically speed up data‑engineering tasks—such as fixing long‑running SQL queries, building real‑time data pipelines with Flink, and deciphering legacy stored procedures—while offering concrete prompts, real‑world case studies, and five time‑saving techniques.

Automation · Data Engineering · DeepSeek

0 likes · 6 min read

DataFunSummit

Feb 5, 2025 · Artificial Intelligence

Exploration and Practice of Large‑Model Data Construction

This presentation details engineering‑focused approaches to building, mixing, and filtering data for large language models, covering data preparation, pre‑training mix strategies such as DoReMi, DoGE and online sampling, post‑training data quality selection methods, and practical Q&A on scaling laws and PDF processing.

AI · Data Engineering · Data Mixing

0 likes · 15 min read

21CTO

Feb 4, 2025 · Big Data

Why Python Beats Java and Scala for Modern Data Engineering

The article compares Java, Scala, SQL, and Python for data‑engineering tasks, arguing that Python’s versatility, rich ecosystem, and ease of use make it the preferred language for both small‑scale and massive Spark workloads despite its performance trade‑offs.

Big Data · Data Engineering · SQL

0 likes · 7 min read

Big Data Technology & Architecture

Jan 15, 2025 · Big Data

From Operations to Data Engineering: A Student’s Real‑World Journey and Practical Guide

This article shares a data‑engineering student’s personal experience—from a misaligned operations role to mastering big‑data technologies, building a portfolio, crafting a targeted resume, and navigating multi‑stage interviews—offering concrete advice and a structured learning roadmap for aspiring data professionals.

Big Data · Data Engineering · interview preparation

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Jan 6, 2025 · Cloud Native

How Fluid Enables Seamless Dynamic Dataset Mounting for Cloud‑Native AI Development

PAI‑DSW leverages the Fluid project to provide a cloud‑native AI development platform where data scientists can dynamically mount and unmount OSS datasets on running Kubernetes pods without restarting, improving workflow efficiency and addressing the challenges of heterogeneous data source management in AI engineering.

Cloud Native · Data Engineering · Fluid

0 likes · 18 min read

JD Tech

Dec 30, 2024 · Big Data

Techniques for Writing Elegant and Efficient SQL in Big Data Environments

The article shares practical methods and code examples for making SQL both readable and high‑performing in large‑scale data platforms, covering predicate push‑down with subqueries, deduplication strategies, bucket utilization, and Python‑driven job parameter handling.

Data Engineering · Hive · SQL

0 likes · 14 min read

dbaplus Community

Dec 24, 2024 · Big Data

How Bilibili Scaled Its Tag System for Massive Data and Real‑Time Accuracy

The article details Bilibili's comprehensive redesign of its tag system—including background challenges, architectural layers, technical upgrades like Iceberg integration and shard‑based ClickHouse writes, crowd selection methods, online service guarantees, performance metrics, and future plans—showcasing a data‑driven solution that boosts stability, speed, and business coverage.

ClickHouse · Data Engineering · Distributed Computing

0 likes · 24 min read

Python Programming Learning Circle

Dec 6, 2024 · Artificial Intelligence

24 Essential Python Libraries for an End‑to‑End Data Science Workflow

This article introduces 24 highly useful Python libraries that cover the entire data‑science lifecycle—from data collection and cleaning to visualization, modeling, interpretation, and deployment—helping readers build a comprehensive and visually appealing data‑analysis pipeline.

Data Engineering · data science · libraries

0 likes · 3 min read

Xiaohongshu Tech REDtech

Dec 5, 2024 · Big Data

Interview with Jianchen: Journey from Open Source Contributor to Data Engineer at Xiaohongshu

In this interview, Xiaohongshu data engineer Jianchen recounts his evolution from a computer‑science student discovering open‑source through MIT6.824 to contributing to SOFAJRaft and Apache RocketMQ, detailing his OSPP projects, the decision to join Xiaohongshu, and his work on a cloud‑native Kafka engine that cut storage and compute usage by half.

Apache RocketMQ · Big Data · Cloud Native

0 likes · 11 min read

DataFunSummit

Dec 5, 2024 · Big Data

Ping An Financial Services' Big Data Platform Construction and Data Governance Practices

This article details Ping An Financial Services' journey in building a comprehensive big‑data platform, addressing fragmentation, low data timeliness, processing limits, and governance challenges through a four‑stage technical evolution, modular tool development, and a systematic data‑governance framework to support its digital transformation.

Data Engineering · Data Governance · financial services

0 likes · 16 min read

ByteDance Data Platform

Nov 6, 2024 · Big Data

How Douyin’s Data Platform Overcomes EB‑Scale Metric Challenges

This article explains how Douyin Group tackles massive data volume, quality, and efficiency issues by building a four‑layer intelligent platform, standardizing metric management, automating metric decomposition, and creating reusable metric services that boost agility, stability, and cross‑team collaboration.

Big Data · Data Engineering · Data Platform

0 likes · 20 min read

Bilibili Tech

Oct 25, 2024 · Big Data

DataFunSummit2024: Next-Generation Data Architecture Technology Summit

DataFunSummit2024, co-hosted by Bilibili, convenes industry experts, scholars, and enterprise leaders across six forums to discuss next‑generation data architecture, showcasing Bilibili’s Iceberg‑based stream‑batch innovations, AI‑BI analytics, NoETL practices, and emerging alternatives to Lambda architecture.

AI+BI · Big Data · Data Architecture

0 likes · 3 min read

DataFunSummit

Oct 11, 2024 · Big Data

Kuaishou’s Data Lake Technical Maturity Curve: Challenges and Solutions with Apache Hudi

Kuaishou’s data‑lake initiative tackled exploding offline warehouse costs, redundant model proliferation, and data‑consistency complexities by adopting Apache Hudi’s schema‑evolution capabilities and real‑time lake ingestion, improving cross‑team collaboration and narrowing the real‑time‑offline data gap.

Apache Hudi · Data Engineering

0 likes · 6 min read

Baobao Algorithm Notes

Oct 7, 2024 · Artificial Intelligence

Mastering LLM Supervised Fine‑Tuning: Practical Tips, Data Strategies, and Debugging

This article provides a comprehensive, experience‑driven guide to supervised fine‑tuning (SFT) of large language models, covering special tokens, latency considerations, data diversity and production, training frameworks and hyper‑parameters, over‑/under‑fitting diagnostics, and evaluation metrics such as helpfulness, honesty, and harmlessness.

AI · Data Engineering · LLM

0 likes · 40 min read

AntData

Sep 26, 2024 · Artificial Intelligence

DB-GPT: Open-Source AI-Native Data Application Development Framework

DB‑GPT is an open‑source AI‑native data‑application framework that provides multi‑model management, Text‑to‑SQL optimization, RAG, multi‑agent collaboration, and intelligent workflow orchestration, enabling developers to build scalable large‑model database applications, with proven enterprise adoption, community growth, and academic publications.

AI · Data Engineering · RAG

0 likes · 6 min read

JD Retail Technology

Sep 25, 2024 · Big Data

From a Personal Journey to Data Platform Architecture: Insights on Big Data, Cloud Computing, and System Design

The article narrates the author’s 30‑year programming career and shares technical reflections on building business‑agnostic, configurable data platforms, covering batch, streaming, interactive computing, big‑data sharding, Spark, Flink, cloud migration, and the philosophy of software architecture.

Batch Processing · Cloud Computing · Data Engineering

0 likes · 23 min read

AntTech

Sep 10, 2024 · Big Data

From DATA for AI to AI for DATA: Evolution of Ant Group’s Intelligent Data System

The talk reviews the rapid evolution of data technologies—from early database foundations and big‑data breakthroughs to the rise of generative AI—highlighting how Ant Group’s data platform is shifting from a cost‑efficiency focus to a value‑centric, multimodal, AI‑driven ecosystem.

Big Data · Data Engineering · Data Platforms

0 likes · 17 min read

AntData

Sep 9, 2024 · Big Data

From Cost‑Efficiency to Value‑Centric: The Evolution of Data Systems in the Data+AI Era

The article reviews the rapid advances in generative AI and big‑data technologies, traces the historical development of data infrastructure, and argues that modern data systems are shifting from a cost‑efficiency focus to a value‑centric paradigm driven by multimodal, unstructured data, vector search and machine‑oriented services.

@Data · Big Data · Data Engineering

0 likes · 18 min read

Baidu Intelligent Cloud Tech Hub

Sep 5, 2024 · Databases

How Vector Databases Power AI and RAG: Insights from Baidu’s DTCC 2024

This article reviews the 70‑year evolution of databases, explains how vector databases and Retrieval‑Augmented Generation (RAG) are reshaping AI applications, and details Baidu Intelligent Cloud's VectorDB architecture, performance advantages, real‑world use cases, and future trends in data engineering.

AI · Data Engineering · Database Architecture

0 likes · 16 min read

StarRocks

Sep 5, 2024 · Big Data

Accelerate Lakehouse Queries: A Hands‑On Guide to StarRocks + Apache Iceberg

This tutorial walks you through the fundamentals of Apache Iceberg, its architecture and key features, explains why it’s advantageous for lakehouse workloads, and provides a step‑by‑step Docker‑Compose setup to integrate Iceberg with StarRocks for fast, ACID‑compliant analytics on real‑world taxi data.

Apache Iceberg · Data Engineering · Docker

0 likes · 15 min read

Mike Chen's Internet Architecture

Aug 16, 2024 · Big Data

Understanding the Lambda Architecture for Big Data Processing

This article explains the Lambda architecture—a three‑layer model combining batch and real‑time processing for large‑scale data, outlines its components, advantages, disadvantages, common tools, and compares it with the Kappa alternative while providing practical insights for data engineers.

Batch Processing · Big Data · Data Engineering

0 likes · 5 min read

StarRocks

Aug 14, 2024 · Big Data

Mastering StarRocks & Apache Paimon: A Fast‑Track Lakehouse Guide

This guide provides a comprehensive overview of Apache Paimon’s architecture, key features, and advantages, explains how to integrate it with StarRocks for real‑time lakehouse analytics, and walks through a complete quick‑start setup including component installation, Flink and Kafka deployment, data ingestion, table creation, and query execution with time‑travel support.

Apache Paimon · Data Engineering · Flink

0 likes · 18 min read

DataFunTalk

Aug 6, 2024 · Fundamentals

Solving Massive Data Retrieval Demands: From Problem Causes to OLAP Multidimensional Reporting Solutions

This article analyzes why data engineers face endless data‑extraction requests, identifies common missteps in data‑construction practices, and proposes a comprehensive solution based on dimensional modeling, OLAP multidimensional reporting, self‑service tools, and knowledge empowerment to dramatically improve efficiency and scalability.

Data Engineering · OLAP · dimensional modeling

0 likes · 12 min read

Alibaba Cloud Observability

Jul 31, 2024 · Cloud Native

How the New SLS Data Processing Boosts Performance, Cuts Cost, and Simplifies Debugging with SPL

This article explains how Alibaba Cloud's SLS data processing resolves the tension between simple log collection and the need for structured, analyzable data by introducing a unified SPL syntax, delivering over tenfold performance gains, reducing costs to one‑third, and providing powerful debugging tools for cloud‑native log analytics.

Data Engineering · Log Processing · SPL

0 likes · 8 min read

DataFunSummit

Jul 5, 2024 · Artificial Intelligence

Building and Applying a User Profile Tagging System: Practices and Insights

This article presents a comprehensive overview of constructing and deploying a user and item profiling tag system at Qunar, covering tag taxonomy, integration challenges, technical architectures, algorithmic methods such as classification, recommendation, knowledge‑graph and causal inference, as well as real‑time streaming, ID‑mapping, and practical applications in marketing, attribution and A/B testing.

AB testing · Data Engineering · Tagging System

0 likes · 21 min read

DevOps

Jun 27, 2024 · Big Data

Agile Data Engineering: Code‑as‑Infrastructure, Reuse Strategies, and ETL‑Level Continuous Integration

This article explores agile data engineering, advocating code‑as‑infrastructure practices such as code‑everything, data and code reuse, and ETL‑level continuous integration, while discussing the trade‑offs between data‑centric and code‑centric reuse, cost considerations, and practical implementation tips for modern data projects.

Big Data · Code as Infrastructure · Data Engineering

0 likes · 22 min read

Baobao Algorithm Notes

Jun 27, 2024 · Artificial Intelligence

Engineering Data for R&D Large Language Models: From Pre‑training to Prompt Design

This article presents a comprehensive guide to data engineering for research‑focused large language models, covering domain‑adaptive pre‑training, supervised fine‑tuning, retrieval‑augmented generation, dataset construction, data cleaning pipelines, tokenizer adaptation, and prompt engineering best practices to boost model performance in specialized tasks.

Data Engineering · Domain Adaptation · Fine‑Tuning

0 likes · 20 min read

DataFunSummit

Jun 15, 2024 · Artificial Intelligence

Large‑Model‑Driven Data Governance: Technical Outlook and Research Highlights

This article reviews the rising importance of data quality for large models, explores data‑centric AI, large‑model pre‑training data engineering, and presents recent Fudan University research on using large models to improve data governance across multiple domains such as attribute normalization, geographic cleaning, compliance checking, and multimodal retrieval.

AI · Data Engineering · Data Governance

0 likes · 19 min read

Data Thinking Notes

May 30, 2024 · Databases

Why Your Data Team Is Drowning in Requests—and How OLAP Can Save You

This article examines why data departments get overwhelmed by massive data‑retrieval requests, identifies root causes such as mindset, requirement handling, and lack of tools, and presents a technical solution centered on dimensional modeling and OLAP multi‑dimensional reporting to streamline data access and empower teams.

Big Data · Data Engineering · Data Warehouse

0 likes · 12 min read

StarRocks

May 14, 2024 · Artificial Intelligence

How Tencent Games Boosted AI‑Generated SQL Accuracy to 89% with a Lakehouse Architecture

Tencent Games tackled the low accuracy of AI‑generated SQL in production by combining large language models with a StarRocks lakehouse, introducing a semantic layer, async materialized views, and a multi‑agent framework, ultimately raising one‑shot SQL correctness to 89% and cutting delivery time from 2 hours to 0.33 hours.

AI · Data Engineering · LLM

0 likes · 13 min read

DataFunTalk

Apr 20, 2024 · Big Data

Tencent Video Metrics Middle Platform and Lakehouse Integration: Architecture, Governance, and Practices

This article details Tencent Video’s data business, describing the design and implementation of its metrics middle platform and lake‑warehouse integration, covering architecture, governance, consistency, timeliness, usability, cost optimization, and future plans, with insights into technology choices such as Iceberg, StarRocks, and MQL.

Big Data · Data Engineering · Data Governance

0 likes · 18 min read

DataFunTalk

Apr 14, 2024 · Big Data

Third‑Generation Metric Platform: Enabling a Light Data Warehouse with NoETL

This article explains how a third‑generation metric platform replaces traditional ETL‑heavy data‑warehouse pipelines with a semantic‑driven NoETL approach, reducing cost, improving quality and efficiency, and delivering automated, self‑service analytics for both IT and business users.

Big Data · Data Engineering · Data Warehouse

0 likes · 16 min read

Data Thinking Notes

Mar 27, 2024 · Big Data

How to Build and Optimize a Scalable User Profiling Platform from Scratch

This article explains the value of user profiling platforms, outlines their core functions, presents a layered architecture with open‑source options, and details engineering optimizations—from wide‑table design to BitMap caching and task‑mode execution—while also discussing current industry trends.

Big Data · Data Engineering · Performance Optimization

0 likes · 18 min read

DataFunTalk

Mar 26, 2024 · Big Data

Building an Enterprise Real-Time Data Warehouse with Hologres and Flink at Cao Cao Mobility

This article presents a comprehensive case study of Cao Cao Mobility's transition from a traditional Lambda architecture to an enterprise‑grade real‑time data warehouse built on Hologres and Flink, detailing business background, pain points, architectural design, performance optimizations, metadata management, and future development directions.

Big Data · Data Engineering · Flink

0 likes · 20 min read

DataFunSummit

Mar 21, 2024 · Big Data

Kuaishou Analytics Service 3.0: Architecture, Evolution, and Practice

This article presents Kuaishou's end‑to‑end analytics platform, detailing the evolution from the early tool‑based stage through Service 1.0 and 2.0 to the unified Service 3.0 architecture, its unified analysis and query engines, data acceleration techniques, performance gains, and future intelligent analytics roadmap.

Data Engineering · Kuaishou · Unified Engine

0 likes · 16 min read

Alipay Experience Technology

Mar 19, 2024 · Big Data

How Alipay Cut Merchant Bill Complexity by 60% Using a Five‑Step Method

This article details how Alipay's data engineering team applied Elon Musk's five‑step work method to completely refactor a decade‑old merchant billing system, reducing overall complexity by over 60%, improving timeliness by an hour, cutting storage and compute costs by a third, and dramatically lowering operational and maintenance burdens.

Automation · Big Data · Data Engineering

0 likes · 23 min read

DataFunSummit

Mar 15, 2024 · Product Management

How to Build a Good Data Platform: Insights from Tencent’s Senior Product Manager

This presentation shares the speaker’s experience and practical methods for creating an effective data platform, covering the transition from technical roles to product management, deep understanding of data workers' needs, Tencent Oura asset‑factory practices, a product‑management methodology, and a Q&A session that addresses governance, performance, and engineering challenges.

Data Engineering · Data Governance · product management

0 likes · 15 min read

DataFunSummit

Mar 12, 2024 · Big Data

Solving Massive Data Retrieval Demands: From Root Causes to OLAP Multidimensional Reporting Solutions

This article analyzes why data engineers face endless data‑retrieval requests, identifies common missteps in data‑construction such as demand‑driven development, lack of modeling and OLAP concepts, and proposes a dimension‑model‑based data warehouse with OLAP reporting, tooling, and knowledge‑empowerment to break the cycle.

Data Engineering · OLAP · Reporting

0 likes · 13 min read

DataFunSummit

Mar 11, 2024 · Big Data

Evolution of iQIYI's Event Tracking System and Its Data Processing Pipeline

This article outlines the importance of event tracking for data, describes iQIYI's five‑stage tracking system evolution, analyzes the challenges of the self‑service phase, presents the middle‑platform improvements, explains the migration strategy, and details the downstream data lake, real‑time stream, and data‑warehouse processing workflows.

Data Engineering · data pipeline · iQIYI

0 likes · 13 min read

Huolala Tech

Mar 7, 2024 · Big Data

Integrating Apache Tez with Remote Shuffle Service via Uniffle: HuoLala’s Experience

Facing exploding data volumes and rising cluster costs, HuoLala adopted Apache Tez’s Remote Shuffle Service built on Apache Uniffle, redesigning the Tez client to operate without source modifications, detailing architecture, implementation challenges, testing, stability measures, and future plans to enhance big‑data shuffle performance and cost efficiency.

Apache Tez · Big Data · Data Engineering

0 likes · 14 min read

DataFunTalk

Mar 3, 2024 · Big Data

Alluxio Local Cache for Presto on S3: Architecture, Implementation, and Performance Evaluation at NewsBreak

This article presents NewsBreak's practical deployment of Alluxio Local Cache with Presto on S3, detailing the system architecture, cache design considerations, implementation steps, performance metrics, and future optimization directions to reduce query latency and storage costs.

Alluxio · Big Data · Cache

0 likes · 12 min read

Airbnb Technology Team

Mar 1, 2024 · Big Data

Riverbed: A Scalable Data Framework for Real‑time and Batch Processing at Airbnb

Airbnb’s Riverbed framework unifies streaming CDC events and batch Spark jobs behind a GraphQL‑based declarative API to automatically build and maintain distributed materialized views, using Kafka‑partitioned ordering and version control to deliver billions of daily updates with low‑latency reads for features such as payments and search.

Airbnb · Apache Spark · Data Engineering

0 likes · 8 min read

DataFunTalk

Feb 25, 2024 · Big Data

Implementation Practice of Bilibili's Tag System: Evolution, Architecture, and Future Plans

This article details Bilibili's tag system from its 2021 inception through successive redesigns, describing the three‑layer architecture, data flow pipelines using Hive, Iceberg, Spark and ClickHouse, crowd selection DSL, online services with Redis, performance optimizations, and upcoming governance and quality initiatives.

Big Data · ClickHouse · Data Engineering

0 likes · 12 min read

Ctrip Technology

Feb 22, 2024 · Backend Development

Design and Implementation of a Serverless Data Filling Engine for UnifiedPB in Ctrip Hotel Recommendation System

This article describes how Ctrip's hotel recommendation team built a serverless, configuration‑driven data‑filling engine based on UnifiedPB protobuf schemas to improve development efficiency, reduce cost, ensure data quality, and achieve unified three‑region data delivery across more than twenty recommendation scenarios.

Data Engineering · Efficiency · Serverless

0 likes · 12 min read

Amap Tech

Feb 5, 2024 · Artificial Intelligence

Gaode Tech 2023 Highlights: 15 Popular Articles on AI, Data, Mapping, and Navigation Technologies

Gaode Technology’s 2023 roundup showcases fifteen of its most-read articles, spanning AI infrastructure evolution, cloud‑native data optimization, BEV‑based perception, real‑time crowdsourced mapping, ETA prediction, lane‑level navigation, AR HUD, architecture design, low‑code platforms, and high‑performance Android testing.

AI · Big Data · Data Engineering

0 likes · 9 min read

Big Data Technology & Architecture

Jan 31, 2024 · Big Data

2023 Data Development Trends and Outlook for 2024

The article reviews how data development accelerated in 2023—with mature offline computing, rapid adoption of real‑time and lake‑warehouse solutions, and a clearer technical layering—while offering practical insights and future directions for professionals entering 2024.

Big Data · Data Engineering · Industry Trends

0 likes · 8 min read

Bilibili Tech

Jan 23, 2024 · Databases

Unique Engine Design and Implementation in ClickHouse for Bilibili Live Guild Data

Bilibili migrated its live‑guild analytics from MySQL to ClickHouse, creating a custom ReplicatedUniqueMergeTree engine that uses delete‑on‑insert, min‑max and hash‑bucketed indexes with delete bitmaps to achieve 10‑20× faster queries and scalable near‑real‑time reporting despite higher write latency.

ClickHouse · Data Engineering · Unique Engine

0 likes · 18 min read
