Tagged articles

283 articles

Sep 13, 2023 · Artificial Intelligence

Data Engineering, Automated Evaluation, and Knowledge Graph Integration in Large Model Development

This article presents a comprehensive overview of data engineering practices for large model training, reviews current model scales and pre‑training data sources, discusses automated evaluation techniques, and explores how knowledge graphs can be integrated throughout the model lifecycle to improve quality and applicability.

AI · automated evaluation · data engineering

0 likes · 29 min read

Alibaba Cloud Developer

Sep 13, 2023 · Big Data

How to Quickly Land as a Data Engineer in a New Company

This guide explains how data engineers can rapidly adapt to a new workplace by mastering business context, data domains, and system architecture, using structured learning, practical case studies, and continuous reflection to earn trust and deliver value efficiently.

Onboarding · System Architecture · business knowledge

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Sep 11, 2023 · Big Data

How RoaringBitmap Cut User Profile Analysis from Minutes to Seconds

This article explains how Alibaba's user growth platform leveraged RoaringBitmap in Hologres to accelerate massive user profiling, reducing analysis time from several minutes to around ten seconds by redesigning bitmap storage, optimizing data pipelines, and employing efficient SQL and scheduling techniques.

Big Data · Bitmap · Hologres

0 likes · 17 min read

21CTO

Sep 8, 2023 · Big Data

Why Real-Time Data Processing Is the Next Frontier for Data Engineers

Real-time data processing transforms traditional batch pipelines by delivering fresh, low‑latency data to millions of concurrent users through event‑driven architectures, streaming engines, and real‑time databases. The article covers use cases ranging from fraud detection to personalized e‑commerce and operational dashboards, along with reference architectures and tool recommendations.

Big Data · Real-time Processing · Streaming

0 likes · 16 min read

DataFunTalk

Aug 16, 2023 · Artificial Intelligence

Data Engineering, Automated Evaluation, and Knowledge Graph Integration in Large Model Development

This article presents a comprehensive overview of data engineering practices, pre‑training data composition, automated model evaluation techniques, and the synergistic use of knowledge graphs within large‑scale AI model research, highlighting pipelines, quality criteria, and practical case studies.

Knowledge Graph · automation evaluation · data engineering

0 likes · 29 min read

DataFunTalk

Aug 10, 2023 · Big Data

iQIYI Magic Mirror: Evolution of a Big Data Analysis Platform

The article details how iQIYI's Magic Mirror platform evolved from a simple single‑table reporting tool to a multi‑engine, self‑service big data analysis system that improves data access speed, reduces operational costs, and supports comprehensive business analytics across the company.

Data visualization · Magic Mirror · big data platform

0 likes · 17 min read

DataFunSummit

Jul 28, 2023 · Big Data

User Path Analysis and SessionAnalytics: Business Practices, Technical Architecture, and Open‑Source Framework

This article introduces user path analysis and the SessionAnalytics open‑source framework, covering business scenarios, data processing techniques, algorithmic mining methods, technical architecture, implementation details, comparisons with event‑based analysis, and a comprehensive Q&A for practical deployment.

Big Data · NLP · data engineering

0 likes · 19 min read

Big Data Technology & Architecture

Jul 21, 2023 · Big Data

Big Data Interview Experience Summary: Topics, Weightings, and Key Takeaways

The article shares a detailed interview experience for big‑data roles, outlining the proportion of problem‑solving, project, fundamentals, and open‑question segments, and highlights the technical depth expected in areas such as Flink, Hudi, SparkSQL, and OLAP.

Flink · SparkSQL · career advice

0 likes · 5 min read

Top Architect

Jul 14, 2023 · Big Data

Lambda Architecture: Real-Time Big Data Processing and Practical Use Cases

This article introduces the Lambda Architecture for billion‑scale real‑time data analysis, explains its three layers—Batch, Speed, and Serving—covers its flexibility, fault tolerance, and scalability, and demonstrates concrete applications such as Twitter hashtag analysis and a smart‑parking recommendation system.

Batch Layer · Big Data · Lambda architecture

0 likes · 11 min read

Zhengcaiyun Tech (政采云技术)

Jun 15, 2023 · Big Data

Optimizing Data Lineage Extraction Using Spline REST API

This article discusses the practical implementation of extracting table and field lineage information via the Spline REST API, analyzing API call frequency, server load tolerance, and the strategy of re-parsing lineage only when job versions change to optimize performance.

Data Lineage · REST API · Spline

0 likes · 5 min read

DevOps Cloud Academy

Jun 1, 2023 · Big Data

DataOps 2.0: Integrated Data Development and Governance Practices at NetEase

The article recounts NetEase’s presentation at the inaugural DataOps conference, detailing the evolution from a DataOps 1.0 pipeline to a 2.0 integrated data development‑governance model, the challenges faced, practical solutions, and strategic advice for data managers.

Big Data · Data Governance · Data Management

0 likes · 11 min read

JD Cloud Developers

May 30, 2023 · Big Data

ClickHouse & Flink: Choosing Engines, Tuning Queries, and Scaling Concurrency

This article details how JDQ, Flink, and ClickHouse were integrated to replace Elasticsearch for real‑time reporting, covering table‑engine selection, Flink sink implementation, performance bottlenecks, CPU hot‑spots, query optimization techniques, and strategies for handling high concurrency while ensuring data consistency and system stability.

ClickHouse · Flink · SQL Optimization

0 likes · 46 min read

Big Data Technology & Architecture

May 29, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, describes its lake architecture based on Apache Hudi and Flink, outlines five major production challenges—including ingestion bottlenecks, snapshot queries, update bottlenecks, merge limitations, and operational reliability—and details the practical solutions and future roadmap.

Apache Hudi · Flink · data engineering

0 likes · 18 min read

Architects Research Society

May 20, 2023 · Cloud Native

Leveraging Software Architecture at Nubank: From Startup to Scale

This article chronicles Nubank’s architectural evolution—detailing how strategic technology choices, cloud‑native platforms, micro‑service design, and data‑engineering practices were leveraged across startup, growth, consolidation, and expansion phases to achieve massive scalability and business agility.

Cloud Native · Kubernetes · Microservices

0 likes · 24 min read

DataFunSummit

Apr 24, 2023 · Artificial Intelligence

OpenMLDB: A Production‑Grade Feature Platform for Consistent Online and Offline Machine Learning

OpenMLDB is an open‑source machine‑learning database that delivers a production‑grade, consistent online‑offline feature platform for real‑time AI applications such as recommendation, risk control and fraud detection, offering millisecond‑level feature computation, dual SQL engines, extensive ecosystem integration, and a roadmap of new capabilities.

AI · Feature Store · OpenMLDB

0 likes · 13 min read

Python Programming Learning Circle

Apr 23, 2023 · Big Data

Parallel Processing of Large CSV Files in Python with multiprocessing, joblib, and tqdm

This tutorial demonstrates how to accelerate processing of a 2.8‑million‑row CSV dataset by using Python's multiprocessing, joblib, and tqdm libraries, covering serial, parallel, and batch processing techniques, performance measurements, and best‑practice code examples for efficient large‑scale data handling.

Big Data · Python · data engineering

0 likes · 9 min read

DataFunTalk

Mar 25, 2023 · Artificial Intelligence

ZhongAn Financial Real‑Time Feature Platform: MLOps Practices, Architecture and Anti‑Fraud Applications

This article presents ZhongAn Financial’s end‑to‑end MLOps workflow and real‑time feature platform architecture, detailing team roles, data pipelines, Flink‑based processing, TableStore storage, anti‑fraud feature design, and answers to common implementation questions, offering a comprehensive guide for building scalable, low‑latency ML services in finance.

Flink · MLOps · Tablestore

0 likes · 25 min read

Architecture Digest

Mar 22, 2023 · Big Data

Performance Platform: Accelerating Data Production and Consumption

This article details how the Performance Platform at Baidu speeds up data production and consumption across the company's R&D pipelines by introducing five optimization paths, 18 concrete methods, service tiering, compliance measures, and self‑service analytics for both real‑time memory tables and offline disk tables.

ETL · Self-Service Analytics · data compliance

0 likes · 13 min read

Huolala Tech

Mar 16, 2023 · Big Data

How HuoLala’s YunTai BI Platform Transforms Data Visualization at Scale

The article details HuoLala’s internally built YunTai BI platform, covering its motivation, system architecture, data source integration, zero‑code modeling, visual report and dashboard creation, performance optimizations, and future plans for stability and code design, illustrating a comprehensive big‑data visualization solution.

BI · Data visualization · data engineering

0 likes · 13 min read

Baidu Geek Talk

Mar 6, 2023 · Big Data

Accelerating Data Production and Consumption in Baidu's Performance Platform

Baidu's Performance Platform speeds data production and consumption by adopting a unified stream‑batch architecture with TM and Spark, leveraging the Turing warehouse, introducing tiered service grading, enforcing robust governance and compliance measures, and offering self‑service analytics. These changes cut latency from minutes or days to milliseconds while handling billions of daily records and improving SLA adherence, data accuracy, and user satisfaction.

Big Data · Data Governance · Real-time Processing

0 likes · 12 min read

DataFunSummit

Mar 2, 2023 · Big Data

Huya's Data Self‑Service Product: Challenges, Design, and Practice

The article presents Huya's data‑self‑service product, describing the problems of traditional data services, the principles of a good data service, the MVP implementation, architectural components, project outcomes, and future evolution, while also addressing common Q&A scenarios.

Big Data · Data Product · data engineering

0 likes · 12 min read

DataFunTalk

Feb 18, 2023 · Artificial Intelligence

Building the ATLAS Automated Machine Learning Platform at Du Xiaoman: Architecture, Optimization, and Practical Insights

This article details Du Xiaoman's development of the ATLAS automated machine learning platform, covering business scenarios, AI algorithm deployment challenges, the end‑to‑end production workflow, platform components such as annotation, data, training and deployment, as well as optimization techniques like AutoML, meta‑learning, NAS, and large‑scale parallelism, concluding with lessons learned and future directions.

AI deployment · AutoML · Machine Learning Platform

0 likes · 20 min read

dbaplus Community

Feb 15, 2023 · Big Data

How Bilibili Scaled User Behavior Analytics with ClickHouse, Flink, and Iceberg

This article details Bilibili's Polaris (北极星) user behavior analysis platform, tracing its evolution from early Spark‑Jar models to Flink‑ClickHouse pipelines and Iceberg‑based full aggregation, and explains the technical solutions for event, retention, funnel, path analysis, data ingestion, cluster rebalancing, and performance optimizations that enable massive real‑time analytics on billions of daily events.

ClickHouse · Flink · Iceberg

0 likes · 32 min read

Kuaishou Big Data

Feb 3, 2023 · Big Data

Inside Kuaishou’s Company‑Wide Metric Platform: Architecture, Lessons & Best Practices

This article details Kuaishou’s three‑year evolution of its metric middle platform, covering the data infrastructure, key challenges of data inconsistency and low analysis efficiency, the enterprise‑level OneMetric solution, architectural design, development phases, practical lessons, system implementation, and real‑world applications.

Big Data · Kuaishou · data engineering

0 likes · 23 min read

Alimama Tech

Jan 11, 2023 · Big Data

Dolphin Streaming: Real-Time SQL-Based Data Development Platform for Alibaba Advertising

Dolphin Streaming provides Alibaba’s advertising merchants with a DB‑like, SQL‑driven real‑time data platform built on Flink that abstracts storage and compute, enabling non‑engineers to develop, debug, and deploy streaming feature jobs quickly, boosting query volume, QPS, and revenue.

Dolphin Streaming · Flink · Real-time Streaming

0 likes · 13 min read

DataFunSummit

Dec 29, 2022 · Big Data

Understanding Lakehouse Systems: Architecture, Practices, and Innovations by Databricks

This article explains the Lakehouse concept, why it is needed, the limitations of traditional data warehouses and data lakes, and how Databricks’ unified architecture—through open storage formats, fine‑grained governance, and optimized query engines—delivers high‑quality, low‑latency data for BI, analytics, and machine learning workloads.

Databricks · Delta Lake · Lakehouse

0 likes · 21 min read

Alibaba Cloud Developer

Dec 26, 2022 · Backend Development

How to Build a Scalable Tag/Profile System for Marketing Automation

This article shares engineering practices for constructing a tag‑profile system, covering core concepts, minimal architecture, technology selection, key modules such as estimation, selection, deployment, and validation, and offers design details and implementation tips for large‑scale marketing scenarios.

Alibaba Cloud · Backend Architecture · Marketing Automation

0 likes · 11 min read

Zhuanzhuan Tech

Dec 15, 2022 · Big Data

Zhuanzhuan User Profile Platform: Architecture, Tag Construction, Storage, and User Segmentation Practices

This article details Zhuanzhuan's user profile platform, covering its business-driven motivation, tag taxonomy, system architecture, data pipelines using Hive, ClickHouse and Spark, storage design, per‑user insight, segmentation techniques, ID‑mapping, and future plans for real‑time tagging.

Big Data · Hive · Tagging

0 likes · 17 min read

JD Tech Talk

Nov 30, 2022 · Databases

Risk Insight Platform Architecture and ClickHouse Implementation for Real-Time Risk Monitoring

The article presents a comprehensive risk insight platform built on ClickHouse, Flink, and intelligent algorithms, detailing its architecture, technical challenges, solutions, real-time data modeling, practical applications in fraud detection and user behavior analysis, and future optimization directions.

Big Data · OLAP · Real-time analytics

0 likes · 13 min read

Architects Research Society

Nov 27, 2022 · Big Data

Building a Data‑Driven Organization: Culture, Structure, and Roles

This article explains the practical steps to transform a company into a data‑driven organization by establishing a self‑service culture, aligning organizational structures, defining key roles such as analysts, engineers, scientists, and CDOs, and addressing common obstacles and best‑practice tips.

Culture · Data-driven · data engineering

0 likes · 23 min read

Alibaba Cloud Big Data AI Platform

Nov 25, 2022 · Big Data

What Drives the Next Wave of Open‑Source Big Data? Insights from the 2022 Heat Report

The 2022 Open Source Big Data Heat Report analyzes 102 active projects since 2015, revealing that heat values double every 40 months, highlighting diversification, integration, and cloud‑native trends, and offering guidance for developers, contributors, and project maintainers navigating the evolving big‑data landscape.

data engineering · technology trends

0 likes · 15 min read

Tencent Cloud Developer

Nov 7, 2022 · Big Data

Data Engineering and Data Warehouse Design: Principles, Practices, and Governance

The article outlines comprehensive data‑engineering and warehouse‑design principles—covering collection (four Ws and methods like SDK, point‑code, binlog), reporting strategies, source selection, modeling with fact, aggregation, dimension and model tables, quality checks, and governance practices such as standardized SDKs, metric libraries, automated lineage, and cost optimization—to share actionable experience for any organization.

Big Data · Data Governance · Data Warehouse

0 likes · 32 min read

DevOps Cloud Academy

Oct 22, 2022 · Fundamentals

How to Write Your First Apache Airflow DAG (Hello World)

This tutorial walks through creating a simple “Hello World” Apache Airflow DAG by setting up the Python file, importing modules, defining the DAG object, adding a PythonOperator task, writing the callable function, and running the DAG with Airflow’s webserver and scheduler.

Apache Airflow · DAG · Python

0 likes · 9 min read

Hulu Beijing

Oct 21, 2022 · Big Data

How Hulu Scales Spark on Kubernetes: Cloud‑Native Big Data at Disney‑Scale

Hulu’s data platform team describes how they migrated large‑scale Spark workloads from Yarn to native Spark on Kubernetes, leveraging AWS services such as EKS, S3, and custom operators to achieve dynamic scaling, unified monitoring, cost‑effective resource management, and improved stability for search, recommendation, and advertising pipelines.

AWS · Big Data · Cloud Native

0 likes · 18 min read

DataFunTalk

Aug 31, 2022 · Big Data

Solving Data‑Driven Full‑Link Technical Challenges: A Case Study of the Kai Shu Storytelling App

This article analyzes the technical difficulties of building a data‑driven full‑link system and describes how the Kai Shu Storytelling app overcame them by adopting DataFinder for automated event tracking, metric management, and growth analysis, offering practical guidance for enterprises and developers.

Analytics · App Development · Data-driven

0 likes · 7 min read

Selected Java Interview Questions

Aug 27, 2022 · Backend Development

Deploying a Cost‑Effective ClickHouse‑Based Backend Data Platform: Comparison with Elasticsearch and Step‑by‑Step Setup Guide

This article compares Elasticsearch and ClickHouse for log analytics, presents cost analysis, and provides detailed deployment instructions for Zookeeper, Kafka, Filebeat, and ClickHouse to build a private, high‑performance backend data platform for SaaS services.

ClickHouse · Elasticsearch · Filebeat

0 likes · 12 min read

DataFunSummit

Aug 26, 2022 · Big Data

Data Governance Practice and Logical Closed‑Loop at KuaiKan: A Case Study

This article presents KuaiKan's data governance journey, detailing the rapid business expansion challenges, the three‑step planning framework, the logical closed‑loop architecture, practical implementation experiences, cross‑team collaboration techniques, and the evaluation of governance outcomes and future plans.

Data Quality · data engineering

0 likes · 16 min read

DevOps Cloud Academy

Aug 8, 2022 · Operations

Understanding DataOps ETL: Benefits, Automation, and Implementation Guide

This article explains DataOps and its role in modern ETL pipelines, outlines the benefits of DataOps for efficiency and reliability, and provides a detailed roadmap and best‑practice guidelines for planning, implementing, and optimizing DataOps‑driven ETL in cloud‑native environments.

Automation · Data Integration · DataOps

0 likes · 13 min read

DataFunSummit

Jul 9, 2022 · Big Data

Alibaba's One‑Stop Real‑Time Data Warehouse: Hologres Architecture and CCO Implementation Experience

The article reviews the shift of big‑data computing from batch to real‑time, outlines the evolution of one‑stop real‑time data warehouses, introduces Alibaba's Hologres solution and its technical advantages, and shares the CCO department’s three‑generation architecture upgrades and practical use cases.

Alibaba · Hologres · data engineering

0 likes · 16 min read

DaTaobao Tech

Jul 8, 2022 · Frontend Development

Alibaba Front‑End Intelligent Technology: PipCook, DataCook, imgcook and Future Directions

Alibaba Front‑End Intelligent Technology combines PipCook, DataCook, and imgcook to enable data‑driven UI generation, on‑device AI inference via WASM‑Rust‑SIMD and WebGPU, and applications such as code IntelliSense and design‑to‑code, while outlining a roadmap toward unified AI‑powered interfaces for commerce.

AI · TensorFlow.js · Wasm

0 likes · 33 min read

AntTech

Jun 29, 2022 · Big Data

YoDA: Reducing Entropy in Ant Financial Risk Data Systems through White‑Box, Logical, and Integrated Approaches

The YoDA project tackles the growing entropy of Ant Financial's risk data platform by introducing white‑box visibility, logical abstraction, and integrated heterogeneous fusion, enabling systematic governance, cost reduction, and consistent decision‑making across online, offline, and near‑line environments.

AI · Entropy Reduction · System Architecture

0 likes · 21 min read

High Availability Architecture

Jun 29, 2022 · Big Data

Interview with Shopee Data Engineer Deng Lin on Lakehouse Architecture and Big Data Trends

During a pre‑GIAC interview, Shopee data engineer Deng Lin discusses the evolution of data lakes and warehouses, lakehouse integration, big‑data technology choices, real‑time processing with Flink and Kafka, and offers career advice for newcomers to the big‑data field.

Big Data · Flink · Kafka

0 likes · 10 min read

Bilibili Tech

May 31, 2022 · Big Data

Bilibili Offline Computing Platform: Migration from Hive to Spark and Operational Practices

Bilibili migrated its massive offline platform from Hive to Spark using an automated SQL rewrite and dual‑run verification, cutting execution time by over 40% and resource use by 30%, while introducing small‑file merging, shuffle stability, runtime filters, data‑skipping, lineage tracking, auto‑parameter tuning, and metastore federation for robust large‑scale processing.

Big Data · Hive · Spark

0 likes · 30 min read

DataFunSummit

May 29, 2022 · Big Data

OPPO Commercial Data System Construction Practice: Platform, Ingestion, Development, Governance, and Analytics

This article presents OPPO's commercial data system construction practice, covering the data platform strategy, ingestion pipelines, development efficiency toolkits, data validation, visualization aids, UDF principles, warehouse architecture, metric systems, dimensional modeling, ETL optimization, governance metadata, quality management, monitoring, attribution services, analytics reporting, and a Q&A session.

Analytics · Data Platform · data engineering

0 likes · 17 min read

dbaplus Community

May 21, 2022 · Big Data

5 Trends for 2022: Analytics Engineers, Lakehouse Wars, Real‑Time Pipelines, Cloud Market

The article outlines five major 2022 data trends: the rise of analytics engineers, the intensifying lakehouse competition, the growth of real‑time streaming pipelines and operational analytics, the expanding cloud marketplaces for data tools, and the push toward unified data‑quality terminology. It explains their origins, market impact, and future outlook.

Data Quality · Lakehouse · Real-time Streaming

0 likes · 21 min read

Alibaba Cloud Developer

May 18, 2022 · Big Data

Why Delta Lake Is Revolutionizing Data Lakes with ACID Guarantees

This article explains how Delta Lake adds reliability to data lakes by offering ACID transactions, scalable metadata, and unified batch‑and‑stream processing, outlines the challenges it solves, details its implementation principles, and demonstrates a practical demo for building an integrated data warehouse.

ACID · Big Data · Data Lake

0 likes · 9 min read

Big Data Technology Architecture

Apr 29, 2022 · Big Data

Halodoc’s Data Platform Evolution: From Redshift to a LakeHouse Architecture with Apache Hudi

This article describes how Halodoc’s data engineering team identified limitations of their Redshift‑based platform, evaluated a LakeHouse design, selected Apache Hudi for mutable data handling, and outlined the challenges and benefits of building a scalable, decoupled storage‑compute architecture for their growing healthcare services.

Apache Hudi · Data Platform · data engineering

0 likes · 9 min read

DataFunTalk

Apr 19, 2022 · Artificial Intelligence

Intelligent Risk Control Platform: Design Principles, Strategy and Model Lifecycle Management, and Architecture

This article presents a comprehensive overview of an intelligent risk control platform, covering its design background, six core characteristics, the "five‑full double‑core" concept, end‑to‑end strategy and model lifecycle management, business architecture atomization, and real‑world anti‑fraud case studies.

AI · Model Management · data engineering

0 likes · 13 min read

Big Data Technology & Architecture

Apr 6, 2022 · Big Data

Data Quality Issues, Causes, and Practices in Big Data Platforms

This article explains the harms and root causes of data quality problems—such as integrity, latency, accuracy, and consistency issues—then outlines systematic prevention methods, baseline monitoring, and concrete NetEase YouShu platform practices, illustrated with real incidents, code snippets, and tag‑monitoring strategies.

data engineering · incident management

0 likes · 10 min read

58 Tech

Mar 29, 2022 · Big Data

Design and Implementation of the 58 Group Penalty Data Center

This article presents the design, architecture, and implementation of a unified penalty data center for 58 Group, detailing the challenges of heterogeneous data sources, the selection of Flink for real‑time ETL, the use of a DSL and LRU aggregation, and the adoption of MVEL for feature recognition to achieve standardized, high‑performance penalty data processing.

Big Data · ETL · Flink

0 likes · 13 min read

Architects Research Society

Mar 11, 2022 · Artificial Intelligence

Key Software Industry Trends in 2021 and What to Watch in 2022

The 2021 software industry review highlights the rise of hybrid work, the continued dominance of microservices, emerging data engineering and AI/ML practices, ethical and sustainability concerns, multi‑cloud and cloud‑native adoption, and anticipates further developments in these areas throughout 2022.

AI · Ethics · cloud computing

0 likes · 14 min read

21CTO

Feb 24, 2022 · Big Data

5 Data Trends for 2022: Analytics Engineers, Lakehouse Wars, Real‑Time

In 2022 the modern data stack will be driven by the rise of analytics engineers, intensified competition between lakehouse and warehouse solutions, growing demand for real‑time analytics, the explosive growth of cloud marketplaces, and the emergence of unified data‑quality terminology, all reshaping data infrastructure and operational practices.

Data Quality · Lakehouse · Real-time analytics

0 likes · 17 min read

MaGe Linux Operations

Jan 27, 2022 · Big Data

2021 InfoWorld BOSSIE Awards: 29 Must‑Know Open‑Source Projects Across AI, Data & Cloud

InfoWorld's 2021 BOSSIE Awards highlight 29 standout open‑source projects, from front‑end frameworks like Svelte to cloud‑native tools such as Minikube, AI platforms like Hugging Face, and data‑engineering projects including Presto and Apache Arrow, offering developers a curated snapshot of the most influential software of the year.

AI · data engineering · open source

0 likes · 19 min read

NetEase LeiHuo UX Big Data Technology

Jan 20, 2022 · Big Data

NetEase Thunderfire UX Big Data Technology: An Overview of Game Data Practices and Insights

The article introduces NetEase Thunderfire UX's big data team and its multidisciplinary approach to game user experience, covering topics such as mathematics, engineering, product practice, AI, visualization, platform development, data analysis, and tool creation for game data professionals.

AI · data engineering · visualization

0 likes · 11 min read

Meituan Technology Team

Dec 30, 2021 · Frontend Development

Meituan Tech Team’s 2021 Top Technical Articles – New Year Gift 2022

To celebrate the 2022 New Year, Meituan’s technology team offers a curated gift of the 22 most‑read and most‑watched 2021 technical articles—spanning logging, knowledge graphs, GraphQL, data warehousing, performance, security, and more—while inviting readers to complete a survey for a chance to win a premium keyboard wrist rest.

Backend · Meituan · Software Engineering

0 likes · 14 min read

Big Data Technology & Architecture

Dec 20, 2021 · Big Data

Guide to Alibaba Cloud Community Big Data Resources and Learning Path

This article introduces the Alibaba Cloud Community's big‑data section, outlines its extensive learning resources—including e‑books, Q&A, learning paths, open courses, and activities—explains why the industry has shifted toward cloud‑based platforms, and provides links for deeper exploration, all aimed at helping newcomers advance in big data engineering.

Alibaba Cloud · Learning Resources · cloud community

0 likes · 9 min read

JD Cloud Developers

Dec 15, 2021 · Big Data

How JD Retail Scales Billion‑Item Selection with ClickHouse & Elasticsearch

This article details JD Retail's strategic "Nirvana" product‑selection platform, describing the technical challenges of handling billions of items and hundreds of tags, and presenting a dual‑engine solution using ClickHouse and Elasticsearch with Spark‑driven data pipelines to achieve fast filtering, multidimensional analytics, and efficient storage.

Big Data · ClickHouse · Elasticsearch

0 likes · 15 min read

DataFunSummit

Dec 13, 2021 · Big Data

Tencent Game Big Data Analysis Engine: Architecture, Practices, and Future Directions

This article presents the design, implementation, and operational experience of Tencent's game big‑data analysis platform, covering its background, the offline, online, and real‑time multi‑dimensional analysis engines, practical use cases, performance optimizations, and future roadmap.

Game Analytics · Real-time Processing · Tencent

0 likes · 14 min read

IT Architects Alliance

Dec 8, 2021 · Industry Insights

6 Proven Strategies to Modernize Your Cloud Data Warehouse

This article outlines six practical strategies—identifying bottlenecks, empowering data engineers, adopting distributed management, creating data contracts, embracing diverse perspectives, and streamlining workflows—to help organizations leverage cloud data warehouses more efficiently and drive better business intelligence outcomes.

Business Intelligence · Data Governance · Data Warehouse

0 likes · 8 min read

Big Data Technology & Architecture

Dec 5, 2021 · Big Data

2022 and Beyond Data Development Trends, Job Market Insights, and Interview Guidance

The article analyzes post‑2022 data development trends, explains why high‑end positions are scarce while entry‑level roles are highly competitive, and provides detailed campus and social recruitment interview advice, including required skills, project experience, and strategies for standing out in a rapidly maturing big‑data industry.

Interview Preparation · career advice · data engineering

0 likes · 9 min read

Big Data Technology & Architecture

Nov 30, 2021 · Big Data

User Portrait Development Process and Key Deliverables

This article outlines a comprehensive seven‑stage workflow for building enterprise user portraits—from goal interpretation and requirement analysis through tag development, scheduling, service‑layer integration, productization, optimization, and finally deployment and performance tracking—highlighting critical outputs and common challenges at each step.

ETL · data engineering · tag development

0 likes · 8 min read

Big Data Technology Architecture

Nov 28, 2021 · Big Data

EMR Studio: Architecture and Features for Simplifying Big Data Development

EMR Studio is a one‑stop, open‑source‑compatible big data development platform that integrates Zeppelin, Jupyter, Airflow and a custom Cluster Manager to streamline job creation, scheduling, monitoring, and cluster switching, thereby addressing common usability challenges in Spark, Flink, Hive, and Presto workflows.

Airflow · Apache Spark · EMR Studio

0 likes · 9 min read

Big Data Technology & Architecture

Nov 24, 2021 · Big Data

Big Data Industry Trends and Career Advice for Data Developers

The article analyzes recent Q3 financial reports of major internet companies, discusses the uneven development of data engineering talent, examines the challenges of data platforms and middle‑office services, and offers practical advice for developers to broaden technical depth, improve soft skills, and increase resilience in a tightening market.

Advertising Revenue · career advice · data engineering

0 likes · 11 min read

DataFunTalk

Nov 24, 2021 · Big Data

Tencent Game Big Data Analysis Engine: Architecture, Practices, and Future Plans

This article presents Tencent's game big‑data analysis platform, detailing its background, the architecture of the iData engine—including offline multi‑dimensional analysis (TGMars), online portrait analysis (TGFace), and real‑time multi‑dimensional analysis (TGDruid)—application scenarios, performance insights, and future ecosystem and open‑source plans.

Big Data · Game Analytics · OLAP

0 likes · 15 min read

dbaplus Community

Nov 21, 2021 · Big Data

How Small Companies Can Break Into Big Data Projects and Master High‑Concurrency Architecture

This article explores why small and medium enterprises struggle with big‑data adoption, proposes partnership‑based strategies to gain access to large datasets, and offers concrete technical roadmaps—including distributed storage, streaming pipelines, and query stacks—to help engineers practice high‑concurrency big‑data systems.

SME Strategy · data engineering · high concurrency

0 likes · 9 min read

DataFunTalk

Nov 20, 2021 · Big Data

How to Build a Big Data Platform from Zero to One: Architecture, Components, and Best Practices

This article provides a comprehensive guide to designing and implementing a big‑data platform, covering architecture overview, data ingestion with Flume, storage on HDFS/Hive/HBase, processing engines such as Hive, Spark and Flink, scheduling solutions like Azkaban and Airflow, and the construction of self‑service analytics systems.

Big Data · ETL · Hadoop

0 likes · 29 min read

21CTO

Nov 1, 2021 · Big Data

Essential Data Engineering Roadmap: Skills, Tools, and Technologies to Master

This guide outlines the fast‑growing data engineering career path, covering essential Linux fundamentals, programming languages, testing, database concepts, data warehouses, processing frameworks, messaging systems, cluster computing, workflow scheduling, monitoring, infrastructure as code, and CI/CD tools.

Big Data · data engineering · data pipelines

0 likes · 5 min read

DataFunTalk

Oct 18, 2021 · Big Data

Building an Intelligent Data Warehouse at Yixin Group: A Big Data Platform Case Study

The article describes how Yixin Group’s product team created an in‑house intelligent data warehouse using Hadoop, Flink/Spark, and standardized data services to transform scattered automotive‑finance data into a secure, scalable platform that supports real‑time analytics and drives business growth.

Big Data · Flink · Hadoop

0 likes · 10 min read

Big Data Technology & Architecture

Oct 14, 2021 · Big Data

Overview of Big Data Architecture Trends and Curated Resources

This article, discovered on the Yunqi community site, surveys current big‑data architecture hotspots, development trajectories, emerging trends, and unresolved challenges from a system‑architecture perspective, while highlighting the field’s rapid evolution and recommending a curated list of in‑depth resources for further study.

Data Architecture · Resources · data engineering

0 likes · 5 min read

Big Data Technology & Architecture

Oct 13, 2021 · Big Data

God of Big Data: A Comprehensive Learning Path and Systematic Resources for Big Data Engineers

The "God of Big Data" project, launched in 2019, offers a detailed learning roadmap, systematic column resources covering Hadoop, Spark, Kafka, and more, and invites engineers transitioning from backend to big‑data development to follow curated articles, GitHub code, and CSDN tutorials.

Hadoop · Learning Path · Spark

0 likes · 6 min read

Airbnb Technology Team

Sep 27, 2021 · Big Data

Midas Certification: Airbnb’s End-to-End Data Quality Framework

Airbnb’s Midas certification establishes a company‑wide, multi‑dimensional golden‑standard for data quality—covering accuracy, consistency, timeliness, cost, and completeness—by requiring collaborative design, automated health checks, and four review stages, ensuring certified data is reliable, well‑documented, and ready for reporting, experimentation, and machine‑learning.

Airbnb · Big Data · Data Quality

0 likes · 12 min read

DataFunTalk

Sep 10, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing enhancements for coordinator high‑availability, and explaining a cross‑cluster scheduling strategy that leverages idle Presto resources to improve overall big‑data workload efficiency.

Big Data · Cross-Cluster Scheduling · HA

0 likes · 16 min read

ByteDance ADFE Team

Aug 31, 2021 · Big Data

Evolution of the Big Data Technology Stack Over the Past Five Years

This article reviews the evolution of big data technologies in the last five years, covering streaming and batch processing frameworks, column‑store NoSQL databases, programming language trends, the cloud‑native multi‑model database Lindorm, and practical Flink/Blink usage with code examples.

Big Data · Flink · Lindorm

0 likes · 24 min read

DataFunTalk

Aug 28, 2021 · Big Data

Mid‑Year 2021 DSU Reading Selections – Technical Articles, Reflections, and Job Listings

The DSU mid‑year reading collection compiles high‑quality technical articles, reflective essays, and internal job referrals across data architecture, big‑data ecosystems, machine learning, data governance, and career development, providing a searchable resource for data professionals.

Big Data · career · data engineering

0 likes · 7 min read

Volcano Engine Developer Services

Aug 3, 2021 · Big Data

Inside ByteDance’s Traffic Platform: Powering Trillions of Real‑Time Events

This article, compiled from a Volcano Engine meetup, explains how ByteDance’s unified traffic platform designs, governs, and processes massive event‑tracking data in real time, covering embedding content solutions, link architecture, dynamic processing engines, and data‑governance practices that support trillions of daily events.

Big Data · Data Governance · Real-time Processing

0 likes · 16 min read

Airbnb Technology Team

Jul 29, 2021 · Big Data

Airbnb’s Data Quality Improvement Plan: Organizational, Architectural, and Governance Practices

Airbnb’s 2019 Data Quality Improvement Plan reorganized its data‑engineering workforce, introduced a dedicated data‑engineer role, adopted a decentralized Minerva‑based architecture with Spark pipelines, instituted rigorous testing, governance, and certification processes, and established SLAs and monitoring to ensure timely, trustworthy, well‑documented data across the enterprise.

Airbnb · Big Data · Data Architecture

0 likes · 13 min read

DataFunTalk

Jul 2, 2021 · Big Data

Exploring JD Logistics’ Billion‑Scale Data Management and Analytics with Apache Doris

This article details JD Logistics’ challenges in handling petabyte‑level data, outlines their existing data architecture, and explains how they adopted Apache Doris for faster, scalable analytics, covering table management, data import workflows, visualization tools, and future roadmap for data engineering.

Apache Doris · Big Data · Data Governance

0 likes · 14 min read

TAL Education Technology

Jul 1, 2021 · Big Data

Optimization of A/B Test Metric Computation Using Spark and ClickHouse

This article details the design and multi‑stage optimization of an A/B testing metric system, describing its product architecture, Spark‑based computation engine, ClickHouse OLAP layer, cumulative calculation improvements, and batch processing techniques that reduced processing time from hours to a few minutes for hundreds of experiments and metrics.

A/B testing · Big Data · ClickHouse

0 likes · 8 min read

Python Crawling & Data Mining

Jun 27, 2021 · Artificial Intelligence

How SQLFlow Turns Simple SQL Queries into Powerful AI Models

SQLFlow is an open‑source platform that lets users build and run machine‑learning and deep‑learning models directly from SQL statements, lowering the barrier for business analysts to apply AI by abstracting complex pipelines into familiar database queries.

Deep Learning · SQL · SQLFlow

0 likes · 8 min read

Architects Research Society

Jun 26, 2021 · Big Data

Comprehensive Overview of Over 50 Big Data Terms and Technologies

This article presents an extensive glossary of more than fifty big‑data concepts—including Apache projects, data‑analysis methods, storage formats, AI‑related terms, and emerging metrics—providing concise English explanations for each term.

Apache Hadoop · Big Data · Data Analytics

0 likes · 17 min read

Zhongtong Tech

May 31, 2021 · Big Data

How Zhongtong Express Built a Robust Big Data Quality Assurance System

At the 2021 QECon conference in Shenzhen, Zhongtong Express senior architect Wu Da detailed the design and evolution of their big data quality assurance framework, covering six key layers and highlighting future trends in predictive analytics and deep business integration.

data engineering · quality assurance

0 likes · 4 min read

ITFLY8 Architecture Home

May 26, 2021 · Databases

How to Store Billions of IDs in Redis Without Running Out of Memory

This article examines the challenges of storing massive DMP ID mappings in Redis—including memory fragmentation, expansion, and latency constraints—and presents eviction, bucket‑hashing, and fragmentation‑reduction techniques to achieve efficient, real‑time, large‑scale key‑value storage.

Key-value hashing · Memory Optimization · data engineering

0 likes · 11 min read

DeWu Technology

May 22, 2021 · Big Data

Unified Semantic Layer for Data Development: Addressing Pain Points and Optimizing Queries

A unified semantic layer for data development solves metric‑change ripple effects, developer burden, and large‑scale query performance problems by offering consistent metric definitions, multi‑view access, concise auto‑generated SQL, instant propagation of updates, and engine‑driven optimal query selection, thereby bridging business and engineering and cutting maintenance effort.

Big Data · OLAP · data engineering

0 likes · 5 min read

Tencent Cloud Developer

May 18, 2021 · Big Data

Latest ClickHouse Technologies and Practical Applications

ClickHouse, born from Yandex’s Metrica and now a top‑50 open‑source analytics engine, achieves exceptional speed through a vectorized compute engine, column‑store architecture, and an active community, powering real‑time workloads at companies like Tencent Music, Sina, Bilibili, and Suning while introducing features such as column merging, projections, and storage‑compute separation for future scalability.

ClickHouse · Columnar Database · OLAP

0 likes · 17 min read

Big Data Technology Architecture

May 13, 2021 · Big Data

Real-Time OLAP Evolution and Production Optimization at BTC.com

This article details BTC.com’s journey from a legacy batch‑oriented analytics stack to a modern real‑time OLAP architecture using Flink, ClickHouse, Kafka, and Kubernetes, highlighting the business drivers, technical choices, architectural evolution, optimizations, and future directions.

Blockchain · ClickHouse · Flink

0 likes · 9 min read

DataFunTalk

May 11, 2021 · Big Data

Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake

This article details Baixin Bank's construction of a Flink‑driven real‑time computing platform integrated with Hudi as a real‑time data lake, covering background, architecture, data collection, transformation, storage layers, technical challenges, future roadmap, and practical lessons for similar big‑data initiatives.

Big Data · Flink · Hudi

0 likes · 12 min read

Tencent Cloud Developer

Apr 29, 2021 · Industry Insights

Future of Databases & Big Data: Insights from the First Techo TVP Summit

The inaugural Techo TVP Developer Summit in Shenzhen gathered over 500 developers to explore the latest trends in databases, distributed systems, big data, and cloud‑native technologies, offering expert analyses, real‑world case studies, and career guidance for data professionals.

Big Data · Cloud Native · Distributed Systems

0 likes · 19 min read

Meituan Technology Team

Apr 15, 2021 · Big Data

Data Governance Practices at Meituan Hotel & Travel Platform

Meituan’s hotel‑travel platform tackled exploding data‑quality, cost, efficiency, and security issues by establishing a full‑link governance framework—standardized processes, a Data Management Committee, and unified “One Model, One Logic, One Service, One Portal” systems—that cut per‑unit costs by ~40%, boosted engineer productivity over 60%, eliminated major security incidents, and set the stage for autonomous, AI‑driven data governance.

Big Data · Data Governance · Data Quality

0 likes · 32 min read

iQIYI Technical Product Team

Apr 9, 2021 · Big Data

Real-Time Data Warehouse at iQIYI Video Production Using Spark and ClickHouse

To meet iQIYI video production’s requirements for thousands of QPS, petabyte‑scale and frequently updated data, and large‑table joins, the team built a Spark‑plus‑ClickHouse real‑time warehouse that streams Kafka changes, joins HBase dimensions, and writes to ClickHouse, reducing reporting development time from days to hours while supporting both offline and real‑time analytics.

ClickHouse · HBase · Kafka

0 likes · 12 min read

DataFunTalk

Mar 27, 2021 · Big Data

Kuaishou's HDFS Architecture, Scale, Challenges, and Practices

This article presents an in‑depth technical overview of Kuaishou's massive HDFS deployment, detailing its architecture, petabyte‑scale data and thousands‑of‑node clusters, the key scalability challenges faced, and the custom solutions—including FixedOrder, RBF balancer, observer read, slow‑node mitigation, and tiered protection—implemented to keep the system performant and reliable.

Big Data · HDFS · Kuaishou

0 likes · 12 min read

Top Architect

Feb 27, 2021 · Big Data

Data Platform vs Backend Architecture: Benefits of Moving Functionality to a Data Platform

The article explains why shifting batch jobs, reporting, and machine‑learning model training from traditional backend services to a dedicated data platform can simplify development, improve fault tolerance, and scale analytics, using real‑world examples from Spotify and best‑practice guidelines.

Backend Architecture · Batch Processing · Cron Jobs

0 likes · 11 min read

21CTO

Feb 22, 2021 · Artificial Intelligence

How to Strengthen an Algorithm Engineer’s Real‑World Impact: Tech, Business, and Soft Skills

The article outlines a three‑dimensional framework—technical, business, and soft‑skill competencies—that algorithm engineers need to master in order to successfully deliver machine‑learning solutions in production environments, offering practical advice on data handling, model evaluation, stakeholder communication, and personal development.

business analysis · data engineering · machine learning

0 likes · 15 min read

DevOps

Feb 9, 2021 · Operations

Choosing Between DataOps, MLOps, and AIOps: A Guide for Data Teams

The article examines how data teams can select the appropriate Ops framework—DataOps, MLOps, or AIOps—by comparing their origins, principles, responsibilities, and tooling, and stresses that cultural principles outweigh technology choices for efficient delivery of data and machine‑learning products.

DataOps · DevOps · MLOps

0 likes · 12 min read

DataFunTalk

Feb 5, 2021 · Big Data

Design and Implementation of Beike's Data Management Platform (DMP)

This article details how Beike built a comprehensive Data Management Platform (DMP) that integrates user behavior and business data across multiple apps, outlines its five‑layer architecture, discusses data collection, processing, storage, real‑time profiling, and presents performance results and future optimization directions.

Big Data · DMP · Hive

0 likes · 20 min read

TAL Education Technology

Jan 28, 2021 · Big Data

Batch-Stream Fusion in Education: TAL’s Real-Time Data Platform Practices

This article, presented by senior data platform engineer Mao Xiangyi of TAL Education, details the design and implementation of the company’s real‑time T‑Streaming platform, covering its three‑layer data architecture, batch‑stream integration techniques, the move to a real‑time ODS layer, Flink SQL development workflow, hybrid‑cloud deployment, and a case study of K‑12 renewal reporting.

Batch-Stream Integration · Education Analytics · Flink

0 likes · 18 min read

Xueersi Online School Tech Team

Jan 15, 2021 · Artificial Intelligence

Recommendation System Architecture and Engineering Overview

This article presents a comprehensive overview of a recommendation system, covering its business background, purpose, detailed engineering architecture—including data sources, computation, storage, online learning, service and access layers—and discusses key challenges, module design, and practical reflections.

AB testing · TensorFlow · data engineering

0 likes · 14 min read

Big Data Technology & Architecture

Jan 13, 2021 · Big Data

My Month-Long Alibaba Mama Interview Experience: Spark, Kafka, and Big Data Technical Rounds

The author recounts a month‑long, four‑round technical interview at Alibaba Mama, detailing phone, on‑site, and HR stages, with deep discussions on Spark, Kafka, Hadoop, platform design, and backend fundamentals, while sharing resource links for big‑data interview preparation.

Alibaba · Hadoop · Spark

0 likes · 7 min read

Big Data Technology & Architecture

Jan 3, 2021 · Big Data

A Comprehensive Introduction to Apache Airflow: Architecture, Installation, and Usage

This article provides an in‑depth overview of Apache Airflow, covering its core concepts, advantages, architecture components, installation steps, example ETL DAG code, common command‑line tools, and practical tips for leveraging Airflow in data engineering workflows.

Airflow · ETL · Python

0 likes · 13 min read

DataFunTalk

Nov 28, 2020 · Artificial Intelligence

Building Fast-Iterating Machine Learning Systems at Tubi: A/B Testing, Simple Models, and Embedding Strategies

This article shares Tubi's practical experience in rapidly iterating machine‑learning systems, emphasizing the early importance of simple end‑to‑end A/B testing platforms, clear launch plans, popularity‑based and embedding‑based ranking models, and a culture of fast experimentation over complex deep‑learning research.

A/B testing · Embedding · artificial intelligence

0 likes · 8 min read

Alibaba Cloud Developer

Nov 22, 2020 · Big Data

How Flink’s Stream‑Batch Integration Powered Alibaba’s Record‑Breaking Double‑11

Alibaba’s 2020 Double‑11 achieved unprecedented real‑time processing of 4 billion records per second and 7 TB of data per second using Flink, showcasing the stability, performance and efficiency of its stream‑batch unified architecture across diverse business scenarios.

Alibaba · Batch Processing · Big Data

0 likes · 15 min read
