Tagged articles
283 articles
Beike Product & Technology
Nov 13, 2020 · Big Data

Beike One‑Stop Big Data Development Platform: Architecture, Evolution, and Future Outlook

The article summarizes Beike's one‑stop big data development platform, describing its data business background, the evolution from a simple Hadoop‑Kafka‑Hive stack to a metadata‑driven, asset‑oriented platform, and outlines current capabilities in data management, integration, scheduling, quality, openness, and future plans.

Big Data · Data Governance · Data Platform
11 min read
dbaplus Community
Oct 13, 2020 · Big Data

How to Build a Real‑Time Data Warehouse with Flink: Principles, Architecture, and Best Practices

This article explains why real‑time data warehouses are needed, outlines their core principles, compares them with offline warehouses, describes typical use cases such as real‑time OLAP, dashboards, feature generation and monitoring, and provides a step‑by‑step guide to designing, implementing, and operating a Flink‑based streaming warehouse with Kafka, HBase, and metadata management.

Flink · Kafka · OLAP
29 min read
DataFunTalk
Oct 7, 2020 · Big Data

Yanxuan Data Warehouse: Architecture, Standards, and Evaluation Framework

This article outlines the Yanxuan data warehouse’s layered architecture, the offline and real‑time development platforms, the comprehensive standards for metric definition, model design, and SQL development, and proposes a six‑dimensional evaluation system covering data norms, security, quality, stability, continuous improvement, and development efficiency.

Big Data · Data Governance · SQL Standards
12 min read
Youku Technology
Sep 18, 2020 · Big Data

Digitalization of Youku Long‑Video Content Supply Chain: Practices and Architecture

Youku’s digital content‑supply‑chain system transforms long‑video production by introducing a three‑stage framework—structured evaluation of talent and scripts, information‑driven production management, and a unified demand‑aligned content strategy—that curtails delays, mitigates risk, and saves over 100 million RMB while scaling to billions of data records daily.

Big Data · Content Supply Chain · Digital Transformation
11 min read
Beike Product & Technology
Aug 17, 2020 · Big Data

Bitmap-Based User Segmentation in a DMP Platform Using ClickHouse

This article describes how a data management platform (DMP) at Beike leverages ClickHouse bitmap structures and Spark pipelines to generate global numeric user IDs, design tag-specific bitmap rules for enum, continuous, and date attributes, handle boundary cases, and produce high‑performance bitmap SQL for real‑time user group estimation and complex segment logic.

Big Data · Bitmap · ClickHouse
17 min read
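The bitmap approach described above turns segment logic into bitwise set operations over global numeric user IDs. Below is a minimal sketch that uses plain Python ints as bitmaps. It is not the platform's code: the `(user_id, tag, value)` row layout and the tag names `city` and `intent` are hypothetical. In ClickHouse the equivalent work would run on its compressed bitmap type, using functions such as `bitmapAnd`, `bitmapAndnot` and `bitmapCardinality`.

```python
from __future__ import annotations

from typing import Dict, Iterable, List, Tuple


def build_bitmaps(rows: Iterable[Tuple[int, str, str]]) -> Dict[Tuple[str, str], int]:
    """Build one bitmap per (tag, value) pair from (user_id, tag, value) rows.

    A Python int serves as the bitmap: bit i is set when user i has the value.
    """
    bitmaps: Dict[Tuple[str, str], int] = {}
    for user_id, tag, value in rows:
        key = (tag, value)
        bitmaps[key] = bitmaps.get(key, 0) | (1 << user_id)
    return bitmaps


def cardinality(bitmap: int) -> int:
    """Number of users in a segment (population count)."""
    return bin(bitmap).count("1")


def members(bitmap: int) -> List[int]:
    """Decode a non-negative bitmap back into sorted user IDs."""
    ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids


# Hypothetical enum-tag data keyed by global numeric user IDs.
rows = [
    (0, "city", "beijing"), (1, "city", "shanghai"), (2, "city", "beijing"),
    (3, "city", "beijing"), (0, "intent", "buy"), (2, "intent", "rent"),
    (3, "intent", "buy"),
]
bm = build_bitmaps(rows)

# "In Beijing AND intends to buy" becomes a bitwise AND.
segment = bm[("city", "beijing")] & bm[("intent", "buy")]
# "In Beijing but NOT intending to buy" becomes AND with the complement (and-not).
excluded = bm[("city", "beijing")] & ~bm[("intent", "buy")]

print(cardinality(segment), members(segment))  # 2 [0, 3]
print(members(excluded))                       # [2]
```

Segment-size estimation then reduces to a population count over the combined bitmap. No per-user rows need to be scanned at query time.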
Big Data Technology Architecture
Aug 12, 2020 · Big Data

Overview of New Features and Improvements in Apache Spark 3.0

Apache Spark 3.0 introduces a suite of performance enhancements, richer APIs, improved monitoring, SQL compatibility, new data sources, and ecosystem extensions, including Adaptive Query Execution, Dynamic Partition Pruning, Join Hints, pandas UDF improvements, and accelerator‑aware scheduling, to boost scalability and ease of use for big‑data workloads.

Adaptive Query Execution · Apache Spark · Performance Optimization
15 min read
Qunar Tech Salon
Jul 15, 2020 · Artificial Intelligence

Qunar Technology Carnival: Interviews on Search Optimization, AIOps Fault Localization, and Revenue Management

The Qunar Technology Carnival features in‑depth interviews with experts Wang Mingyou, He Yang, and Jia Ziyan who share practical experiences on search ranking improvements, AIOps‑driven fault localization, and data‑driven revenue management, highlighting challenges, solutions, and future directions in AI‑powered systems.

Qunar · Revenue Management · Tech Interview
10 min read
Xianyu Technology
Jul 9, 2020 · Product Management

Xianyu Product Structuring: Evolution, Current Strategies, and Future Directions

Xianyu’s product‑information structuring has progressed from simple text mining to multimodal AI pipelines that now boost coverage by nearly 50%, while facing precision and engineering hurdles; it plans to adopt a standardized VID attribute system, plug‑in multimodal models, and rule‑based input assistance to enable seamless, photo‑driven publishing.

Multimodal AI · data engineering · e‑commerce
10 min read
Big Data Technology Architecture
Jun 29, 2020 · Big Data

Real‑time Data Warehouse Construction: Goals, Architecture, and Best Practices with Apache Flink

This article summarizes the objectives, design principles, application scenarios, layer‑by‑layer construction methods, quality assurance mechanisms, and supporting tools for building a real‑time data warehouse using Apache Flink, providing practical guidance for data engineers and architects.

Apache Flink · Data Quality · Flink
24 min read
DataFunTalk
Jun 14, 2020 · Big Data

Designing an Offline Big Data Processing Architecture Based on Object Storage

This article presents a comprehensive offline big‑data processing framework that leverages scalable object storage for PB‑level data, details storage and compute engine requirements, compares cost options, describes data pipeline design, and showcases an e‑commerce case study with Spark‑driven analytics.

Big Data · Cost Optimization · Spark
19 min read
Beike Product & Technology
Jun 12, 2020 · Big Data

Design and Implementation of SQL on Streaming (SQL 1.0 → SQL 2.0) in a Real‑Time Computing Platform

This article describes the evolution of a real‑time computing platform from SQL 1.0 built on Spark Structured Streaming to SQL 2.0 powered by Flink‑SQL, covering dynamic tables, continuous queries, dimension‑table joins, cache optimization, DDL extensions, platformization, operational challenges and future roadmap.

Big Data · Dimension Table · Flink
19 min read
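The dimension-table joins and cache optimization mentioned above usually mean this: each streaming record looks up a row in an external dimension store, and a local cache absorbs repeated keys. Below is a minimal sketch of one common design, an LRU cache with a per-entry TTL. It is an assumption about the general technique, not the platform's implementation. `DimLookupCache`, `enrich`, the `city_id` key and the in-memory `dim_table` are hypothetical stand-ins for a real store such as HBase or MySQL.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Callable, Iterable, Iterator, Optional, Tuple


class DimLookupCache:
    """LRU cache with a per-entry TTL in front of a dimension-table lookup.

    `loader` stands in for a point query against the dimension store
    (hypothetical here); `clock` is injectable so expiry can be tested.
    """

    def __init__(self, loader: Callable[[str], Optional[dict]], max_size: int = 1000,
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: OrderedDict[str, Tuple[float, Optional[dict]]] = OrderedDict()
        self.loads = 0  # how many lookups reached the backing store

    def get(self, key: str) -> Optional[dict]:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            self._entries.move_to_end(key)  # refresh LRU position
            return cached[1]
        value = self._loader(key)  # miss or expired entry: reload
        self.loads += 1
        self._entries[key] = (now, value)
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
        return value


def enrich(stream: Iterable[dict], cache: DimLookupCache) -> Iterator[dict]:
    """Left-join each fact record with its dimension row by city_id."""
    for record in stream:
        dim = cache.get(record["city_id"]) or {}
        yield {**record, **dim}


dim_table = {"c1": {"city_name": "Beijing"}, "c2": {"city_name": "Shanghai"}}
fake_now = [0.0]
cache = DimLookupCache(dim_table.get, max_size=2, ttl_seconds=10,
                       clock=lambda: fake_now[0])
events = [{"city_id": "c1", "uv": 3}, {"city_id": "c1", "uv": 5},
          {"city_id": "c2", "uv": 1}]
enriched = list(enrich(events, cache))
print(cache.loads)  # 2: the second c1 lookup is served from the cache
```

The TTL caps how stale a cached dimension row can get, and the size bound caps memory per task. Both numbers trade freshness against load on the dimension store.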
iQIYI Technical Product Team
Jun 12, 2020 · Artificial Intelligence

Deepthought: An End‑to‑End Machine Learning Platform at iQIYI

Deepthought is iQIYI’s end‑to‑end machine‑learning platform that unifies distributed frameworks, decouples pipeline stages, integrates with Tongtian Tower, and offers visual drag‑and‑drop configuration, evolving from a fraud‑detection prototype to a generic system with real‑time inference, automated hyper‑parameter optimization, and support for large‑scale data across anti‑fraud, recommendation, and analytics workloads.

AI Platform · AutoML · Parameter Server
13 min read
Huawei Cloud Developer Alliance
Jun 2, 2020 · Artificial Intelligence

How to Transition into AI: Real Stories, Tools, and a Practical Roadmap

This article shares personal journeys of five Huawei AI experts, recommends essential AI books, walks through setting up PyCharm with ModelArts for hands‑on model training, and outlines a three‑stage AI career roadmap—from practical coding to mastering principles and deploying inference—offering actionable guidance for anyone looking to break into artificial intelligence.

AI career transition · AI deployment · ModelArts
34 min read
dbaplus Community
Apr 26, 2020 · Big Data

Evolving from Data Warehouses to Data Middle Platforms: Architecture & Practices

This talk reviews China's big‑data evolution from early enterprise data warehouses to modern data middle platforms, outlines core architectural components, technology selections, data development practices, lifecycle and quality management, and shares practical Q&A insights for building scalable, cost‑effective data infrastructures.

Big Data · Data Architecture · Data Governance
28 min read
DataFunTalk
Apr 22, 2020 · Big Data

Didi's Real-Time Computing Practices with Apache Flink: Architecture, StreamSQL, and Operational Insights

Senior Didi technology expert Liang Li-yin shares how Didi leverages Apache Flink for large‑scale real‑time computing, covering service architecture, StreamSQL advantages, multi‑cluster management, task control, monitoring, meta‑store integration, challenges, and future plans such as high availability, real‑time ML, and unified batch‑stream processing.

Apache Flink · Big Data · Real‑Time Computing
14 min read
Qudian (formerly Qufenqi) Technology Team
Mar 4, 2020 · Artificial Intelligence

How Intelligent Marketing Leverages AI and Big Data to Boost Conversion Rates

This article explains how intelligent marketing transforms traditional, labor‑intensive strategies into data‑driven, AI‑powered systems by detailing the multi‑layer architecture, data pipelines, machine‑learning models such as LR and GBDT+LR, and future directions like personalized copy generation and deep‑learning enhancements.

AI · Marketing · data engineering
8 min read
Alibaba Cloud Developer
Mar 3, 2020 · Artificial Intelligence

How Alibaba Turns AI, Deep Learning, and Big Data into Enterprise Power

Jia Yangqing’s talk from the Alibaba CIO Academy explains what artificial intelligence is, its applications, the challenges of perception and decision making, the evolution of deep‑learning models, the need for massive compute power, and how enterprises can strategically adopt AI and big‑data technologies to drive innovation.

AI Platforms · cloud computing · data engineering
16 min read
dbaplus Community
Jan 14, 2020 · Big Data

How OPPO Built a Real‑Time Data Warehouse with Flink SQL

This article details OPPO's evolution from an offline data warehouse to a real‑time platform, describing the business scale, data‑mid platform architecture, migration strategy using Flink SQL, extensions like AthenaX, and practical use cases such as real‑time ETL, CTR calculation, and tag import.

ETL · Flink · SQL
18 min read
Bitu Technology
Dec 20, 2019 · Big Data

Building a Model‑Driven Data Platform at Tubi: From Data Warehouse to Automated Machine Learning

The article describes how Tubi, North America’s largest free‑streaming service, built a model‑driven data platform using a high‑quality data warehouse, DBT‑based transformations, Kubernetes‑hosted JupyterHub, low‑latency Scala/Akka services, and automated machine‑learning pipelines to accelerate experimentation and decision‑making.

Data Platform · data engineering · dbt
11 min read
Youzan Coder
Nov 20, 2019 · Big Data

Understanding Youzan's Data Middle Platform: Architecture, Challenges, and Construction

He Fei explains how Youzan built a two‑layer data middle platform—combining a technology stack of offline, online and streaming components with an asset layer for cataloguing, quality, lineage and unified APIs—to tackle diverse business demands, technical complexity, and to enable cost‑optimized, reusable real‑time data services.

Data Platform · data engineering
15 min read
FunTester
Oct 22, 2019 · Backend Development

How to Scrape 7.2 Million Historical Weather Records with Groovy

This article explains how to use a Groovy script to crawl over 7 million historical weather entries for 3,200 cities spanning 2011‑2019, process the JSON responses, and store the cleaned data into a MySQL table, while sharing practical tips and code snippets.

Groovy · Java · Weather Data
7 min read
iQIYI Technical Product Team
Oct 11, 2019 · Artificial Intelligence

Insights into iQIYI's Recommendation Platform Architecture and Practices

iQIYI’s recommendation middle‑platform consolidates content, behavior, and machine‑learning services into a modular architecture that lets any front‑end business connect to a unified recommendation engine, cutting integration time from weeks to days and boosting development efficiency by over 30% while simplifying maintenance and future upgrades.

AI · Tech Platform · data engineering
11 min read
Big Data Technology & Architecture
Oct 3, 2019 · Big Data

Data Development Interview Tips and Career Guidance

This article offers practical advice for data development job interviews, explaining why Java is essential, comparing Java and Python, outlining required backend framework knowledge, discussing the role of SQL and data warehousing, and addressing work‑life concerns such as overtime and company size choices.

Big Data · Java · Python
4 min read
Big Data Technology & Architecture
Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Apache Flink · Big Data · Resources
10 min read
Meituan Technology Team
Aug 15, 2019 · Big Data

Inconsistent Predictions in XGBoost on Spark Due to Different Missing Value Handling

The discrepancy between XGBoost’s Java engine and XGBoost on Spark arose because the two paths handled zeros differently. Zeros are simply not stored in Spark’s sparse vectors, and XGBoost treats absent entries as missing values, whereas dense input passes zeros through as ordinary values and treats only NaN as missing. The problem was resolved by explicitly setting Float.NaN as the missing value or by converting sparse vectors to dense, so that both engines handle zeros uniformly.

Spark · SparseVector · XGBoost
13 min read
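The sparse-versus-dense mismatch described above can be reproduced without XGBoost itself. Below is a toy model of a single tree split, assuming only the documented behaviour that XGBoost sends missing values along a learned default branch and treats entries absent from a sparse input as missing. The threshold `0.5`, the default direction and the helper names (`route`, `dense_lookup`, `sparse_lookup`, `to_dense`) are all hypothetical.

```python
from __future__ import annotations

import math
from typing import List, Optional


def route(value: Optional[float], threshold: float, default_left: bool) -> str:
    """Route one feature value through a single split, XGBoost-style:
    present values compare against the threshold, missing values follow
    the default direction learned during training."""
    if value is None or math.isnan(value):
        return "left" if default_left else "right"
    return "left" if value < threshold else "right"


def dense_lookup(dense: List[float], idx: int) -> float:
    # A dense row stores every value, so 0.0 is an ordinary value.
    return dense[idx]


def sparse_lookup(indices: List[int], values: List[float], idx: int) -> Optional[float]:
    # A sparse row stores only non-zero entries; an absent index is what
    # the booster sees as missing (modelled here as None).
    for i, v in zip(indices, values):
        if i == idx:
            return v
    return None


def to_dense(indices: List[int], values: List[float], size: int) -> List[float]:
    """Materialise a sparse row so that its zeros become explicit values."""
    row = [0.0] * size
    for i, v in zip(indices, values):
        row[i] = v
    return row


# The same logical row [0.0, 2.5] in two layouts.
dense_row = [0.0, 2.5]
sparse_idx, sparse_val = [1], [2.5]

# Hypothetical split on feature 0: "x0 < 0.5", with missing values sent right.
print(route(dense_lookup(dense_row, 0), 0.5, default_left=False))                 # left
print(route(sparse_lookup(sparse_idx, sparse_val, 0), 0.5, default_left=False))  # right
# Densifying before prediction makes the sparse path agree with the dense one.
print(route(dense_lookup(to_dense(sparse_idx, sparse_val, 2), 0), 0.5, default_left=False))  # left
```

The same logical row lands in different leaves depending on its storage layout. That is enough to make two otherwise identical models disagree.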
DataFunTalk
Aug 15, 2019 · Artificial Intelligence

Intelligent Customer Acquisition System Practice at Du Xiaoman Financial

This article presents a comprehensive overview of Du Xiaoman Financial's intelligent customer acquisition system, covering acquisition channels, efficiency improvements through multi‑stage models, data understanding with deepFM, and the platform architecture, and closes with a recruitment notice for senior machine‑learning engineers.

AI · Customer Acquisition · data engineering
9 min read
DataFunTalk
Aug 14, 2019 · Artificial Intelligence

Understanding Recommendation Systems: From Information Overload to Personalized AI Solutions

The article explores how the rapid growth of the internet has created information overload, discusses the challenges of recommendation systems such as sparsity and timeliness, outlines a four‑step personalized content pipeline, and highlights the interdisciplinary nature of building effective AI‑driven recommendation solutions.

AI · Big Data · Recommendation Systems
16 min read
21CTO
Aug 6, 2019 · Databases

Why SQL Is Making a Comeback: From NoSQL’s Rise to the New Data Era

This article explores the resurgence of SQL, tracing its historical roots, the rise and limitations of NoSQL, and how modern cloud and NewSQL solutions are re‑establishing SQL as the universal interface for data storage, processing, and analysis.

NewSQL · NoSQL · SQL
14 min read
dbaplus Community
Jul 24, 2019 · Big Data

Essential Open-Source Tools Every Big Data Engineer Should Know

This article compiles a comprehensive list of common open‑source tools for big data platforms—covering programming languages, data collection, ETL, storage, analysis, query, management, and monitoring—to help learners and practitioners quickly locate and understand the technologies they need.

Big Data · ETL · Hadoop
15 min read
Big Data Technology & Architecture
Jun 20, 2019 · Big Data

Comprehensive Guide to Flink SQL: Background, New Features, Programming Model, Operators, Functions, and a Practical NBA Scoring Leader Example

This article provides an in‑depth overview of Flink SQL, covering its origins, the latest 1.7.0 and 1.8.0 enhancements, the underlying programming model, common operators and built‑in functions, and a complete end‑to‑end example that analyzes NBA scoring‑leader data using Flink SQL.

Apache Flink · Big Data · Flink SQL
27 min read
Architects' Tech Alliance
Apr 20, 2019 · Industry Insights

Why Data Middle Platforms Are the New Production Lines for Data Products

The article examines how data middle platforms transform raw, fragmented enterprise data into valuable data products through a supply‑chain approach, outlining their origins, core processes, deep‑processing techniques, and the essential capabilities needed for successful implementation.

Data Platform · Data Product · Data Supply Chain
13 min read
Big Data Technology & Architecture
Feb 15, 2019 · Big Data

Big Data Mastery Roadmap

This article outlines a comprehensive series of over 500 planned tutorials covering Java advanced features, distributed theory, Hadoop, Spark, Flink, and various big‑data storage and processing technologies, designed to guide engineers transitioning into big‑data development from fundamentals to expert level.

Distributed Systems · Flink · Hadoop
4 min read
Big Data Technology & Architecture
Feb 12, 2019 · Big Data

Big Data Mastery Roadmap – Series Overview

An extensive roadmap series titled “Big Data Mastery Roadmap” outlines essential topics—from Java advanced features and JVM internals to Hadoop, Spark, Flink, and big-data algorithms—guiding engineers transitioning to big data development with curated references, updates, and author insights.

Distributed Systems · Learning Path · data engineering
5 min read
Efficient Ops
Jan 3, 2019 · Operations

Building a Scalable AIOps Platform from Zero: A Guide for Small Teams

This article outlines how to design and implement a large‑scale, AI‑driven operations platform, from defining goals with the 5W1H method, through data collection, storage, and processing, to building its three core components (monitoring, alerting, and CI/CD), with small‑to‑mid‑size enterprises as the main audience.

aiops · artificial intelligence · data engineering
17 min read
21CTO
Dec 21, 2018 · Big Data

What Does a Data Engineer Do? Skills, Certifications, and Career Path

This article explains the role of a data engineer, outlines essential big‑data architecture tools, key technical skills, differences from data scientists, and offers guidance on certifications and learning paths to launch a successful data‑engineering career.

Skills · certifications · data engineering
7 min read
Zhuanzhuan Tech
Dec 21, 2018 · Big Data

Design and Implementation of a Scalable User Profiling System Using Hive and SQL Templates

To meet the growing demands of precise, cost‑effective user operations, the article outlines a lightweight, flexible profiling system built on Hive that uses SQL templates, custom UDFs, and set‑operation logic to enable attribute‑based user segmentation, batch processing, and seamless integration with downstream services.

SQL templates · Set Operations · data engineering
11 min read
DataFunTalk
Nov 24, 2018 · Big Data

The Evolution of iQIYI's Big Data Analytics Platform

This article chronicles iQIYI’s journey from a simple Hive‑based data pipeline to the sophisticated, multi‑engine “Tongtian Tower” platform, detailing the development of the Magic Mirror system, the Gear workflow manager, BabelBD, the Monet visual analytics tool, and the integrated BI ecosystem that now supports billions of daily users.

BI · Big Data · data engineering
18 min read
DataFunTalk
Nov 21, 2018 · Artificial Intelligence

Personalized Recommendation System of 51 Credit Card: Architecture, Challenges, and Growth Cases

This article details how 51 Credit Card leverages artificial intelligence to build a personalized recommendation system, covering business pain points, technical challenges, a three‑layer tagging architecture from bill and app data, model deployment pipelines, and real‑world growth case studies that boosted conversion and ROI.

AI · data engineering · finance
14 min read
Alibaba Cloud Infrastructure
Nov 20, 2018 · Big Data

A Decade of Alibaba's Big Data Platform Evolution Through Double 11

The article chronicles Alibaba's ten‑year journey of building and scaling its big data platform—from early Oracle clusters and Hadoop‑based Cloud‑Ladder 1 to the self‑developed ODPS/MaxCompute, real‑time Blink engine, and the unified DataWorks ecosystem—highlighting key technical milestones, performance breakthroughs, and operational challenges that powered successive Double 11 shopping festivals.

Alibaba · Data Platform · MaxCompute
22 min read
Programmer DD
Nov 7, 2018 · Big Data

Choosing the Right SQL Engine for Big Data: A Practical Guide

This article explores various SQL engines and storage options for big‑data workloads, compares their performance and capabilities, shows practical code examples, and offers guidance on writing efficient SQL in complex data environments.

Big Data · Hive · SQL
6 min read
Architect's Tech Stack
Oct 23, 2018 · Fundamentals

Common Data Collection Challenges in Startups and Practical Solutions

The article examines three typical data collection problems faced by startups—unclear collection methods, chaotic tracking points, and poor collaboration between data and engineering teams—and offers practical strategies such as adopting full‑event models, appointing data architects, and securing top‑down support to achieve reliable, comprehensive analytics.

Analytics · Data Governance · data collection
10 min read
Meituan Technology Team
Oct 18, 2018 · Big Data

Building a Real-Time Data Warehouse with Flink at Meituan

Meituan replaced its Storm‑based pipeline with a four‑layer real‑time data warehouse powered by Flink, using hybrid storage (Cellar KV, Elasticsearch, Druid, MySQL) to deliver low‑latency, high‑throughput services, dramatically simplifying SQL‑driven development, unifying metrics, cutting compute costs, and paving the way for offline‑grade accuracy and reliability.

Flink · Meituan · Streaming
16 min read
Beike Product & Technology
Sep 28, 2018 · Databases

Using ClickHouse for Large‑Scale User Behavior Analysis at Beike Zhaofang

This article details how Beike Zhaofang leveraged the ClickHouse columnar OLAP database for large‑scale user behavior analysis, covering its architecture, key features, performance benchmarks against other engines, data ingestion pipelines, custom UDFs for funnel and retention metrics, deployment setup, and future enhancements.

ClickHouse · Funnel Analysis · OLAP
13 min read
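The funnel metrics mentioned above ask a per-user question: how many steps of an ordered event sequence did the user complete within a time window? Below is a minimal sketch of that computation in plain Python. It approximates the semantics of ClickHouse's built-in `windowFunnel` aggregate but is not the article's custom UDF and does not reproduce every edge case. The event names, the `(user_id, timestamp, event_name)` tuple layout and the 100-second window are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def window_funnel(events: Iterable[Tuple[str, int, str]], steps: List[str],
                  window: int) -> Dict[str, int]:
    """Return, per user, the deepest funnel step reached within `window`
    seconds of some occurrence of the first step."""
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, ts, name in events:
        by_user[user].append((ts, name))

    result: Dict[str, int] = {}
    for user, evs in by_user.items():
        evs.sort()  # chronological order
        best = 0
        # Try every occurrence of the first step as a possible funnel start.
        for start_ts, name in evs:
            if name != steps[0]:
                continue
            level = 1
            for ts, nxt in evs:
                if ts < start_ts or ts > start_ts + window:
                    continue  # outside this attempt's time window
                if level < len(steps) and nxt == steps[level]:
                    level += 1  # next expected step observed, in order
            best = max(best, level)
        result[user] = best
    return result


def funnel_counts(levels: Dict[str, int], n_steps: int) -> List[int]:
    """Users reaching at least step k, for k = 1..n_steps."""
    return [sum(1 for lv in levels.values() if lv >= k) for k in range(1, n_steps + 1)]


events = [
    ("u1", 0, "view"), ("u1", 30, "click"), ("u1", 50, "order"),
    ("u2", 0, "view"), ("u2", 500, "click"),  # click falls outside the window
    ("u3", 10, "click"), ("u3", 20, "view"), ("u3", 40, "click"),
]
levels = window_funnel(events, ["view", "click", "order"], window=100)
print(levels)                    # {'u1': 3, 'u2': 1, 'u3': 2}
print(funnel_counts(levels, 3))  # [3, 2, 1]
```

The step-by-step conversion rates of the funnel follow directly from the per-step counts.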
DataFunTalk
Sep 2, 2018 · Artificial Intelligence

From Zero to One: Building and Deploying Knowledge Graphs at Beike Real Estate

This article details the evolution, architecture, and practical applications of knowledge graphs at Beike Real Estate, covering their historical background, five‑view advantages, data pipelines, ontology construction, intelligent search, recommendation, and chatbot integration, while also discussing challenges and future directions.

Intelligent Assistant · Knowledge Graph · NLP
13 min read
Meitu Technology
Aug 14, 2018 · Big Data

Meitu Data Platform Architecture and Practices

Meitu’s data platform, serving dozens of apps with 500 million monthly active users and billions of daily events, combines the Arachnia log‑collection system, Kafka ingestion, multi‑layer storage (HDFS, MongoDB, HBase, Elasticsearch), offline Hive/MapReduce processing and real‑time Storm/Flink/Naix pipelines, supported by data‑workshop tools, staged evolution for scalability, and robust security and query‑validation mechanisms.

Big Data · Data Platform · ETL
16 min read
Meitu Technology
Aug 11, 2018 · Big Data

Meitu Technology Salon: Evolution of the Big Data Platform, Distributed Bitmap (Naix), and Apache Kylin

At Meitu’s Technology Salon, senior big‑data experts detailed the end‑to‑end architecture and stability measures of Meitu’s large‑scale data platform, introduced the high‑performance distributed bitmap solution Naix, showcased the evolution of Meizu’s user‑insight system, and highlighted Apache Kylin’s OLAP capabilities and Superset integration for scalable, real‑time analytics.

Apache Kylin · Big Data · Data Analytics
9 min read
Architecture Digest
Jul 6, 2018 · Backend Development

Essential Backend Infrastructure and Services for Java Applications

This article outlines the fundamental backend components, frameworks, and services—including API gateways, authentication centers, configuration management, service governance, scheduling, logging, data pipelines, and monitoring—required to build robust, scalable Java business applications for both online and internal use.

Backend · Java · Microservices
20 min read
AntTech
Apr 28, 2018 · Artificial Intelligence

The Future of AI: Intelligent Infrastructure, Engineering Challenges, and the Limits of Current Approaches

The article examines Michael I. Jordan’s critique of current AI research, highlights the need for intelligent infrastructure that integrates computing, data, and physical systems across domains, and uses real‑world examples to argue for a new engineering discipline beyond narrow deep‑learning hype.

AI · Intelligent Infrastructure · data engineering
23 min read
Meituan Technology Team
Jan 26, 2018 · Big Data

Design and Implementation of a Real-Time Data Processing System at Meituan

Meituan designed a Storm‑based real‑time data processing platform that guarantees at‑least‑once delivery and high availability, employs a custom spout, regression‑driven traffic smoothing, and a low‑latency KV store with atomic operations, persisting results in Kafka, MySQL and Cellar to power merchant dashboards and heat‑tag analytics, while planning broader real‑time analytics expansion.

Big Data · Distributed Systems · Storm
10 min read
StarRing Big Data Open Lab
Jan 5, 2018 · Big Data

What Drove Big Data’s 2017 Surge and What’s Next? Insights & Predictions

Analyzing 2017’s big data boom, the article explores how the 4V characteristics—volume, variety, velocity, and value—spurred innovations like distributed storage, NoSQL, real‑time stream processing, and AI integration, and predicts future hotspots such as SQL resurgence, cloud‑based platforms, and AI‑driven analytics.

Big Data · Real-time Processing · artificial intelligence
11 min read
Alibaba Cloud Developer
Oct 23, 2017 · Big Data

How Alibaba Built Its Full‑Domain Data Platform: Architecture & Lessons

In this detailed account, Alibaba senior technologist Zhang Lei explains the concept of a full‑domain data platform, its “four‑horizontal and three‑vertical” architecture, the OneData ecosystem, cost‑saving strategies, data quality tools, and practical challenges of building and operating massive big‑data infrastructure across the Alibaba ecosystem.

Alibaba · Data Middle Platform · data engineering
15 min read
ITPUB
Sep 29, 2017 · Big Data

Designing an Open ETL System: Baidu Waimai’s Scalable Data Pipeline Practices

In this talk, a Baidu Waimai engineer explains the motivations, requirements, and architectural choices behind their open ETL platform, covering data flow patterns, logical mappings, storage options, scheduling, metadata management, and quality monitoring to achieve scalable, transparent, and explainable data delivery.

Big Data · ETL · Scheduling
26 min read
Architects Research Society
Jul 23, 2017 · Big Data

Overview of 50 Additional Big Data Terms and Apache Projects

This article provides an extensive overview of fifty additional big-data terms and Apache open-source projects, explaining concepts and tools such as Kafka, Hive, Spark, data cleaning, AI, and graph databases that are essential for modern data engineering and analytics.

Data Analytics · data engineering
20 min read
21CTO
Jul 17, 2017 · Artificial Intelligence

Inside 58.com’s Smart Recommendation Engine: Architecture, Algorithms, Data

58.com’s intelligent recommendation system evolved from a C++ monolith in 2014 into a Java‑based microservice platform that integrates multi‑layer data processing with diverse recall and ranking algorithms to deliver personalized listings across housing, jobs, cars, and more.

Microservices · data engineering · ranking
27 min read
21CTO
Jul 7, 2017 · Big Data

How to Kickstart Your Big Data Career: A Complete Learning Roadmap

This guide walks beginners through the vast big data landscape, helping them choose the right role, understand essential terminology, plan a learning path, and access curated resources for becoming a data engineer or analyst, all illustrated with clear diagrams.

Big Data · Learning Path · big data technologies
16 min read
21CTO
Jun 9, 2017 · Big Data

From Hadoop to Spark: A Complete Roadmap to Becoming a Big Data Architect

This guide walks beginners through the essential big‑data ecosystem—from understanding Hadoop’s core components and mastering MapReduce, to using Hive, SparkSQL, Kafka, and real‑time frameworks like Storm, while also covering data ingestion, export, scheduling, and introductory machine‑learning techniques.

Big Data · Hive · Spark
20 min read
MaGe Linux Operations
May 31, 2017 · Big Data

Essential Skills for a Successful Data Career: From Big Data Platforms to AI

This article outlines the critical competencies needed across the data field—from building and maintaining big data platforms and data warehouses to mastering visualization, analysis, mining, and deep learning—offering practical guidance for aspiring data professionals seeking long‑term career growth.

Data Science · Data Warehouse · career guide
15 min read
Qunar Tech Salon
Mar 12, 2017 · Big Data

Essential Skills and Career Paths for Data Professionals: From Big Data Platforms to AI

The article outlines the key competencies, responsibilities, and career development advice for data professionals across the entire data stack—from building big‑data platforms and data warehouses to visualization, analysis, algorithm engineering, and deep‑learning applications—emphasizing the importance of creating business value with data.

Big Data · Data Analyst · Data Warehouse
15 min read
Ctrip Technology
Mar 8, 2017 · Big Data

Essential Skills and Career Path for Data Professionals: From Big Data Platforms to AI Applications

This article outlines the key competencies and career roadmap for data professionals, covering big‑data infrastructure, data‑warehouse engineering, visualization, analysis, algorithmic mining, and deep‑learning, while emphasizing the importance of business sense, cloud adoption, and continuous learning.

Data Warehouse · Data visualization · career advice
15 min read
Architecture Digest
Dec 26, 2016 · Big Data

My Journey into Big Data: From Early Mistakes to the Lambda Architecture

The article recounts the author’s early encounters with big‑data challenges, the shift from relational to NoSQL systems, the development of an immutable‑data batch architecture, and the eventual formulation of the Lambda Architecture, illustrating how simplicity and fault‑tolerance can replace complex incremental designs.

Immutable Data · Lambda architecture · data engineering
9 min read
Alibaba Cloud Developer
Dec 8, 2016 · Big Data

How Alibaba’s Double 11 Media Dashboard Leverages Cutting‑Edge Data Technology

Alibaba senior technologist Luo Jinpeng explains how the company’s Double 11 media big‑screen platform evolved from chaotic early implementations to a robust, real‑time data system, detailing architectural redesigns, model refactoring, resource scheduling, and link reliability strategies that underpin its massive e‑commerce event.

Alibaba · data engineering
2 min read
ITPUB
Aug 15, 2016 · Big Data

5 Commandments to Bridge the Gap Between Data Scientists and Engineers

This article outlines five practical commandments that help data scientists and data engineers collaborate more effectively, covering data awareness, tool familiarity, technical limits, mutual respect, and shared responsibility to ensure smooth project delivery.

Collaboration · Data Science · best practices
9 min read
Architecture Digest
Aug 15, 2016 · Big Data

Understanding Data: Types, Systems, and Big Data Technologies

This article explains what data is, classifies it into structured, semi‑structured and unstructured forms, describes data mining, databases, data warehouses, the full data lifecycle, and surveys the big‑data ecosystem including storage, batch and real‑time processing, resource scheduling, and visualization technologies.

Lambda architecture · data engineering · data mining
22 min read
Liulishuo Tech Team
Jun 17, 2016 · Big Data

Building a Scalable Big Data Platform on AWS: Architecture and Execution Service Design

This article details the architectural design and implementation of a scalable big data platform built on AWS services, highlighting the transition from HDFS to S3 for storage, the use of EMR for elastic compute, and a custom Execution Service integrated with Consul and Airflow for automated cluster management and task scheduling.

AWS EMR · Airflow · Big Data Architecture
11 min read
21CTO
Apr 8, 2016 · Cloud Computing

Inside Airbnb’s AWS Cloud Architecture and Data Stack

Airbnb’s engineering VP Mike Curtis explains how the company leverages Amazon Web Services, a Hadoop‑based big‑data platform, and custom tools like Aerosolve, Airflow, and Airpal to power its global marketplace, enabling rapid scaling, dynamic pricing, and personalized search through extensive cloud infrastructure and machine‑learning pipelines.

AWS · cloud architecture · data engineering
15 min read
Java High-Performance Architecture
Mar 13, 2016 · Big Data

Essential Big Data Skill Map: Tools, Languages, and Techniques You Need

Explore a comprehensive big data skill map covering processing frameworks like Spark and Hadoop, databases, programming languages, analytics tools, visualization libraries, AI techniques, algorithms, data structures, and cloud computing services, providing a practical reference for building expertise in modern data engineering.

cloud computing · data engineering
3 min read
Architecture Digest
Feb 22, 2016 · Big Data

Building High‑Performance Big Data Analytics Systems: Techniques and Best Practices

An in‑depth guide outlines technology‑agnostic best‑practice techniques for building high‑performance big data analytics systems, covering data acquisition, storage, processing, visualization, and security, and explains how to address the five V’s of big data to meet demanding operational and performance requirements.

Analytics · Big Data · data engineering
20 min read
21CTO
Nov 20, 2015 · Artificial Intelligence

How Meituan Builds and Optimizes Its Recommendation System

This article explains Meituan's end‑to‑end recommendation system architecture, data processing pipeline, candidate generation strategies, model training and online ranking techniques, illustrating how data, algorithms, and real‑time signals are combined to improve relevance and conversion.

AI · Meituan · data engineering
19 min read
21CTO
Nov 4, 2015 · Big Data

Evolution of Dazhong Dianping’s Data Platform (2012‑2014): Key Lessons for Growing Big Data Teams

This article chronicles the step‑by‑step evolution of Dazhong Dianping’s data platform from 2012 to 2014, detailing changes in data models, storage and compute architecture, scheduling, monitoring, and data‑driven applications, offering practical insights for teams building early‑stage big‑data infrastructures.

Big Data Architecture · Data Platform · Data Warehouse
7 min read
MaGe Linux Operations
Aug 20, 2015 · Big Data

15 Must‑Try Resources to Master Hadoop Quickly

This article explains what Hadoop is, outlines its key features, and presents a curated list of 15 high‑quality tutorials, video courses, and books to help beginners and professionals efficiently learn Hadoop and its MapReduce ecosystem.

Hadoop · Learning Resources · MapReduce
12 min read