Tagged articles

3675 articles

Oct 21, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing two coordinator high‑availability solutions, and explaining the cross‑cluster scheduling architecture that leverages idle Presto resources to improve overall big‑data processing efficiency.

Big Data · Cloud Computing · Cross-Cluster Scheduling

0 likes · 16 min read

dbaplus Community

Oct 20, 2021 · Big Data

How JD Achieves ClickHouse High‑Availability for Billion‑Scale OLAP

JD's OLAP platform runs on ClickHouse and Doris across 3,000 servers, handling billions of daily queries and petabytes of data, and this article details the selection criteria, cluster deployment models, high‑availability architecture, operational challenges, and future roadmap.

Big Data · ClickHouse · Cluster Deployment

0 likes · 21 min read

21CTO

Oct 18, 2021 · Operations

What Emerging IT Roles Will Shape the Future of Tech?

The article surveys rapidly growing IT positions—from quantum computing engineers and security‑compliance managers to big‑data, analytics, and DataOps engineers—explaining how these roles combine advanced technologies, regulatory expertise, and operational practices to drive business transformation and meet the evolving demands of digital enterprises.

Big Data · CloudOps · DataOps

0 likes · 9 min read

DataFunTalk

Oct 18, 2021 · Big Data

Building an Intelligent Data Warehouse at Yixin Group: A Big Data Platform Case Study

The article describes how Yixin Group’s product team created an in‑house intelligent data warehouse using Hadoop, Flink/Spark, and standardized data services to transform scattered automotive‑finance data into a secure, scalable platform that supports real‑time analytics and drives business growth.

Big Data · Flink · Hadoop

0 likes · 10 min read

Java High-Performance Architecture

Oct 17, 2021 · Backend Development

How to Choose the Right Tech Stack: Lessons from a Java Backend Veteran

The author, a seasoned Java backend developer, shares personal experiences and insights on evaluating efficiency, ecosystem, and team dynamics when selecting technologies—from legacy frameworks and databases to modern big‑data tools like Spark and Flink—offering practical guidance for developers and teams navigating today’s rapidly evolving tech landscape.

Big Data · Technology Selection · software engineering

0 likes · 11 min read

DataFunSummit

Oct 16, 2021 · Databases

Practical Use Cases of Materialized Views and Indexes in Doris

This article shares practical experiences with Doris, covering materialized view concepts, typical use cases, index principles, performance optimizations, and real‑world scenarios such as order analysis, PV/UV aggregation, and detailed queries, while also providing operational tips and Q&A insights.

Big Data · OLAP · doris

0 likes · 16 min read

Selected Java Interview Questions

Oct 16, 2021 · Databases

Comparing MySQL and HBase: Architectural, Engine, and Use‑Case Differences

This article compares MySQL and HBase by examining their architectural designs, storage engines (B+Tree vs LSM‑Tree), performance characteristics, ecosystem features such as TTL and multi‑versioning, and identifies scenarios where HBase is a suitable complement to MySQL for large‑scale data workloads.

B+Tree · Big Data · Database Architecture

0 likes · 8 min read

JD Retail Technology

Oct 15, 2021 · Big Data

How JD’s Activity Cockpit Supercharges Mega‑Sale Performance with Optimize Table, BitMap, and Materialized Views

The article explains how JD’s Activity Cockpit tackles mega‑sale challenges by monitoring the consumer golden‑link, applying Optimize Table, BitMap, and materialized view techniques to reduce data volume, accelerate queries, and enable precise real‑time marketing for brands.

Big Data · Performance Optimization · bitmap indexing

0 likes · 6 min read

iQIYI Technical Product Team

Oct 15, 2021 · Industry Insights

How iQIYI Streamlined Event Tracking: A Deep Dive into Data Governance

This article details iQIYI's comprehensive data‑governance practice for event tracking, covering the definition of pingback, the need for governance, the governance framework, coordinate management, gray‑data handling, and the upgrade process that reduced tracking volume by 40% while cutting resource consumption in half.

Analytics · Big Data · Data Governance

0 likes · 17 min read

21CTO

Oct 14, 2021 · Big Data

How LinkedIn Scaled Hadoop to 11,000 Nodes and Solved YARN Delays

LinkedIn’s engineers detail how they repeatedly doubled their Hadoop cluster to over 11,000 nodes, tackled YARN scheduling delays caused by workload imbalances, and created the DynoYARN simulation tool to predict performance impacts of massive scaling.

Big Data · DynoYARN · Hadoop

0 likes · 7 min read

IT Xianyu

Oct 14, 2021 · Databases

Comparing MySQL and HBase: Architecture, Engine, and Application Scenarios

This article compares MySQL and HBase by examining their architectural designs, storage engines, data access patterns, and ecosystem features, highlighting the strengths and trade‑offs of each system and outlining the scenarios where HBase is a suitable complement to MySQL.

B+Tree · Big Data · HBase

0 likes · 5 min read

Ctrip Technology

Oct 14, 2021 · Big Data

Design and Implementation of a Real-Time Dynamic Tag Processing Platform for Trip.com International Business

The article describes the background, challenges, architecture, operator design, DAG processing, tag persistence, and business applications of a real-time dynamic tag processing platform (CDP) built to improve revenue growth and cost reduction for Trip.com's international operations.

Big Data · DAG · Real-time Streaming

0 likes · 16 min read

Alibaba Cloud Developer

Oct 13, 2021 · Big Data

Why “Exactly‑Once” Doesn’t Guarantee Consistency in Stream Processing

This article examines the true meaning of consistency in stream computing, clarifies common misconceptions about exactly‑once semantics, formalizes consistency challenges, and reviews how major stream engines such as Google MillWheel, Apache Flink, Kafka Streams, and Spark Streaming implement end‑to‑end consistency.

Big Data · Exactly-Once · fault tolerance

0 likes · 29 min read

Java High-Performance Architecture

Oct 12, 2021 · Big Data

Unpacking the Core Technologies Behind Modern Big Data Platforms

This article breaks down a typical big data platform architecture into its four layers—data collection, storage and analysis, sharing, and real‑time computation—detailing the essential tools such as Flume, HDFS, Hive, Spark, DataX, and task scheduling systems that enable scalable, low‑latency data processing and delivery.

Big Data · Data Architecture · DataX

0 likes · 8 min read

DataFunSummit

Oct 11, 2021 · Big Data

Impala Architecture, Concurrency, CBO Join Optimization, and Storage Layer in Tencent Financial Big Data Scenario

This article explains how Tencent's financial big‑data platform uses Impala, detailing its overall architecture, concurrency mechanisms, cost‑based join optimization, storage layer design, and practical performance‑tuning experiences to achieve fast, interactive analytics.

Big Data · Impala · OLAP

0 likes · 12 min read

Architecture Digest

Oct 11, 2021 · Big Data

Core Technologies and Architecture of a Big Data Platform

This article explains the typical architecture of a big‑data platform, detailing its four core layers—data collection, storage & analysis, data sharing, and application—and describing the key technologies such as Flume, DataX, HDFS, Hive, Spark, Spark Streaming, and task scheduling components.

Big Data · Data Architecture · DataX

0 likes · 8 min read

Python Crawling & Data Mining

Oct 8, 2021 · Big Data

Why Feather Beats CSV for Large-Scale Data: Speed, Size, and Simplicity

This article explains the limitations of CSV for big datasets, introduces the Feather binary format, shows how to install and use it with Python and pandas, and compares its saving/loading speed and storage size against CSV, highlighting Feather's advantages for efficient data handling.

Big Data · Feather · Performance

0 likes · 7 min read

DataFunTalk

Oct 7, 2021 · Big Data

Impala Architecture, Concurrency, CBO Join Optimization, and Storage Layer in Tencent Financial Big Data Scenarios

This article introduces Impala's overall architecture, storage options, key features, concurrency mechanisms, CBO‑based join optimization techniques, storage‑layer principles and data‑filtering strategies, and summarizes practical performance‑tuning experiences from Tencent's financial big‑data platform.

Big Data · CBO · Impala

0 likes · 12 min read

Architect

Oct 6, 2021 · Big Data

Design and Implementation of a Real-time and Offline Integrated Query System

This article details the requirements, architecture, and implementation of a real-time and offline integrated query system, covering data ingestion via Debezium and Confluent Platform, storage in Kudu and HDFS, query engines Presto and Kylin, and strategies for data synchronization, partitioning, and scaling.

Big Data · Debezium · Kafka

0 likes · 19 min read

DataFunTalk

Oct 6, 2021 · Big Data

Optimizing Flink Real‑Time Computing at Bilibili: Connector Stability, SQL, Runtime, and Future Outlook

This article details Bilibili's comprehensive optimization of Flink real‑time computing, covering connector stability improvements, SQL interval‑join enhancements, runtime state and checkpoint refinements, a diagnostic tool, and future directions for high‑throughput streaming workloads.

Big Data · Checkpoint · Flink

0 likes · 18 min read

Architects' Tech Alliance

Oct 4, 2021 · Industry Insights

Key Technologies and Trends Powering Enterprise Digital Transformation

This article outlines the concept of enterprise digital transformation, detailing network evolution, platform‑centric infrastructure, business deconstruction, customer‑focused data value creation, and the importance of measurable value improvement as a core metric for successful digital change.

Artificial Intelligence · Big Data · Blockchain

0 likes · 7 min read

DataFunTalk

Oct 2, 2021 · Artificial Intelligence

Baidu Data Federation Platform: Architecture, Applications, Federated Learning, and Explainability

This article presents an in‑depth overview of Baidu's Data Federation Platform, detailing its layered architecture, core technical capabilities, privacy‑preserving collaborative research on epidemic prediction and shared vehicle optimization, and explores federated learning types, PaddleFL implementations, and model explainability techniques.

Big Data · Federated Learning · explainability

0 likes · 22 min read

AntTech

Sep 28, 2021 · Databases

GeaGraph: Large-Scale Graph Computing System Wins World Internet Conference Award

GeaGraph, a large‑scale graph computing system jointly developed by Ant Group and Tsinghua University and recognized at the 2021 World Internet Conference, showcases world‑leading performance in trillion‑edge graph queries and exemplifies successful industry‑academia‑research collaboration for advanced database technology.

Big Data · GeaGraph · Industry-Academia Collaboration

0 likes · 8 min read

21CTO

Sep 27, 2021 · Big Data

Tech Highlights: China Crypto Ban, Huawei’s New Language, Kafka 3.0

A roundup of recent tech news covering China's crackdown on cryptocurrency, Huawei's upcoming programming language, the release of Apache Kafka 3.0, and other major developments in China's digital economy and industry leadership.

Apache Kafka · Big Data · Digital Economy

0 likes · 8 min read

Airbnb Technology Team

Sep 27, 2021 · Big Data

Midas Certification: Airbnb’s End-to-End Data Quality Framework

Airbnb’s Midas certification establishes a company‑wide, multi‑dimensional golden‑standard for data quality—covering accuracy, consistency, timeliness, cost, and completeness—by requiring collaborative design, automated health checks, and four review stages, ensuring certified data is reliable, well‑documented, and ready for reporting, experimentation, and machine‑learning.

Airbnb · Big Data · Data Quality

0 likes · 12 min read

Java Interview Crash Guide

Sep 26, 2021 · Databases

MySQL vs HBase: Architectural, Engine, and Use‑Case Differences Explained

This article compares MySQL and HBase across architecture, engine design, indexing structures, and feature sets such as TTL and multi‑versioning, highlighting how MySQL excels in low‑latency online transactions while HBase offers distributed scalability and write‑optimized storage for big‑data scenarios.

B+Tree · Big Data · Database Architecture

0 likes · 9 min read

Cloud Native Technology Community

Sep 26, 2021 · Big Data

Apache Kafka 3.0.0 Release Summary: New Features, Improvements, Bugs, Tasks, and Tests

Apache Kafka 3.0.0, released on September 21, 2021, introduces major changes such as deprecating Java 8 and Scala 2.12, adding Raft‑based metadata quorum, stronger producer delivery guarantees, removal of old message formats, numerous performance optimizations, extensive bug fixes, and a large set of new and updated JIRA issues across features, improvements, bugs, tasks, tests, and subtasks.

Apache · Big Data · Kafka3.0

0 likes · 37 min read

Zhuanzhuan (转转) QA

Sep 26, 2021 · Big Data

A/B Testing Process Improvement and Validation Guide

This article outlines a comprehensive A/B testing workflow, covering historical issues, business test process improvements, detailed implementation steps, SQL validation scripts, data verification in analytics platforms, and practical notes to ensure accurate experiment data collection and analysis.

A/B testing · Big Data · data validation

0 likes · 10 min read

Programmer DD

Sep 26, 2021 · Big Data

What’s New in Apache Kafka 3.0? Key Features and Improvements Explained

Apache Kafka 3.0.0 introduces a host of changes, including the deprecation of Java 8 and Scala 2.12 support, Raft metadata snapshots, stronger producer guarantees, MirrorMaker 2 upgrades, and Kafka Streams improvements, while continuing to serve real‑time data pipelines and streaming applications.

Apache Kafka · Big Data · Kafka 3.0

0 likes · 3 min read

StarRocks

Sep 24, 2021 · Big Data

How Didi Scaled Real‑Time Funnel Analysis with StarRocks: Architecture, Design, and Performance Tips

Didi's data architecture team migrated high‑volume, real‑time funnel analysis from ClickHouse to StarRocks, built a multi‑layer pipeline with Kafka, Flink/Spark, Hive, and materialized views, and achieved sub‑3‑second query times on billions of rows, while outlining future enhancements.

Big Data · Funnel Analysis · Spark

0 likes · 14 min read

DataFunTalk

Sep 23, 2021 · Databases

Practical Use Cases of Materialized Views and Indexes in Doris

This article shares practical experiences with Doris, covering materialized view concepts, typical use cases, advantages, creation syntax, prefix index principles, performance‑boosting scenarios such as order analysis, PV/UV counting, detail queries, and operational tips for high‑throughput and low‑latency workloads.

Big Data · OLAP · Performance Optimization

0 likes · 18 min read

GrowingIO Tech Team

Sep 23, 2021 · Big Data

How to Build a Real‑Time Flink Metrics Dashboard with Prometheus & Grafana

This article explains how to monitor Flink jobs running on YARN by leveraging Flink metrics, configuring reporters, defining custom metrics, and visualizing the data in real time with Prometheus, Grafana, and Graphite‑exporter, complete with deployment diagrams and code examples.

Big Data · Flink · Grafana

0 likes · 9 min read

Java Architect Essentials

Sep 21, 2021 · Big Data

Interview on Kuaishou's Billion‑Scale Big Data Architecture Evolution and Practices

The interview with Kuaishou senior architect Zhao Jianbo details the three‑phase evolution of its trillion‑scale big data platform, covering foundational Hadoop services, real‑time and OLAP extensions, deep customizations, Spring Festival Gala challenges, scheduling innovations, Hadoop usage, and the relationship between big data and cloud architectures.

Big Data · Flink · Hadoop

0 likes · 19 min read

DataFunTalk

Sep 20, 2021 · Databases

Using GPLoad to Batch Load HDFS Data into Greenplum: Comparison with Hive and MPP Database Options

The article compares Hive and Greenplum as offline and MPP data‑warehouse solutions, reviews Hive query engine alternatives, and provides a detailed tutorial—including YAML configuration and a shell script—for using GPLoad to import HDFS data into Greenplum.

Big Data · GPLoad · Greenplum

0 likes · 8 min read

Big Data Technology & Architecture

Sep 15, 2021 · Big Data

Linkis: Open‑Source Big Data Middleware Joins the Apache Incubator

Linkis, an open‑source computing middleware from WeBank, has entered the Apache Software Foundation Incubator, offering REST/WebSocket/JDBC interfaces to a wide range of engines such as Spark, Hive, Presto and Flink, and providing powerful governance, orchestration, and resource‑management capabilities for big‑data platforms.

Apache Incubator · Big Data · Data Platform

0 likes · 5 min read

Alibaba Cloud Developer

Sep 15, 2021 · Big Data

How to Pick Real-Time Dimension & Result Tables for Cloud‑Native Big Data

This article examines the evolution of big‑data architectures toward cloud‑native, real‑time processing, and provides a detailed comparison of dimension‑table and result‑table options—including MySQL, Redis, and Alibaba Cloud Tablestore—along with their performance, cost, and scalability characteristics for Flink SQL workloads.

Big Data · Flink SQL · Real‑Time Computing

0 likes · 28 min read

IT Architects Alliance

Sep 12, 2021 · Industry Insights

Data Warehouse vs. Database: Core Differences and Building a Data Platform

This article explains what a data warehouse is, contrasts it with traditional databases, outlines how to design and build a data warehouse—including model selection, topic domain division, bus matrix, layered architecture, and data governance—then expands to the concept of a data middle platform and its distinction from data lakes and big‑data platforms.

Big Data · Data Governance · Data Platform

0 likes · 18 min read

Architects' Tech Alliance

Sep 11, 2021 · Big Data

Understanding Data Warehouses: Definitions, Differences, Architecture, Modeling, and Best Practices

This article explains what a data warehouse is, contrasts it with traditional databases, outlines how to design and build a warehouse—including model selection, subject‑area definition, bus matrix, layering, and data quality—while also covering related concepts such as data middle platforms, data lakes, metadata, and modeling techniques.

Big Data · Data Quality · ETL

0 likes · 16 min read

DataFunTalk

Sep 11, 2021 · Cloud Computing

Industrial Data Cloud Migration: Architecture, Core Technologies, and Case Studies with Alibaba Cloud IoT

This article explains the background, challenges, overall architecture, core technology optimizations, edge‑computing integration, data modeling, serialization, and real‑world case studies of moving industrial IoT data to Alibaba Cloud, illustrating how cloud‑native solutions enable digital transformation in manufacturing.

Big Data · Cloud Computing · Data Integration

0 likes · 16 min read

Tencent Tech

Sep 10, 2021 · Big Data

How Sohu Changyou Migrated 1 PB of Game Data to the Cloud Without Downtime

This article details how Sohu Changyou’s data team, together with Tencent Cloud engineers, planned and executed a seamless migration of over one petabyte of game data to Elastic MapReduce, Elasticsearch Service and Oceanus, achieving zero service impact and dramatically improving analytics performance.

Big Data · EMR · Game Analytics

0 likes · 9 min read

DataFunTalk

Sep 10, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing enhancements for coordinator high‑availability, and explaining a cross‑cluster scheduling strategy that leverages idle Presto resources to improve overall big‑data workload efficiency.

Big Data · Cross-Cluster Scheduling · HA

0 likes · 16 min read

Ctrip Technology

Sep 9, 2021 · Big Data

Building Data Lineage at Ctrip: Architecture, Implementation, and Real‑World Applications

This article describes how Ctrip built a data lineage system for its big data platform, covering the concept of data lineage, collection methods, open‑source tools such as Apache Atlas and DataHub, the in‑house table‑level and field‑level solutions, implementation details for Hive, Spark and Presto, storage in JanusGraph, and practical applications in data governance, metadata management, scheduling and sensitivity labeling.

Big Data · JanusGraph · Kafka

0 likes · 16 min read

vivo Internet Technology

Sep 8, 2021 · Big Data

Overview of Vivo Marketing Automation Platform Architecture and Technical Design

The article outlines Vivo's marketing automation platform, explaining how it automates multi‑channel campaigns to solve timing, personalization, and ROI challenges, and describes its four business modules, layered system architecture—including gateway, service, compute, and storage components—and high‑availability features such as monitoring, smooth releases, rate limiting, and idempotent operations.

Big Data

0 likes · 14 min read

Selected Java Interview Questions

Sep 7, 2021 · Big Data

Elasticsearch Basics: Core Concepts, Indexing, Write and Search Processes, Cluster Management and Performance Tips

This article provides a comprehensive overview of Elasticsearch, covering its fundamental architecture, key concepts such as indices, shards and replicas, the complete write and search workflows, consistency mechanisms, master node election, and practical performance‑tuning recommendations for large‑scale deployments.

Big Data · Cluster Management · Elasticsearch

0 likes · 15 min read

Volcano Engine Developer Services

Sep 6, 2021 · Databases

How ByteDance Optimized ClickHouse for Real‑Time Recommendation and Ad Analytics

ByteDance’s ByteHouse, an enterprise‑grade ClickHouse, powers real‑time recommendation and ad‑delivery analytics at massive scale, detailing two case studies, technical selections, architectural designs, and performance optimizations such as asynchronous indexing, multi‑threaded Kafka consumption, and enhanced buffer engines to ensure data integrity.

Big Data · ByteHouse · ClickHouse

0 likes · 10 min read

Top Architect

Sep 6, 2021 · Big Data

Building a Real-Time Log Analysis Platform with ELK: Installation, Configuration, and Usage

This tutorial explains how to set up an open‑source ELK (Elasticsearch, Logstash, Kibana) stack for real‑time log collection, parsing, and visualization, covering component installation, Shipper/Indexer configuration, Grok pattern creation, Nginx integration, and background service management with Supervisor.

Big Data · ELK · Elasticsearch

0 likes · 19 min read

Laravel Tech Community

Sep 5, 2021 · Artificial Intelligence

Comprehensive Collection of Open Data Sources and Datasets for AI and Data Analysis

This article provides a curated list of publicly available data query websites, simple universal datasets, large-scale collections, and specialized datasets for machine learning, image classification, text classification, and recommendation systems, offering valuable resources for AI research and data-driven projects.

Artificial Intelligence · Big Data · Datasets

0 likes · 7 min read

IT Architects Alliance

Sep 5, 2021 · Big Data

Big Data Platform Architecture: Core Layers, Technologies, and Practices

This article outlines a typical big data platform architecture, detailing its core layers—data acquisition, storage and analysis, sharing, application, real‑time computation, and task scheduling—while introducing key technologies such as Flume, HDFS, Hive, Spark, DataX, and monitoring considerations.

Big Data · Data Platform · Hadoop

0 likes · 9 min read

Architects Research Society

Sep 4, 2021 · Databases

Why Data Scientists Should Learn PostgreSQL

This article explains why mastering SQL and PostgreSQL is essential for data scientists, outlines the core skills of the role, describes PostgreSQL’s features, lists its advantages and drawbacks for data science, and suggests resources for getting started.

Big Data · Data Science · HTAP

0 likes · 10 min read

DataFunTalk

Sep 4, 2021 · Big Data

High‑Availability Practices of ClickHouse in JD.com: Architecture, Deployment, and Operations

The article details JD.com’s large‑scale OLAP strategy using ClickHouse as the primary engine and Doris as a secondary engine, covering application scenarios, component selection criteria, cluster deployment models, high‑availability architecture, fault‑handling procedures, performance tuning, and future cloud‑native plans.

Big Data · ClickHouse · Cluster Deployment

0 likes · 19 min read

DataFunTalk

Sep 3, 2021 · Big Data

Building an Exabyte‑Scale Data Lake with Apache Hudi at ByteDance: Architecture, Design Choices, and Performance Optimizations

This article details ByteDance's implementation of an exabyte‑scale data lake using Apache Hudi, covering scenario requirements, engine selection, functional support, schema management, extensive performance tuning, and future directions, while also noting recruitment opportunities within the team.

Apache Hudi · Big Data · ByteDance

0 likes · 9 min read

DataFunTalk

Sep 1, 2021 · Big Data

Case Study: Migrating to DorisDB for High‑Performance Query Engine at Kuayue Group

The article details how Kuayue Group's big‑data center replaced Presto and ClickHouse with DorisDB, achieving sub‑second query latency, simplifying architecture, and enabling both online and real‑time OLAP analytics across millions of daily requests.

Big Data · DorisDB · OLAP

0 likes · 10 min read

ByteDance ADFE Team

Aug 31, 2021 · Big Data

Evolution of the Big Data Technology Stack Over the Past Five Years

This article reviews the evolution of big data technologies in the last five years, covering streaming and batch processing frameworks, column‑store NoSQL databases, programming language trends, the cloud‑native multi‑model database Lindorm, and practical Flink/Blink usage with code examples.

Big Data · Flink · Lindorm

0 likes · 24 min read

Baidu Geek Talk

Aug 30, 2021 · Artificial Intelligence

Baidu Credibility Certification Platform: Architecture, Core Capabilities, and Technical Design

Baidu Credibility Certification Platform is an AI‑powered verification service that offers unified authentication, qualification certification, workflow orchestration, and intelligent document validation for enterprises, institutions, and individuals, built on a mid‑platform architecture with shared components and future plans to expand content and service certification.

AI · Baidu · Big Data

0 likes · 15 min read

Programmer DD

Aug 30, 2021 · Big Data

Why Is Kafka So Fast? Unveiling the Secrets Behind Its High Throughput

This article explains how Kafka achieves remarkable speed and massive throughput by using sequential disk I/O, OS page cache, zero‑copy transfers, partitioned log segments with indexes, batch processing, and efficient compression, making it a cornerstone of modern big‑data pipelines.

Big Data · High Throughput · Kafka

0 likes · 9 min read

DataFunTalk

Aug 28, 2021 · Big Data

Mid‑Year 2021 DSU Reading Selections – Technical Articles, Reflections, and Job Listings

The DSU mid‑year reading collection compiles high‑quality technical articles, reflective essays, and internal job referrals across data architecture, big‑data ecosystems, machine learning, data governance, and career development, providing a searchable resource for data professionals.

Big Data · career · data engineering

0 likes · 7 min read

Python Programming Learning Circle

Aug 28, 2021 · Big Data

Accelerating Python Function Execution with Multiprocessing: Achieving a 30× Speedup on Large Datasets

This article explains how to use Python's multiprocessing module to parallelize a custom preprocessing function on a 537k‑record dataset, demonstrating roughly a thirty‑fold reduction in execution time compared with single‑process execution.

Big Data · Performance · benchmark

0 likes · 6 min read

Tencent Cloud Developer

Aug 26, 2021 · Big Data

Recap of Shenzhen Elasticsearch Meetup – Community Growth, Compression Optimization, Real‑time Data Fusion, and Cluster Practices

The first Shenzhen Elasticsearch meetup on August 21, 2021, jointly hosted by the ES Chinese community and Tencent Cloud, gathered experts from Tencent, Tapdata, ByteDance and Vivo to showcase rapid community growth, compression‑encoding optimizations, real‑time ES‑MongoDB data fusion, custom kernel extensions, large‑scale cluster practices, and concluded with extensive Q&A and networking.

Big Data · Cluster Management · Elasticsearch

0 likes · 11 min read

Qunar Tech Salon

Aug 26, 2021 · Big Data

Comprehensive Introduction to Apache Spark: History, Core Concepts, Architecture, and Performance Optimization

This article provides a thorough overview of Apache Spark, covering its origins, comparison with MapReduce, core concepts such as RDD, DAG, Jobs, Stages, and Tasks, the submission process, Web UI, and detailed performance tuning techniques including data skew mitigation.

Big Data · Data Skew · MapReduce

0 likes · 15 min read

Selected Java Interview Questions

Aug 25, 2021 · Databases

ClickHouse Overview: Architecture, MySQL Migration, Performance Testing, and Practical Tips

This article introduces ClickHouse, a high‑performance open‑source columnar database, explains its architecture versus row‑based systems, details migration from MySQL, showcases installation, performance benchmarks, data‑sync strategies, common pitfalls, and summarizes its benefits for large‑scale analytical workloads.

Big Data · ClickHouse · Columnar Database

0 likes · 7 min read

DataFunSummit

Aug 22, 2021 · Big Data

Evolution and Optimization of Meituan Waimai Offline Data Warehouse: Architecture, ETL, Modeling, Governance, and Future Plans

This article details the historical development, architectural layers, ETL migration to Spark, data modeling standards, governance processes, resource optimization, security measures, and future roadmap of Meituan Waimai's offline data warehouse, illustrating how the team addressed scalability and efficiency challenges.

Big Data · Data Governance · ETL

0 likes · 21 min read

Baidu Intelligent Testing

Aug 19, 2021 · Big Data

Overview of Baidu's Wanxiang System for Large‑Scale Rich Media Processing

The article provides a comprehensive overview of Baidu's Wanxiang system, detailing how it tackles the challenges of massive image and video data processing, feature extraction, cross‑media indexing, and real‑time retrieval to support modern search engine products.

Baidu · Big Data · Rich Media

0 likes · 13 min read

Top Architect

Aug 18, 2021 · Big Data

Elasticsearch Indexing and Retrieval Optimization for Billion‑Scale Data

This article describes how a top architect optimized Elasticsearch for handling billions of records, covering Lucene fundamentals, index and shard design, DocValues, query performance tuning, bulk indexing strategies, hardware considerations, and testing methods to achieve sub‑second query responses across multi‑year data ranges.

Big Data · Elasticsearch · Index Optimization

0 likes · 12 min read

Architects' Tech Alliance

Aug 17, 2021 · Cloud Computing

Integrated Vehicle‑Road Cloud Control System Architecture

The integrated vehicle‑road cloud control system is a next‑generation cyber‑physical architecture that unifies vehicles, roads, and cloud services through edge, regional, and central clouds, providing real‑time perception, decision‑making, and control to improve traffic safety, efficiency, and sustainability.

Big Data · Edge Computing · System Architecture

0 likes · 10 min read

dbaplus Community

Aug 17, 2021 · Big Data

How JD Transformed Its Data Warehouse with Delta Lake for Real‑Time Analytics

This article examines JD's shift from a traditional Lambda‑based data warehouse to a Delta Lake‑powered real‑time data lake, detailing the challenges of legacy architectures, the evaluation of open‑source table formats, Delta Lake's core mechanisms, and the resulting simplified batch‑stream development workflow.

Batch-Stream · Big Data · Data Lake

0 likes · 11 min read

DataFunTalk

Aug 14, 2021 · Databases

Evolution of OLAP Engines at Lenovo Liancheng Zhida and DorisDB Adoption

The article chronicles Lenovo Liancheng Zhida’s three‑stage evolution of OLAP engines—from early SQL Server scripts, through a Hadoop‑based Presto solution, to the adoption of DorisDB—detailing architecture, tool comparisons, implementation practices, and the performance and operational benefits achieved.

Analytics · Big Data · DorisDB

0 likes · 12 min read

IT Architects Alliance

Aug 14, 2021 · Big Data

An Introduction to Dimensional Modeling in Data Warehousing

This article provides a comprehensive overview of data warehouse concepts, compares classic warehouse models, explains dimensional modeling fundamentals such as fact and dimension tables, demonstrates a practical e‑commerce scenario with schema design and SQL query examples, and discusses real‑world trade‑offs.

Big Data · ETL · Star Schema

0 likes · 9 min read

Volcano Engine Developer Services

Aug 11, 2021 · Big Data

How Volcengine Solves Big Data Quality Challenges with a Unified Stream‑Batch Platform

Volcengine’s Data Quality Platform bridges the gap between data validation and resource‑intensive computation in large‑scale environments, offering unified stream‑batch monitoring, data exploration, comparison, and alerting across Hive, ClickHouse, Kafka, and more, while addressing scalability, latency, and resource optimization challenges.

Big Data · Data Quality · Monitoring

0 likes · 19 min read

Baidu Intelligent Testing

Aug 10, 2021 · Backend Development

Evolution and Architecture of Baidu's Fengjing APM System

This article chronicles the four‑year evolution of Baidu's Fengjing performance‑monitoring platform, detailing its data collection, processing pipelines, successive architectural versions (1.0‑4.0), challenges such as probe intrusion and massive data volume, and the engineering solutions that enabled large‑scale, low‑cost, cloud‑native observability for thousands of Java services.

APM · Big Data · Cloud Native

0 likes · 9 min read

21CTO

Aug 6, 2021 · Big Data

What the 2021 State of Data Science Reveals About Python, Automation, and Open Source

The 2021 State of Data Science report shows how COVID‑19 has impacted investment, highlights Python's dominance, examines automation's growing role, and reveals corporate attitudes toward open‑source contributions, offering data‑driven insights for professionals and educators alike.

Big Data · Data Science · Open-source

0 likes · 5 min read

DataFunTalk

Aug 5, 2021 · Big Data

Building a Unified High‑Performance OLAP Platform with DorisDB at Beike Real Estate

The article describes how Beike Real Estate consolidated multiple OLAP engines into a single DorisDB‑based platform, detailing the business challenges, DorisDB’s technical advantages, extensive performance and concurrency benchmarks, and the resulting improvements in stability, query speed, and operational simplicity across various business scenarios.

Analytics · Big Data · DorisDB

0 likes · 14 min read

Baidu Intelligent Testing

Aug 5, 2021 · Operations

Baidu Search Stability Issue Analysis: Automated Fault Detection and Resolution Techniques

This article details Baidu Search's high‑availability engineering, describing eight major challenges in fault analysis and the corresponding innovations—index mirroring, streaming analysis, comprehensive label sets, feature engineering, query reconstruction, intelligent ranking, timeline analysis, and chaos engineering—that together enable near‑real‑time, 99% accurate detection and mitigation of search service failures.

Big Data · Reliability · fault-analysis

0 likes · 13 min read

vivo Internet Technology

Aug 4, 2021 · Big Data

Applying ANTLR4 for Arithmetic Calculator and SQL Parsing over CSV Data

The article demonstrates how ANTLR4 can replace manual parsing by building a four‑function arithmetic calculator and a trimmed‑down SQL parser for Presto, showing the workflow from grammar definition to generated lexer/parser and visitor code, then applying the SQL parser to query CSV data efficiently.

ANTLR · Big Data · CSV

0 likes · 20 min read

Alimama Tech

Aug 4, 2021 · Big Data

Fast Attribution Engine (FAE): High‑Performance Distributed Computing for User Behavior and Advertising Attribution

FAE, Alibaba’s high‑performance distributed MPP engine, stores billions of user‑behavior events in a time‑ordered AFile model and uses stateless masters, importers, mergers and workers with Redis and MySQL metadata to deliver sub‑second, 10‑100× faster ad‑attribution queries across ad‑hoc, offline and near‑real‑time scenarios such as frequency, path and funnel analysis.

Ad Attribution · Big Data · FAE

0 likes · 11 min read

The Dominant Programmer

Aug 4, 2021 · Big Data

Essential HDFS Shell Commands for Managing Hadoop Files

This guide explains how to use the HDFS shell (preferred via hdfs dfs) to list, copy, move, delete, and snapshot files in a Hadoop cluster, detailing command syntax, URI handling, generic options, and practical examples for each operation.

Big Data · HDFS · Hadoop

0 likes · 9 min read

ITFLY8 Architecture Home

Aug 4, 2021 · Big Data

How to Build a Scalable Event Analytics Platform with ClickHouse

This article explains the design of a high‑performance event analysis platform that ingests billions of daily logs, supports event, funnel, and retention queries, and leverages ClickHouse for storage, efficient writes, and fast analytical queries across massive datasets.

Big Data · ClickHouse · Event Analytics

0 likes · 12 min read

Volcano Engine Developer Services

Aug 3, 2021 · Big Data

Inside ByteDance’s Traffic Platform: Powering Trillions of Real‑Time Events

This article, compiled from a Volcano Engine meetup, explains how ByteDance’s unified traffic platform designs, governs, and processes massive event‑tracking data in real time, covering event‑tracking content solutions, link architecture, dynamic processing engines, and data‑governance practices that support trillions of daily events.

Big Data · Data Governance · Real-time Processing

0 likes · 16 min read

Efficient Ops

Aug 2, 2021 · Operations

How Alibaba Scales Massive Big Data Engines with an SRE Framework

This article describes Alibaba’s comprehensive SRE system for managing ultra‑large‑scale big data engines, detailing stability metrics, resource cost management, and intelligent operation productization, and introduces speaker Fu Tianyuan, a senior operations expert leading the MaxCompute and DataWorks SRE team.

Alibaba · Big Data · Cloud Computing

0 likes · 3 min read

The Dominant Programmer

Aug 2, 2021 · Big Data

How to Build a Beginner Hadoop Cluster on CentOS 7

This article introduces Apache Hadoop’s open‑source framework, explains its core components such as HDFS, MapReduce, ZooKeeper, HBase, Hive, Pig, Mahout, Sqoop, Flume, Chukwa, Oozie, Ambari and YARN, and outlines the steps to set up a beginner‑level Hadoop cluster on CentOS 7.

Big Data · CentOS 7 · HBase

0 likes · 11 min read

Big Data Technology & Architecture

Aug 2, 2021 · Big Data

Comprehensive Big Data Interview Question Guide for Major Tech Companies

This article compiles extensive interview questions and topics covering Hadoop, Spark, Flink, Hive, Kafka, MySQL, Redis, Java fundamentals, and algorithms, organized by companies such as Xiaomi, ByteDance, Alibaba, Shopee, Tencent, Meituan, NetEase, and Baidu, to help candidates prepare effectively for big‑data engineering roles.

Big Data · Flink · Hadoop

0 likes · 22 min read

ByteDance SE Lab

Jul 30, 2021 · Operations

Inside Salesforce’s Global Outage: What Went Wrong and How to Prevent It

The article examines Salesforce’s five‑hour global outage caused by a shortcut DNS deployment and the subsequent recovery challenges, then explores a viral experiment where twenty smartphones generated artificial traffic congestion, illustrating how real‑time data feeds and operational safeguards can prevent large‑scale service disruptions.

Big Data · Cloud Computing · Operations

0 likes · 7 min read

JD Tech

Jul 30, 2021 · Databases

Practical Use of HBase in a Logistics HR Data Preprocessing Platform

This article details how the logistics HR data preprocessing platform processes around 20 million daily records by adopting HBase for high‑performance, scalable, column‑oriented storage, covering its architecture, read/write mechanisms, best practices, and performance considerations.

Big Data · HBase · NoSQL

0 likes · 10 min read

DataFunTalk

Jul 29, 2021 · Big Data

Real-Time Data Warehouse Construction at TAL Using DorisDB

This article details TAL's transition from offline to real-time data warehousing, describing business drivers, pain points, architectural evolution through Hive, Flink+Kudu, and DorisDB, and outlining the system design, data flow, scheduling, monitoring, and the resulting business and cost benefits.

Airflow · Big Data · DorisDB

0 likes · 14 min read

Airbnb Technology Team

Jul 29, 2021 · Big Data

Airbnb’s Data Quality Improvement Plan: Organizational, Architectural, and Governance Practices

Airbnb’s 2019 Data Quality Improvement Plan reorganized its data‑engineering workforce, introduced a dedicated data‑engineer role, adopted a decentralized Minerva‑based architecture with Spark pipelines, instituted rigorous testing, governance, and certification processes, and established SLAs and monitoring to ensure timely, trustworthy, well‑documented data across the enterprise.

Airbnb · Big Data · Data Architecture

0 likes · 13 min read

DataFunTalk

Jul 28, 2021 · Big Data

Pravega Flink Connector: Past, Present, and Future – Architecture, Checkpoint Integration, and Upcoming Features

This article reviews the Pravega project and its Flink connector, covering Pravega's design for large‑scale streaming, the connector's evolution and exactly‑once semantics, Flink 1.11 integration challenges, checkpoint mechanisms, and future plans such as schema‑registry and new Flink features.

Big Data · Checkpoint · Connector

0 likes · 10 min read

DataFunTalk

Jul 27, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Doris at Shuhai Supply Chain

This article describes how Shuhai Supply Chain upgraded its data warehouse from a complex, high‑cost 1.0 architecture to a streamlined, real‑time solution built around Apache Doris, detailing the motivations, design choices, zero‑code ingestion, metadata management, Flink connector, and the resulting performance gains.

Apache Doris · Big Data · Flink

0 likes · 13 min read

Big Data Technology Architecture

Jul 27, 2021 · Big Data

Key Components of the Big Data Ecosystem: Hadoop, Hive, HBase, Spark, Kafka, and Elasticsearch

This article introduces the most important and still mainstream components of the big data ecosystem—including Hadoop’s storage and compute framework, Hive data warehouse, HBase NoSQL database, Spark unified engine, Kafka messaging platform, and Elasticsearch search engine—explaining their core concepts, architectures, and typical use cases.

Big Data · Elasticsearch · HBase

0 likes · 9 min read

DataFunTalk

Jul 26, 2021 · Big Data

Accelerating Hive Daily Tables with Flink: A SmartNews Case Study

This article describes how SmartNews integrated Flink into its Airflow‑driven Hive batch pipeline to cut the actions table generation latency from three hours to about thirty‑four minutes, detailing the technical challenges, design decisions, and production results.

AWS · Big Data · Flink

0 likes · 12 min read

DataFunTalk

Jul 25, 2021 · Databases

Practical Application of Apache Kudu at NetEase: Architecture, Use Cases, Challenges and Future Directions

This article explains Apache Kudu’s architecture, schema design, update mechanism, and how NetEase leverages it for real‑time data ingestion, dimension table joins, data‑warehouse ETL, and AB‑testing, while also discussing encountered issues and upcoming feature requests.

Apache Kudu · Big Data · NetEase

0 likes · 11 min read

Architecture Digest

Jul 25, 2021 · Big Data

Design and Architecture of Hera Data Service for Unified Data Access at Vipshop

The article details the background, architecture, core features, scheduling mechanisms, Lisp‑based query DSL, and Alluxio integration of Vipshop's self‑developed Hera data service, illustrating how it unifies multi‑engine data access, improves SLA, and accelerates large‑scale crowd computing tasks.

Alluxio · Big Data · Data Service

0 likes · 21 min read

Architects Research Society

Jul 22, 2021 · Big Data

Enterprise Data Strategy: Aligning Tactics, Governance, and the Experience Economy

This article explores how a clear enterprise data strategy—distinguishing strategic goals from tactical steps, emphasizing clean and governed data, and integrating analytics with business missions—drives reliable outcomes and supports the experience economy through coordinated CXM platforms and data products.

Analytics · Big Data · Data Governance

0 likes · 9 min read

dbaplus Community

Jul 21, 2021 · Big Data

Youzan’s Blueprint: Data Governance, Quality Scoring, and Cost Reduction for AI

At Youzan, data governance evolves from massive data assets to AI readiness through systematic data assetization, quantitative quality scoring, cost measurement, and targeted operational tactics, enabling precise quality monitoring, cost allocation, and continuous improvement that drive both data value and cost efficiency.

AI readiness · Big Data · Cost Optimization

0 likes · 18 min read

Tencent Cloud Developer

Jul 21, 2021 · Big Data

Bloom Filter: Introduction, Theory, Construction, Query, and Applications

The article explains Bloom filters—a probabilistic, space‑efficient data structure using multiple hash functions on a bit array to answer set‑membership queries with controllable false‑positive rates, detailing their construction, query process, optimal parameters, and common uses such as URL deduplication, cache protection, and spam filtering.

Big Data · Cache Optimization · bloom-filter

0 likes · 8 min read

IT Architects Alliance

Jul 20, 2021 · Big Data

Understanding Data Middle Platform: Layers, Architecture, and Implementation Methodology

The article explains the concept of a data middle platform and details its three‑layer structure: data model, data service, and data development. Using a telecom operator example, it illustrates how data modeling enables cross‑domain integration, how services encapsulate data for flexible consumption, and how development tools support customized data applications.

Big Data · Data Architecture · Data Platform

0 likes · 2 min read

Huawei Cloud Developer Alliance

Jul 20, 2021 · Backend Development

From Non‑Tech Student to Cloud MVP: Go, AI, and Startup Insights

In this interview, Huawei Cloud MVP Wang Ming shares how a non‑computer‑science background led him to a successful IT career, discusses the advantages of interdisciplinary skills, offers entrepreneurship advice, predicts future tech trends, and explains the key concepts of his popular Go concurrency book.

Artificial Intelligence · Big Data · Entrepreneurship

0 likes · 7 min read

Xianyu Technology

Jul 20, 2021 · Big Data

Design and Implementation of a Content Flow Control System for Xianyu Community

The Xianyu “Play” tab flow‑control system combines task‑specific and rule‑based strategies with a dynamic strategy‑, control‑, and distribution‑chain architecture that integrates real‑time data processing into the recommendation engine, delivering guaranteed exposure, boosting daily posts by 14.4%, and paving the way for multi‑objective, zero‑code control.

Big Data · Flow Control · Real-time Streaming

0 likes · 6 min read

21CTO

Jul 18, 2021 · Databases

Why Your MySQL Queries Are Slow and How ElasticSearch & HBase Can Help

This article examines common causes of slow MySQL queries, explains index mechanics and failures, then compares Elasticsearch’s fast tokenized search and HBase’s column‑oriented storage, offering practical guidance on when and how to use each technology.

Big Data · Database Performance · HBase

0 likes · 21 min read

Open Source Linux

Jul 17, 2021 · Big Data

Master Kafka Basics: Topics, Partitions, Producers & Consumers Explained

This article provides a clear, visual guide to Kafka’s core concepts—including producers, consumers, topics, partitions, consumer groups, message ordering, and the underlying ZooKeeper‑managed cluster architecture—helping readers grasp how Kafka enables reliable, scalable stream processing.

Big Data · Consumers · Partitions

0 likes · 6 min read

Architects' Tech Alliance

Jul 15, 2021 · Cloud Computing

Edge Computing: Challenges, Research Focus, and Related Paradigms

The article explains edge computing as a decentralized computing model that addresses high‑reliability, low‑latency demands, data‑center energy consumption, big‑data processing pressure, low resource utilization, intelligent front‑ends, and security‑privacy concerns, and it outlines key research areas and related paradigms such as fog, mobile edge, sea, and intelligent edge computing.

Big Data · Edge Computing · Fog Computing

0 likes · 8 min read

Xianyu Technology

Jul 13, 2021 · Big Data

Design and Implementation of Xianyu Real-Time Data Warehouse

To meet Xianyu’s billion‑event‑per‑day real‑time analysis needs, the team built a petabyte‑scale warehouse using Hologres for storage and Alibaba‑enhanced Flink (Blink) for streaming, organized into ODS, DWD, DWS, ADS and DIM layers, enabling minute‑level aggregations, rapid anomaly detection, and instant product‑team insights.

Big Data · Hologres · blink

0 likes · 12 min read
