Tagged articles
237 articles
Amap Tech
Apr 10, 2020 · Backend Development

Platformization of POI Deep Information Integration at Amap: Design and Implementation

Amap transformed its fragmented POI deep‑information pipelines into a unified platform that automates data acquisition, parsing, dimension alignment, specification mapping, and lifecycle management across billions of records, enabling product managers to integrate, debug, and scale diverse content‑provider feeds with real‑time, end‑to‑end control.

Backend, Big Data, Conversion Engine
0 likes · 13 min read
DataFunTalk
Apr 2, 2020 · Artificial Intelligence

Building and Applying an Industry Knowledge Graph: Lessons from Beike Real Estate

The article explains how Beike Real Estate constructs an industry knowledge graph by integrating internal and external data, outlines the technical framework and data processing steps, and demonstrates its AI-driven applications such as intelligent Q&A, recommendation, and decision support for the real‑estate market.

AI applications, Data Integration, Knowledge Graph
0 likes · 8 min read
21CTO
Feb 19, 2020 · Big Data

Building an Open-Source Big Data Analytics Stack: Challenges & Benefits

The article explains why modern companies rely on data‑driven decisions, outlines the two main challenges of tracking data and connecting it to BI, describes the three‑step analytics stack (integration, warehouse, analysis), and highlights the cost, flexibility, and security advantages of open‑source tools.

Big Data, Data Analytics, Data Integration
0 likes · 5 min read
Java High-Performance Architecture
Jan 7, 2020 · Backend Development

How to Build a Scalable Reporting Service in a Microservice Architecture

To generate a user‑enriched order report in a microservice system, the article compares four approaches—direct DB access, REST data aggregation, batch pulling, and an event‑driven model—highlighting their trade‑offs in coupling, performance, scalability, and resilience, and recommends the event‑push solution.

Data Integration, Event-driven, Kafka
0 likes · 5 min read
HomeTech
Dec 12, 2019 · Big Data

Architecture and Design of the Home Data Integration Governance Platform

The article describes the background, architecture, and design principles of a unified big‑data scheduling and data‑exchange platform, detailing its data ingestion “direct‑train”, centralized scheduling engine, and DataX‑based data‑exchange components along with monitoring, alerting, and security features.

Big Data, Data Integration, DataX
0 likes · 7 min read
Architects Research Society
Oct 23, 2019 · Big Data

Talend Performance Tuning Strategy: Identifying and Eliminating Bottlenecks

This article presents a structured, repeatable approach for Talend data‑integration jobs that guides readers through pinpointing performance bottlenecks, testing individual pipeline stages, and applying targeted optimizations to sources, targets, and transformations to achieve higher throughput and more reliable ETL processes.

Bottleneck Analysis, Data Integration, ETL
0 likes · 9 min read
YooTech Youzu Tech Team
Oct 16, 2019 · Product Management

How I Built an Automated Financial Reporting System for Global Game Platforms

This article details the end‑to‑end design and implementation of a custom tool—named “Crystal Palace”—that automates financial reporting across App Store, Google Play, Facebook and Amazon, turning a tedious manual reconciliation process into a scalable, data‑driven solution for game publishers.

Automation, Data Integration, financial reporting
0 likes · 6 min read
Snowball Engineer Team
Sep 24, 2019 · Big Data

Snowball Data Middle Platform (AIBO): Architecture, Capabilities, and Future Outlook

The article introduces Snowball's AIBO data middle platform, details its storage‑compute separation architecture and core capabilities (data integration, catalog, tagging, analysis tools, and micro‑service data APIs), and outlines future enhancements for security, lineage, and continuous business‑driven iteration.

Big Data, Data Catalog, Data Integration
0 likes · 12 min read
360 Zhihui Cloud Developer
Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data, Data Integration, SQL Engine
0 likes · 12 min read
Beike Product & Technology
Aug 29, 2019 · Big Data

TiSpark Integration with TiDB/TiKV for Efficient Data Synchronization and OLAP in the Databus Project

This article introduces TiSpark, a Spark extension that integrates tightly with TiDB/TiKV to enable high‑performance, scalable data synchronization and OLAP queries. It details the architecture, key configuration, and performance advantages over Spark SQL and Sqoop, and outlines TiSpark's role in the Databus data‑integration platform.

Big Data, Data Integration, Performance Optimization
0 likes · 10 min read
360 Tech Engineering
Aug 27, 2019 · Databases

Quicksql: A Unified Cross‑Data‑Source SQL Query Engine

Quicksql is an open‑source, cross‑data‑source SQL engine built on Apache Calcite that provides a unified, safe, and fast SQL interface, enabling users to query heterogeneous storage systems such as Hive, MySQL, Elasticsearch, and Druid through command‑line, API, or JDBC connections.

Apache Calcite, Data Integration, SQL
0 likes · 6 min read
Architects' Tech Alliance
Aug 5, 2019 · Industry Insights

Why Customer Data Platforms Are Redefining Modern Marketing

The article examines how fragmented SaaS marketing stacks limit real‑time data use, explains the evolution from early CRM to marketing automation, highlights the shortcomings of MQL models, and shows how Customer Data Platforms (CDPs) restore data continuity, scalability, and campaign effectiveness.

Data Integration, Digital Marketing, Marketing Automation
0 likes · 9 min read
Big Data Technology & Architecture
Jul 2, 2019 · Big Data

Integrating Apache Flink with Apache Pulsar for Scalable Elastic Data Processing

This article explains how Apache Pulsar and Apache Flink can be combined to provide a unified, scalable, and fault‑tolerant data processing platform, covering Pulsar's architecture, its differences from other messaging systems, various integration patterns, and concrete code examples for stream and batch workloads.

Apache Flink, Apache Pulsar, Big Data
0 likes · 13 min read
Architecture Digest
May 13, 2019 · Artificial Intelligence

Enterprise Knowledge Graphs: Development Trends, Use Cases, Database Selection, and Implementation Practices

This article outlines the evolution of knowledge graphs, describes typical enterprise application scenarios, compares graph database options such as Neo4j, Cayley and Dgraph, and presents a six‑step methodology for building, storing, and applying knowledge graphs in large‑scale business environments.

Data Integration, Enterprise AI, Knowledge Graph
0 likes · 13 min read
37 Interactive Technology Team
Mar 28, 2019 · Big Data

Approaches to Building a Basic Data Platform

To handle terabytes of daily data and diverse business needs, the company built a three‑layer basic data platform—collection/computation/storage, unified data management, and API‑driven services—augmented by a standardized collection system, a robust Domino scheduler, and a self‑service analysis tool, aiming to evolve into a full data‑middle‑office for end‑to‑end intelligence.

Data Architecture, Data Integration, Scheduling
0 likes · 8 min read
Beike Product & Technology
Feb 21, 2019 · Big Data

DATABUS Data Integration Platform: Architecture, Capabilities, and TiDB Ecosystem

The article gives an in‑depth overview of the DATABUS data integration platform: its background and current challenges; core capabilities such as data syncing, metadata automation, and real‑time subscriptions; and its reliance on TiDB, TiSpark, Hudi, and related big‑data technologies to enable near‑real‑time data warehousing.

Big Data, Data Integration, Hive
0 likes · 13 min read
360 Tech Engineering
Dec 28, 2018 · Databases

Quicksql: A Unified, Secure, and Fast Cross-Data-Source SQL Query Engine

Quicksql is an open‑source unified SQL query engine that simplifies and secures cross‑data‑source queries by providing a consistent ANSI‑based language, automatic engine selection, and support for mixed queries across Hive, MySQL, Elasticsearch, and other platforms, reducing learning and integration costs.

Data Integration, SQL Engine, Unified query
0 likes · 6 min read
Efficient Ops
Dec 24, 2018 · Operations

How Baidu’s Noah Platform Unifies Ops Data with Pull, Push, and Lazy ETL

This article explains how Baidu Cloud's Noah intelligent operations product builds a unified operations knowledge base by categorizing metadata, status, and event data and applying three ETL approaches—Pull, Push, and Lazy—to handle offline, near‑line, and real‑time data integration.

Data Integration, ETL, Knowledge Base
0 likes · 8 min read
Youzan Coder
Aug 31, 2018 · Big Data

Evolution of Youzan Search Platform Architecture: From 1.0 to 4.0

The Youzan Search Platform evolved from a simple Elasticsearch cluster in 2015 to a modular, message‑driven architecture with proxy validation, caching, and management tools, and now plans a cloud‑native, Kubernetes‑based 4.0 version that automates data sync, isolates workloads, and scales elastically to support billions of records.

Data Integration, Elasticsearch, Proxy
0 likes · 14 min read
Zhongtong Tech
Aug 31, 2018 · Databases

How Aries Uses MySQL GTID Binlog to Power Real‑Time Data Sync at Scale

Aries, an internally built MySQL incremental log distribution platform, leverages GTID‑based binlog dumping to achieve stable, consistent, and real‑time data synchronization across heterogeneous systems, supporting use cases such as Elasticsearch sync, cache updates, archiving, and live statistics.

Data Integration, GTID, database
0 likes · 7 min read
dbaplus Community
Aug 8, 2018 · Big Data

How to Build a Real‑Time Data Platform: Tech Stack & Design Patterns

This article explains the architecture of a Real‑Time Data Platform (RTDP), details the technical selection of core components such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses data management, security, operations, and four deployment modes—synchronization, flow, rotation and intelligent—illustrating how each fits different business scenarios.

Big Data Architecture, Data Integration, Kafka
0 likes · 24 min read
58 Tech
Jun 27, 2018 · Big Data

Overview of the 58 User Profile System Architecture and Data Processing

The article describes the design, data integration, ID mapping, tag generation, and application scenarios of the 58 user profiling platform, which aggregates billions of user IDs across multiple business lines to provide online and offline persona data for personalization, analytics, and AI modeling.

Big Data, Data Architecture, Data Integration
0 likes · 12 min read
ITPUB
Nov 23, 2017 · Big Data

7 Typical Big Data Projects Every Hadoop Engineer Should Know

The article outlines seven common big‑data initiatives—data integration, specialized analytics, Hadoop‑as‑a‑service, stream processing, complex event handling, ETL pipelines, and SAS replacement—explaining their goals, typical technologies such as HDFS, Hive, Spark, Storm, Kafka, and practical considerations for enterprises adopting Hadoop ecosystems.

Data Integration, Hadoop, project types
0 likes · 8 min read
Efficient Ops
Sep 25, 2017 · Operations

How Qunar Scaled Application Ops Automation from Hundreds to Tens of Thousands of Servers

This article details Qunar's journey of automating application operations, covering the evolution of their host‑management system, unified monitoring/alert platform, and data‑interchange mechanisms that enabled the company to grow from a few hundred to over ten thousand servers with a stable six‑person ops team.

Data Integration, Operations Automation, Qunar
0 likes · 25 min read
StarRing Big Data Open Lab
Jul 28, 2017 · Big Data

How Transwarp Transporter Enables Near‑Real‑Time ETL in Big Data Pipelines

The article introduces Transwarp Transporter, a near‑real‑time ETL tool for TDH 5.x, explains its architecture, visual dashboard, drag‑and‑drop data‑flow design, debugging features, parameter management, and highlights how it empowers business users to achieve fast, reliable data migration in big‑data environments.

Data Integration, ETL, Transwarp
0 likes · 7 min read
Architecture Digest
Jul 22, 2017 · Big Data

Popular Big Data Tools and Their Descriptions

This article provides an extensive overview of more than ninety open‑source and commercial big‑data tools—including ETL platforms, resource managers, storage systems, messaging queues, processing engines, and visualization libraries—detailing their core functions, typical use cases, and notable adopters.

Analytics, Big Data, Data Integration
0 likes · 26 min read
Alibaba Cloud Developer
Mar 7, 2017 · Big Data

Unified Data Platforms: How UMENG+ Redefines Big Data Strategy

The article explores the evolution of big‑data applications in China, from Oracle’s trend report and the concept of "omni‑domain data" to UMENG+’s technical architecture, unified tech stack, AI integration, and future directions for delivering real customer value.

Big Data, Data Analytics, Data Integration
0 likes · 12 min read

The Growing Role of Apache Kafka in Modern Big Data Architectures

The article explains how Apache Kafka has become a pivotal, highly scalable publish‑subscribe system in the big‑data ecosystem, addressing the limitations of traditional databases, enabling real‑time data integration across specialized distributed systems, and shaping future data‑governance practices.

Apache Kafka, Data Integration, Streaming
0 likes · 7 min read
Qunar Tech Salon
Jul 8, 2015 · Big Data

Understanding Logs: The Foundation of Distributed Systems, Data Integration, and Stream Processing

This article explains how logs—simple, append‑only, time‑ordered records—serve as the core abstraction behind databases, distributed systems, data integration pipelines, and modern stream‑processing platforms such as Kafka and Hadoop, illustrating their design, scalability, and practical challenges.

Big Data, Data Integration, Distributed Systems
0 likes · 45 min read
Architect
Jul 6, 2015 · Big Data

Understanding Logs: The Core of Distributed Systems and Data Integration

This article explains how logs—simple, append‑only, time‑ordered records—serve as the fundamental abstraction behind databases, distributed systems, data integration pipelines, and stream‑processing platforms like Kafka and Hadoop, illustrating their role in ordering, replication, scalability, and real‑time analytics.

Data Integration, Distributed Systems, Hadoop
0 likes · 48 min read