Tagged articles

Data Integration

243 articles · Page 3 of 3

Aug 20, 2020 · Big Data

Differences Between Talend and Pentaho ETL Tools

The article explains the fundamentals of ETL, compares Talend and Pentaho in terms of openness, connectivity, support, performance, GUI usability, deployment flexibility, and cost, and concludes with guidance on choosing the appropriate tool based on specific business and technical requirements.

Comparison · Data Integration · ETL

0 likes · 7 min read

Qunar Tech Salon

Jun 3, 2020 · Fundamentals

Optimizing International Hotel Data Aggregation Algorithms at Qunar

The article outlines Qunar’s challenges in aggregating international hotel data, analyzes issues such as localized address formats and limited text similarity parsing, and presents a pattern‑matching and weighted scoring approach that improves aggregation accuracy across multiple countries.

Data Integration · Pattern Matching · algorithm optimization

0 likes · 7 min read

Meituan Technology Team

May 28, 2020 · Big Data

Design and Implementation of Meituan Delivery A/B Testing Platform and Evaluation System

The article details Meituan Delivery’s A/B testing platform and evaluation system, explaining its closed‑loop design, multi‑strategy traffic allocation with AA grouping, comprehensive metric hierarchy, statistical rigor, data integration, and implementation architecture, and outlines future tools for traffic‑volume recommendation.

A/B testing · Data Integration · Metrics

0 likes · 20 min read

Big Data Technology & Architecture

May 20, 2020 · Big Data

Technical Overview of Real-time Data Platform (RTDP) Architecture and Component Selection

This article presents a comprehensive technical overview of the Real-time Data Platform (RTDP), detailing its overall architecture, component selection—including DBus, Kafka, Wormhole, Moonbox, and Davinci—design philosophies, functional features, and various deployment patterns such as synchronous, stream-processing, rotation, and intelligent modes.

Data Governance · Data Integration

0 likes · 26 min read

Big Data Technology & Architecture

Apr 25, 2020 · Big Data

Integrating SparkSQL with Hive: Configuration, MetaStore Setup, and Example Scala Code

This article explains the differences between Spark on Hive and Hive on Spark, then provides step‑by‑step instructions for configuring Hive MetaStore, setting up SparkSQL to use Hive, and demonstrates a complete Scala program that creates a Hive table, loads data, and queries it.

Big Data · Data Integration · Hive

0 likes · 7 min read

Suning Technology

Apr 23, 2020 · Big Data

How Data Fusion Drives Retail Revival: Lessons from Suning’s Digital Transformation

The article examines China's push to develop data factor markets and showcases how Suning’s integration of online and offline data, big‑data analytics, and AI is revitalizing traditional retail, illustrating the broader impact of digital transformation on the post‑pandemic economy.

AI · Big Data · Data Integration

0 likes · 8 min read

Amap Tech

Apr 10, 2020 · Backend Development

Platformization of POI Deep Information Integration at Amap: Design and Implementation

Amap transformed its fragmented POI deep‑information pipelines into a unified platform that automates data acquisition, parsing, dimension alignment, specification mapping, and lifecycle management across billions of records, enabling product managers to integrate, debug, and scale diverse content‑provider feeds with real‑time, end‑to‑end control.

Big Data · Conversion Engine · Data Integration

0 likes · 13 min read

Python Programming Learning Circle

Apr 3, 2020 · Databases

Using Python and pyodbc to Operate MS SQL Server: A Step‑by‑Step Guide

This article demonstrates how to build a reusable Python class with pyodbc to connect to Microsoft SQL Server, import CSV files, create tables, and perform common operations such as push, union, and drop, providing complete code examples and explanations for each step.

Data Integration · SQL Server · database

0 likes · 9 min read

DataFunTalk

Apr 2, 2020 · Artificial Intelligence

Building and Applying an Industry Knowledge Graph: Lessons from Beike Real Estate

The article explains how Beike Real Estate constructs an industry knowledge graph by integrating internal and external data, outlines the technical framework and data processing steps, and demonstrates its AI-driven applications such as intelligent Q&A, recommendation, and decision support for the real‑estate market.

AI Applications · Data Integration · Knowledge Graph

0 likes · 8 min read

21CTO

Feb 19, 2020 · Big Data

Building an Open-Source Big Data Analytics Stack: Challenges & Benefits

The article explains why modern companies rely on data‑driven decisions, outlines the two main challenges of tracking data and connecting it to BI, describes the three‑step analytics stack (integration, warehouse, analysis), and highlights the cost, flexibility, and security advantages of open‑source tools.

Big Data · Data Integration · Data Warehouse

0 likes · 5 min read

Big Data Technology & Architecture

Jan 20, 2020 · Big Data

Understanding Data Middle Platform: Architecture, Components, and Operational Practices

The article explains the concept, architecture, and key components of a data middle platform—including data aggregation, development, asset management, service systems, and operational and security mechanisms—while also promoting related books and a giveaway.

Big Data · Data Architecture · Data Governance

0 likes · 7 min read

Java High-Performance Architecture

Jan 7, 2020 · Backend Development

How to Build a Scalable Reporting Service in a Microservice Architecture

The article compares four approaches to generating a user‑enriched order report in a microservice system (direct DB access, REST data aggregation, batch pulling, and an event‑driven model), highlights their trade‑offs in coupling, performance, scalability, and resilience, and recommends the event‑push solution.

Data Integration · Microservices · Reporting

0 likes · 5 min read

HomeTech

Dec 12, 2019 · Big Data

Architecture and Design of the Home Data Integration Governance Platform

The article describes the background, architecture, and design principles of a unified big‑data scheduling and data‑exchange platform, detailing its data ingestion “direct‑train”, centralized scheduling engine, and DataX‑based data‑exchange components along with monitoring, alerting, and security features.

Big Data · Data Integration · DataX

0 likes · 7 min read

Beike Product & Technology

Oct 30, 2019 · Operations

Beike Zhaofang Middle Platform Development and Business Process Optimization in Real Estate

Beike Zhaofang developed a middle platform to integrate data and business processes in the real estate industry, enhancing efficiency and enabling new business models through shared capabilities and standardized solutions.

Business Process Optimization · Data Integration · middle platform

0 likes · 6 min read

Architects Research Society

Oct 23, 2019 · Big Data

Talend Performance Tuning Strategy: Identifying and Eliminating Bottlenecks

This article presents a structured, repeatable approach for Talend data‑integration jobs that guides readers through pinpointing performance bottlenecks, testing individual pipeline stages, and applying targeted optimizations to sources, targets, and transformations to achieve higher throughput and more reliable ETL processes.

Bottleneck Analysis · Data Integration · ETL

0 likes · 9 min read

YooTech Youzu Tech Team

Oct 16, 2019 · Product Management

How I Built an Automated Financial Reporting System for Global Game Platforms

This article details the end‑to‑end design and implementation of a custom tool—named “Crystal Palace”—that automates financial reporting across App Store, Google Play, Facebook and Amazon, turning a tedious manual reconciliation process into a scalable, data‑driven solution for game publishers.

Automation · Data Integration · financial reporting

0 likes · 6 min read

Big Data Technology & Architecture

Oct 13, 2019 · Big Data

Building a Simple Canal-to-Kafka Demo with Maven Dependencies and Java Code

This guide introduces the canal‑kafka integration package, outlines its constraints, and provides a step‑by‑step tutorial with Maven dependencies and Java source code for a SimpleCanalClient, a Kafka producer, and a server class, enabling a functional demo of Canal to Kafka data streaming.

Big Data · Canal · Data Integration

0 likes · 8 min read

Snowball Engineer Team

Sep 24, 2019 · Big Data

Snowball Data Middle Platform (AIBO): Architecture, Capabilities, and Future Outlook

The article introduces Snowball's AIBO data middle platform, detailing its storage‑compute separation architecture, core capabilities such as data integration, catalog, tagging, analysis tools, micro‑service data APIs, and outlines future enhancements for security, lineage, and continuous business‑driven iteration.

Big Data · Data Catalog · Data Integration

0 likes · 12 min read

360 Zhihui Cloud Developer

Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data · Data Integration · Federated Query

0 likes · 12 min read

Beike Product & Technology

Aug 29, 2019 · Big Data

TiSpark Integration with TiDB/TiKV for Efficient Data Synchronization and OLAP in the Databus Project

This article introduces TiSpark—an extension of Spark that tightly integrates with TiDB/TiKV to enable high‑performance, scalable data synchronization and OLAP queries, details its architecture, key configuration, performance advantages over Spark SQL and Sqoop, and outlines its role in the Databus data‑integration platform.

Big Data · Data Integration · Performance Optimization

0 likes · 10 min read

360 Tech Engineering

Aug 27, 2019 · Databases

Quicksql: A Unified Cross‑Data‑Source SQL Query Engine

Quicksql is an open‑source, cross‑data‑source SQL engine built on Apache Calcite that provides a unified, safe, and fast SQL interface, enabling users to query heterogeneous storage systems such as Hive, MySQL, Elasticsearch, and Druid through command‑line, API, or JDBC connections.

Apache Calcite · Data Integration · SQL

0 likes · 6 min read

360 Tech Engineering

Aug 27, 2019 · Big Data

Design and Practice of 360’s Multi‑Data‑Source Unified SQL Query Engine

The article presents 360’s challenges with heterogeneous, high‑volume data sources, explains the design of a unified federated SQL engine called QuickSQL that leverages Apache Calcite, Spark, Flink and other back‑ends, and evaluates its performance and future development directions.

Apache Calcite · Data Integration · Federated Query

0 likes · 10 min read

Architects' Tech Alliance

Aug 5, 2019 · Industry Insights

Why Customer Data Platforms Are Redefining Modern Marketing

The article examines how fragmented SaaS marketing stacks limit real‑time data use, explains the evolution from early CRM to marketing automation, highlights the shortcomings of MQL models, and shows how Customer Data Platforms (CDPs) restore data continuity, scalability, and campaign effectiveness.

Data Integration · Digital Marketing · Industry insight

0 likes · 9 min read

Big Data Technology & Architecture

Jul 2, 2019 · Big Data

Integrating Apache Flink with Apache Pulsar for Scalable Elastic Data Processing

This article explains how Apache Pulsar and Apache Flink can be combined to provide a unified, scalable, and fault‑tolerant data processing platform, covering Pulsar's architecture, its differences from other messaging systems, various integration patterns, and concrete code examples for stream and batch workloads.

Apache Flink · Apache Pulsar · Big Data

0 likes · 13 min read

Programmer DD

Jun 15, 2019 · Big Data

How to Sync MySQL Data to Elasticsearch with Logstash: Step‑by‑Step Guide

This guide walks you through installing JDK, Logstash, Ruby, and required plugins, configuring Logstash to pull data from a MySQL table, and sending it to Elasticsearch, including code snippets, configuration files, and troubleshooting tips for a smooth data synchronization.

Big Data · Data Integration · Elasticsearch

0 likes · 6 min read

Architecture Digest

May 13, 2019 · Artificial Intelligence

Enterprise Knowledge Graphs: Development Trends, Use Cases, Database Selection, and Implementation Practices

This article outlines the evolution of knowledge graphs, describes typical enterprise application scenarios, compares graph database options such as Neo4j, Cayley and Dgraph, and presents a six‑step methodology for building, storing, and applying knowledge graphs in large‑scale business environments.

Data Integration · Enterprise AI · Knowledge Graph

0 likes · 13 min read

37 Interactive Technology Team

Mar 28, 2019 · Big Data

Approaches to Building a Basic Data Platform

To handle terabytes of daily data and diverse business needs, the company built a three‑layer basic data platform—collection/computation/storage, unified data management, and API‑driven services—augmented by a standardized collection system, a robust Domino scheduler, and a self‑service analysis tool, aiming to evolve into a full data‑middle‑office for end‑to‑end intelligence.

Data Architecture · Data Integration · Scheduling

0 likes · 8 min read

Beike Product & Technology

Feb 21, 2019 · Big Data

DATABUS Data Integration Platform: Architecture, Capabilities, and TiDB Ecosystem

The article presents an in‑depth overview of the DATABUS data integration platform, detailing its background, current challenges, core capabilities such as data syncing, metadata automation, real‑time subscriptions, and its reliance on TiDB, TiSpark, Hudi, and related big‑data technologies to enable near‑real‑time data warehousing.

Big Data · Data Integration · Hive

0 likes · 13 min read

360 Quality & Efficiency

Jan 2, 2019 · Big Data

Understanding ETL and Data Warehouses: A Beginner’s Guide

This article introduces the fundamentals of Business Intelligence, explains what ETL and data warehouses are, compares them with traditional databases, and outlines the main characteristics and popular tools such as Hive used in modern big‑data environments.

BI · Big Data · Data Integration

0 likes · 5 min read

360 Tech Engineering

Dec 28, 2018 · Databases

Quicksql: A Unified, Secure, and Fast Cross-Data-Source SQL Query Engine

Quicksql is an open‑source unified SQL query engine that simplifies and secures cross‑data‑source queries by providing a consistent ANSI‑based language, automatic engine selection, and support for mixed queries across Hive, MySQL, Elasticsearch, and other platforms, reducing learning and integration costs.

Data Integration · SQL Engine · Unified Query

0 likes · 6 min read

Efficient Ops

Dec 24, 2018 · Operations

How Baidu’s Noah Platform Unifies Ops Data with Pull, Push, and Lazy ETL

This article explains how Baidu Cloud's Noah intelligent operations product builds a unified operations knowledge base by categorizing metadata, status, and event data and applying three ETL approaches—Pull, Push, and Lazy—to handle offline, near‑line, and real‑time data integration.

Cloud Computing · Data Integration · ETL

0 likes · 8 min read

Youzan Coder

Aug 31, 2018 · Big Data

Evolution of Youzan Search Platform Architecture: From 1.0 to 4.0

The Youzan Search Platform evolved from a simple Elasticsearch cluster in 2015 to a modular, message‑driven architecture with proxy validation, caching, and management tools, and now plans a cloud‑native, Kubernetes‑based 4.0 version that automates data sync, isolates workloads, and scales elastically to support billions of records.

Data Integration · Elasticsearch · Search Architecture

0 likes · 14 min read

Zhongtong Tech

Aug 31, 2018 · Databases

How Aries Uses MySQL GTID Binlog to Power Real‑Time Data Sync at Scale

Aries, an internally built MySQL incremental log distribution platform, leverages GTID‑based binlog dumping to achieve stable, consistent, and real‑time data synchronization across heterogeneous systems, supporting use cases such as Elasticsearch sync, cache updates, archiving, and live statistics.

Data Integration · GTID · Real-time Sync

0 likes · 7 min read

dbaplus Community

Aug 8, 2018 · Big Data

How to Build a Real‑Time Data Platform: Tech Stack & Design Patterns

This article explains the architecture of a Real‑Time Data Platform (RTDP), details the technical selection of core components such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses data management, security, operations, and four deployment modes—synchronization, flow, rotation and intelligent—illustrating how each fits different business scenarios.

Big Data Architecture · Data Integration · RTDP

0 likes · 24 min read

58 Tech

Jun 27, 2018 · Big Data

Overview of the 58 User Profile System Architecture and Data Processing

The article describes the design, data integration, ID mapping, tag generation, and application scenarios of the 58 user profiling platform, which aggregates billions of user IDs across multiple business lines to provide online and offline persona data for personalization, analytics, and AI modeling.

Big Data · Data Architecture · Data Integration

0 likes · 12 min read

ITPUB

Nov 23, 2017 · Big Data

7 Typical Big Data Projects Every Hadoop Engineer Should Know

The article outlines seven common big‑data initiatives—data integration, specialized analytics, Hadoop‑as‑a‑service, stream processing, complex event handling, ETL pipelines, and SAS replacement—explaining their goals, typical technologies such as HDFS, Hive, Spark, Storm, Kafka, and practical considerations for enterprises adopting Hadoop ecosystems.

Data Integration · Hadoop · project types

0 likes · 8 min read

Efficient Ops

Sep 25, 2017 · Operations

How Qunar Scaled Application Ops Automation from Hundreds to Tens of Thousands of Servers

This article details Qunar's journey of automating application operations, covering the evolution of their host‑management system, unified monitoring/alert platform, and data‑interchange mechanisms that enabled the company to grow from a few hundred to over ten thousand servers with a stable six‑person ops team.

Data Integration · Monitoring · Operations Automation

0 likes · 25 min read

StarRing Big Data Open Lab

Jul 28, 2017 · Big Data

How Transwarp Transporter Enables Near‑Real‑Time ETL in Big Data Pipelines

The article introduces Transwarp Transporter, a near‑real‑time ETL tool for TDH 5.x, explains its architecture, visual dashboard, drag‑and‑drop data‑flow design, debugging features, parameter management, and highlights how it empowers business users to achieve fast, reliable data migration in big‑data environments.

Data Integration · ETL · Transwarp

0 likes · 7 min read

Architecture Digest

Jul 22, 2017 · Big Data

Popular Big Data Tools and Their Descriptions

This article provides an extensive overview of more than ninety open‑source and commercial big‑data tools—including ETL platforms, resource managers, storage systems, messaging queues, processing engines, and visualization libraries—detailing their core functions, typical use cases, and notable adopters.

Analytics · Big Data · Data Integration

0 likes · 26 min read

Alibaba Cloud Developer

Mar 7, 2017 · Big Data

Unified Data Platforms: How UMENG+ Redefines Big Data Strategy

The article explores the evolution of big‑data applications in China, from Oracle’s trend report and the concept of "omni‑domain data" to UMENG+’s technical architecture, unified tech stack, AI integration, and future directions for delivering real customer value.

Big Data · Data Integration · Technology Architecture

0 likes · 12 min read

Art of Distributed System Architecture Design

Mar 30, 2016 · Big Data

The Growing Role of Apache Kafka in Modern Big Data Architectures

The article explains how Apache Kafka has become a pivotal, highly scalable publish‑subscribe system in the big‑data ecosystem, addressing the limitations of traditional databases, enabling real‑time data integration across specialized distributed systems, and shaping future data‑governance practices.

Apache Kafka · Data Integration · Streaming

0 likes · 7 min read

Qunar Tech Salon

Jul 8, 2015 · Big Data

Understanding Logs: The Foundation of Distributed Systems, Data Integration, and Stream Processing

This article explains how logs—simple, append‑only, time‑ordered records—serve as the core abstraction behind databases, distributed systems, data integration pipelines, and modern stream‑processing platforms such as Kafka and Hadoop, illustrating their design, scalability, and practical challenges.

Big Data · Data Integration · Hadoop

0 likes · 45 min read

Architect

Jul 6, 2015 · Big Data

Understanding Logs: The Core of Distributed Systems and Data Integration

This article explains how logs—simple, append‑only, time‑ordered records—serve as the fundamental abstraction behind databases, distributed systems, data integration pipelines, and stream‑processing platforms like Kafka and Hadoop, illustrating their role in ordering, replication, scalability, and real‑time analytics.

Data Integration · Hadoop · distributed systems

0 likes · 48 min read
