Tagged articles

3675 articles

Apr 1, 2019 · Big Data

Comprehensive Overview of Hadoop: Core Modules, HDFS Architecture, MapReduce, YARN, and a Scala WordCount Example

This article provides a detailed introduction to Hadoop's ecosystem—including its core modules (Common, HDFS, YARN, MapReduce), the design of a high‑availability HDFS cluster, the principles of distributed file systems, and a complete Scala WordCount MapReduce program—offering a solid foundation for big‑data practitioners.

Big Data · HDFS · Hadoop

0 likes · 15 min read

Big Data Technology & Architecture

Mar 29, 2019 · Big Data

Weekly Knowledge Digest: Apache Flink Deep Dives on JOIN LATERAL, TimeInterval, Temporal Table, and State Management

This week's digest shares a personal anecdote and a series of technical deep‑dives into Apache Flink, covering JOIN LATERAL, TimeInterval JOIN, Temporal Table JOIN, state management, and related code examples, while also previewing upcoming work schedules and recommended Flink reference articles.

Apache Flink · Big Data · SQL Join

0 likes · 5 min read

dbaplus Community

Mar 27, 2019 · Big Data

How eBay Cut Hadoop Job Runtime by 60%: Real‑World CAL Log Optimization

This article explains how eBay's CAL team reduced Hadoop MapReduce job execution time and resource consumption by over 60% through targeted GC tuning, data‑skew mitigation, and algorithmic improvements, boosting job success rates to nearly 100% while handling petabyte‑scale log data.

Big Data · Data Skew · GC tuning

0 likes · 12 min read

Tencent Cloud Developer

Mar 27, 2019 · Industry Insights

How AI and Big Data Drive New Engineering Education: Insights from the 2019 IT Alliance Conference

The 2019 Information Technology New Engineering Alliance conference in Beijing gathered academia, research institutes, and industry leaders to discuss AI, big data, and curriculum innovation, highlighting Tencent's contributions to digital education, cloud certification, and the broader push for industry‑university collaboration in shaping future IT talent.

AI · Big Data · Cloud Computing

0 likes · 6 min read

NetEase Game Operations Platform

Mar 27, 2019 · Big Data

Embedding Python in Java with Jython for Real‑Time Big Data Jobs

This article explains why and how to embed Python code in Java using Jython for real‑time big‑data processing, covering performance benefits, memory‑leak pitfalls, singleton interpreter patterns, function factories, Java‑object conversion, and importing external PyPI packages with practical code examples.

Big Data · Dynamic Language · Embedding

0 likes · 11 min read

Big Data Technology & Architecture

Mar 25, 2019 · Big Data

Understanding Apache Flink Interval Join: Syntax, Semantics, and Implementation

This article explains how Apache Flink's Interval Join solves time‑bounded join requirements more efficiently than unbounded joins, covering its syntax, semantics, state‑management considerations, and providing a complete Scala example with code and execution results.

Apache Flink · Big Data · Interval Join

0 likes · 11 min read

Big Data Technology & Architecture

Mar 22, 2019 · Big Data

Weekly Knowledge Points: Apache Flink Continuous Queries, Kafka Connectors, SQL Overview, JOIN Operator, and Table API

This weekly briefing introduces Apache Flink's continuous query mechanism, demonstrates how to integrate Kafka as a DataStream connector, provides an overview of Flink SQL features, explains the implementation and optimization of dual‑stream JOIN operators, and showcases the Table API with end‑to‑end examples.

Apache Flink · Big Data · Table API

0 likes · 3 min read

Big Data Technology & Architecture

Mar 21, 2019 · Big Data

Apache Flink Table API Tutorial and End‑to‑End Examples

This article provides a comprehensive tutorial on Apache Flink's Table API, explaining its concepts, core features, and a wide range of operators such as SELECT, WHERE, GROUP BY, UNION, JOIN, and various window functions, while offering complete Scala code examples, custom sources, sinks, and an end‑to‑end job that computes page‑view counts per region using event‑time tumbling windows.

Big Data · Flink · Scala

0 likes · 36 min read

Architects' Tech Alliance

Mar 21, 2019 · Cloud Computing

Understanding the Chinese Enterprise IT Landscape: Market Structure, Demand Drivers, and Technology Trends

This article analyzes China's massive enterprise ecosystem, the composition of its IT market, the human and political factors shaping demand, and how cloud computing, big data, and artificial intelligence are driving a new wave of digital transformation across state‑owned, internet, and other enterprises.

Artificial Intelligence · Big Data · China

0 likes · 14 min read

Xianyu Technology

Mar 21, 2019 · Big Data

Design and Implementation of the Mahé Real-Time Product Selection System Using Blink Stream Computing

Mahé, Xianyu’s real‑time product selection platform, uses Alibaba’s Blink stream engine to merge incoming data, evaluate roughly 300 rule‑based filters per item, and emit only changed results. It processes 1.4 billion messages per day at up to 50k TPS through a four‑layer, stateful architecture.

Big Data · Flink · Stateful Computation

0 likes · 15 min read

Tencent Cloud Developer

Mar 20, 2019 · Big Data

TVP Training Camp: Exploring Big Data Technologies and Trends

The inaugural TVP Training Camp on March 16, 2019 in Beijing gathered Tencent Cloud’s TVP members and leading big‑data experts to discuss emerging technologies such as Greenplum, PMEM‑driven infrastructure, data‑operation optimization, and next‑generation cloud databases, while a round‑table addressed practical challenges and affirmed Tencent’s commitment to ongoing expert collaboration.

Big Data · Cloud Computing · Data Analytics

0 likes · 11 min read

Youzan Coder

Mar 20, 2019 · Big Data

Evolution of Real-Time Computing at Youzan: From Storm to Flink and Future Directions

Youzan’s real‑time computing platform progressed from early Storm deployments through Spark Streaming to a Flink‑based architecture, adding unified task management, monitoring, and dedicated streaming clusters, while now pursuing SQL‑driven jobs, a Druid OLAP engine, and a future real‑time data warehouse.

Big Data · Flink · Spark Streaming

0 likes · 14 min read

Big Data Technology & Architecture

Mar 19, 2019 · Big Data

Comprehensive Overview of SQL and Apache Flink SQL Features with Practical Code Examples

This article provides an in-depth introduction to SQL, its history and ANSI standards, then details Apache Flink's SQL capabilities—including SELECT, WHERE, GROUP BY, UNION, JOIN, window functions, and user-defined functions—accompanied by extensive code examples and a complete end‑to‑end Flink job implementation.

Apache Flink · Big Data · Streaming

0 likes · 34 min read

Architects' Tech Alliance

Mar 18, 2019 · Big Data

Understanding HDFS Architecture, NameNode HA, and Read/Write Processes

This article explains the concepts and architecture of HDFS, the high‑availability mechanisms of NameNode including quorum‑based shared storage, the detailed read and write workflows of the distributed file system, and discusses its typical use cases and limitations.

Big Data · HA · HDFS

0 likes · 16 min read

Big Data Technology & Architecture

Mar 17, 2019 · Big Data

Understanding Continuous Queries in Apache Flink: From Static Queries to Dynamic Tables and Trigger Simulations

This article explains how Apache Flink implements continuous queries for unbounded stream processing, compares static and continuous query semantics, demonstrates how MySQL triggers can simulate continuous queries in append‑only and update scenarios, and discusses Flink's connector, source, sink, and retraction mechanisms for correct incremental computation.

Apache Flink · Big Data · Continuous Query

0 likes · 18 min read

dbaplus Community

Mar 14, 2019 · Operations

How Top Internet Companies Scale Spark CI/CD Across Tens of Thousands of Nodes

This article details a practical, production‑grade Spark CI/CD workflow using GitLab and Jenkins, covering source management, multi‑branch release strategies, automated testing, gray‑release, hot‑fix handling, and rollback mechanisms for large‑scale deployments.

Big Data · CI/CD · Continuous Delivery

0 likes · 17 min read

Big Data Technology & Architecture

Mar 13, 2019 · Big Data

Understanding Fault Tolerance and Exactly-Once Semantics in Apache Flink

This article explains Apache Flink's fault‑tolerance mechanisms, including checkpointing, barrier alignment, the differences between At‑Least‑Once and Exactly‑Once semantics, configuration options, incremental checkpointing, and the requirements for external sources and sinks to achieve end‑to‑end exactly‑once processing.

Apache Flink · Big Data · Exactly-Once

0 likes · 15 min read

JD Tech

Mar 13, 2019 · Operations

Evolution of JD Digital Technology’s Host Monitoring System “DiTing”: From V1 to V3

The article chronicles the design, evolution, and lessons learned of JD Digital Technology’s self‑built host monitoring platform “DiTing”, detailing its initial requirements, V1 architecture, subsequent V2 and V3 redesigns, encountered challenges, and future directions toward intelligent operations.

Big Data · Operations · System Architecture

0 likes · 12 min read

dbaplus Community

Mar 12, 2019 · Databases

Mastering HBase Cross‑Datacenter Migration: Snapshots, Architecture, and Real‑World Tips

This article provides a comprehensive technical guide on HBase, covering its core concepts, advantages and drawbacks, architecture layers, practical use cases, and a detailed step‑by‑step process for large‑scale cross‑datacenter migration using snapshot‑based strategies, with commands, diagrams, and lessons learned.

Big Data · Data Migration · Database Architecture

0 likes · 19 min read

DataFunTalk

Mar 11, 2019 · Artificial Intelligence

Practical Implementation of Personalized Recommendation Systems: Overview, Algorithms, Challenges, and Architecture

This article presents a comprehensive overview of personalized recommendation systems, covering their purpose, common algorithms, development challenges, the multi‑layer architecture used at DataGrand, optimization techniques, and the range of services offered to enterprise customers.

Big Data · collaborative filtering · machine learning

0 likes · 18 min read

JD Tech Talk

Mar 11, 2019 · Operations

Evolution of JD Digital Technology’s Host Monitoring System “Diting”: Architecture from V1 to V3

The article chronicles the design, implementation, and iterative evolution of JD Digital Technology’s in‑house host monitoring platform Diting, detailing its V1, V2, and V3 architectures, the challenges encountered at each stage, and future directions toward intelligent, automated operations.

Alerting · Architecture · Big Data

0 likes · 14 min read

DataFunTalk

Mar 7, 2019 · Big Data

Design and Evolution of Didi's Real‑Time Data Computing Platform

The article details how Didi built and iterated its real‑time data platform, describing the shift from MySQL‑based batch processing to a Kafka‑Samza‑Druid architecture with Spark Streaming and Flink, the challenges addressed, and the current capabilities and operational metrics.

Big Data · Druid · Flink

0 likes · 9 min read

58 Tech

Mar 7, 2019 · Big Data

In-Memory Inverted Index Compression Algorithms: Overview and MILC Optimization for High‑Performance Search

This article reviews major in‑memory inverted index compression techniques such as PForDelta, PEF, and MILC, explains their principles and trade‑offs, and details practical optimizations applied at 58.com to achieve query performance comparable to uncompressed indexes while reducing memory usage by about 35 percent.

Big Data · MILC · algorithm

0 likes · 17 min read

Big Data Technology & Architecture

Mar 6, 2019 · Big Data

Using Flink Redis Sink for Streaming WordCount from Kafka to Redis

This tutorial demonstrates how to integrate Apache Flink with Redis as a sink, showing the Maven dependency, a custom RedisMapper implementation, and a complete Flink job that reads Kafka messages, performs word count, and stores results in Redis, with plans for HBase and MySQL extensions.

Big Data · Flink · Streaming

0 likes · 4 min read

AntTech

Mar 6, 2019 · Databases

How Ant Financial Scaled the 2019 Alipay New Year Red Envelope Event with GeaBase Graph Database and Real‑Time Data Intelligence

The 2019 Alipay New Year "Five Blessings" red‑envelope campaign, serving 450 million users, leveraged Ant Financial's GeaBase distributed graph database, a real‑time data‑intelligence platform, and OceanBase elastic resources to achieve millisecond‑level ranking, seconds‑level transaction audit, and seamless high‑concurrency performance.

Alipay · Backend · Big Data

0 likes · 10 min read

Big Data Technology & Architecture

Mar 4, 2019 · Big Data

Apache Flink Table API and SQL Tutorial with Code Examples

This article introduces Apache Flink’s Table API and SQL, explains the TableEnvironment programming model, shows how to register tables and sinks, and provides two complete Java examples, WordCount and a file‑based aggregation, with code that can be downloaded for local testing.

Big Data · DataStream · Flink

0 likes · 7 min read

Big Data Technology & Architecture

Mar 3, 2019 · Big Data

Getting Started with Flink Kafka Connector: Concepts, Setup, and Sample Code

This article introduces the Flink‑Kafka connector, explains essential Kafka concepts, shows how to configure checkpointing, provides Maven dependencies, and includes complete Java examples for both producing to and consuming from Kafka within a Flink streaming job.

Big Data · Connector · Flink

0 likes · 8 min read

Big Data Technology & Architecture

Mar 2, 2019 · Big Data

Understanding and Using Broadcast Variables in Apache Flink

This article explains the concept, usage, precautions, and a practical example of broadcast variables in Apache Flink, illustrating how to initialize, broadcast, retrieve, and apply shared data across parallel operators with Java code snippets.

Big Data · Broadcast Variable · Flink

0 likes · 4 min read

Big Data Technology & Architecture

Mar 1, 2019 · Big Data

Understanding Watermarks in Apache Flink for Handling Out-of-Order Events

This article explains how Apache Flink uses Watermarks to manage event‑time windows, describes the three time semantics, details periodic and punctuated Watermark generation methods with their Java interfaces, and shows practical DDL examples for handling late and out‑of‑order data in stream processing.

Apache Flink · Big Data · EventTime

0 likes · 11 min read

DataFunTalk

Mar 1, 2019 · Big Data

Renrenche Mobile Data Platform: Architecture, Real‑Time Computing, and BI Solutions

The article presents Renrenche’s end‑to‑end mobile data platform, detailing its overall architecture, real‑time Spark‑based computation engine, Web IDE, metadata management, BI reporting built on ClickHouse, and how data‑driven practices empower both online and offline business operations.

BI reporting · Big Data · ClickHouse

0 likes · 15 min read

Big Data Technology & Architecture

Feb 28, 2019 · Big Data

Understanding Time Semantics in Apache Flink: Processing Time, Event Time, and Ingestion Time

This article introduces Apache Flink's three time semantics—Processing Time, Event Time, and Ingestion Time—explaining their definitions, differences, and practical implications for windowing and stream processing, while also providing links to introductory Flink tutorials.

Big Data · Event Time · Flink

0 likes · 7 min read

Big Data Technology & Architecture

Feb 28, 2019 · Big Data

Understanding Flink Window Types and Their Implementations

This article explains Flink's window concepts—including time‑based, count‑based, tumbling, sliding, and session windows—provides practical Scala code examples for each type, and links to related resources on Flink basics, APIs, deployment, and advanced features.

Big Data · Flink · Scala

0 likes · 5 min read

HomeTech

Feb 28, 2019 · Artificial Intelligence

How to Systematically Test and Monitor AI Models in Large‑Scale Production

This article presents a comprehensive approach to testing, automating, and monitoring AI prediction models in a high‑traffic environment, covering background, challenges, evaluation metrics, data sampling methods, automated test scripts, and online monitoring to ensure model accuracy, performance, and reliability.

AI testing · Big Data · Metrics

0 likes · 13 min read

Xianyu Technology

Feb 28, 2019 · Big Data

NVID Recommendation System Architecture and Technical Solutions

The NVID recommendation system for Taobao is built on a four‑layer architecture—activity material, configuration, business process, and application—and solves environment isolation, performance, audience management, and A/B testing challenges through optimized data schemas, ID mapping, multi‑level caching with database fallback, and real‑time user targeting, while future work aims at personalized audiences and automated ad optimization.

A/B testing · Big Data · System Architecture

0 likes · 11 min read

Big Data Technology & Architecture

Feb 27, 2019 · Big Data

Understanding Flink Restart Strategies: Configuration and Code Examples

This article explains Flink's restart strategies—including fixed‑delay, failure‑rate, and no‑restart—how to configure them globally via flink‑conf.yaml or programmatically in code, and provides complete Java examples demonstrating each approach.

Big Data · Flink · Restart Strategy

0 likes · 4 min read

Big Data Technology & Architecture

Feb 27, 2019 · Big Data

Using Flink Distributed Cache: Overview and Example

This article explains Flink's distributed cache feature, describes its registration and retrieval mechanisms, and provides a complete Java example that demonstrates how to register a file, access it within a RichMapFunction, and print the processed results.

Big Data · Dataset API · Flink

0 likes · 4 min read

AntTech

Feb 27, 2019 · Big Data

Ant Financial Data Governance: Practices and Challenges in Data Quality Management

The article details Ant Financial’s comprehensive data quality governance framework, covering its architecture, challenges, implementation strategies, and real‑world case studies, illustrating how the company integrates data monitoring, AI‑driven self‑healing, and rigorous release controls to ensure high‑quality data across its platform.

Ant Financial · Big Data · Data Governance

0 likes · 17 min read

Qunar Tech Salon

Feb 27, 2019 · Databases

Evolution of Meituan’s Database Platform: From Manual Operations to Intelligent Automation

This article outlines Meituan’s transition of its database platform from manual, script‑based operations through tool‑ and product‑centric stages to a private‑cloud and automation era, discusses current challenges such as root‑cause analysis and staffing, and shares insights on moving toward fully intelligent, data‑driven database operations.

Big Data · Cloud Computing · Intelligent Operations

0 likes · 13 min read

Big Data Technology & Architecture

Feb 26, 2019 · Big Data

Deploying Apache Flink Clusters: Standalone and YARN Modes

This guide explains how to set up an Apache Flink cluster on CentOS 7 using three deployment methods—Local, Standalone, and Flink on YARN/Kubernetes—including host configuration, SSH setup, package distribution, configuration file editing, cluster start/stop commands, YARN resource manager concepts, session commands, job submission, fault‑tolerance settings, and log inspection.

Big Data · Cluster Deployment · Flink

0 likes · 11 min read

Big Data Technology & Architecture

Feb 25, 2019 · Big Data

Understanding Flink DataSetAPI and DataStreamAPI

This article introduces Apache Flink's DataSetAPI and DataStreamAPI, explains their source, transformation, and sink concepts, highlights the key differences in transformation handling, and notes the series' goal of publishing over 500 big‑data tutorials for learners from beginner to expert.

Big Data · DataSetAPI · DataStreamAPI

0 likes · 2 min read

Efficient Ops

Feb 24, 2019 · Databases

Why Row vs Column Storage Matters: Understanding HBase’s Column‑Family Model

This article explains the differences between row‑oriented and column‑oriented storage, compares their trade‑offs, and introduces HBase’s column‑family architecture, including row keys, column qualifiers, timestamps, cells, and how it maps to a multi‑dimensional map structure.

Big Data · Columnar Storage · HBase

0 likes · 7 min read

Vipshop Quality Engineering

Feb 22, 2019 · Artificial Intelligence

How Vipshop Built an AI‑Powered Sentiment Analysis System for Real‑Time Customer Feedback

Vipshop's in‑house sentiment monitoring platform integrates web‑scraped reviews, WeChat comments and internal service messages, applying lexical sentiment scoring, dictionary‑based Chinese word segmentation, TF‑IDF keyword ranking and lightweight classification to deliver real‑time insights, alerts and actionable reports for thousands of daily user comments.

Big Data · NLP · Sentiment Analysis

0 likes · 17 min read

Beike Product & Technology

Feb 21, 2019 · Big Data

DATABUS Data Integration Platform: Architecture, Capabilities, and TiDB Ecosystem

The article presents an in‑depth overview of the DATABUS data integration platform, detailing its background, current challenges, core capabilities such as data syncing, metadata automation, real‑time subscriptions, and its reliance on TiDB, TiSpark, Hudi, and related big‑data technologies to enable near‑real‑time data warehousing.

Big Data · Data Integration · Hudi

0 likes · 13 min read

Big Data Technology & Architecture

Feb 20, 2019 · Big Data

Zookeeper: The Core Coordination Service in Big Data Systems

Zookeeper, originally a Hadoop sub‑project developed at Yahoo, is a distributed coordination framework that provides highly available services such as configuration management, distributed locks, and failure handling, and has become a foundational component of many systems such as Hadoop, Kafka, and Dubbo.

Big Data · Configuration Management · Coordination Service

0 likes · 3 min read

dbaplus Community

Feb 19, 2019 · Big Data

Mastering HDFS Monitoring on JD Cloud: Key Metrics, Tools, and Best Practices

This article presents a comprehensive guide to monitoring Hadoop Distributed File System (HDFS) on JD Cloud, covering challenges, recommended toolchains, essential metrics, configuration tips, and real‑world case studies to help engineers ensure reliability and performance of large‑scale data clusters.

Big Data · ELK · HDFS

0 likes · 14 min read

Big Data Technology & Architecture

Feb 18, 2019 · Big Data

Big Data Mastery Series – Distributed Theory Foundations and Principles

This article introduces the foundational concepts and principles of distributed systems—including basic concepts, consistency models, CAP theorem, logical clocks, and advanced protocols like Paxos, Raft, and Zab—serving as the first part of a comprehensive Big Data mastery series.

Big Data · CAP theorem · Consistency

0 likes · 4 min read

Tencent Cloud Developer

Feb 14, 2019 · Industry Insights

Turning IoT Data into Fully Automated Smart Parks: Key Stages & Architecture

The article outlines how rapid urban growth drives smart park initiatives that leverage IoT, big‑data analytics, digital twins, and full‑process visualization to evolve from efficient management to ecosystem integration and ultimately to fully automated, self‑governing urban micro‑environments.

Big Data · Digital Twin · Industry Insights

0 likes · 11 min read

Sohu Tech Products

Feb 13, 2019 · Big Data

Evolution and Implementation Details of Spark Shuffle Mechanisms

This article examines the historical evolution of Spark's shuffle implementations—from early Hash‑Based Shuffle to modern SortShuffleWriter, BypassMergeSortShuffleWriter, and UnsafeShuffleWriter—explaining their design choices, selection criteria, and the corresponding shuffle reader architecture in a production‑grade Spark 2.1.1 environment.

Big Data · Shuffle · Shuffle Writer

0 likes · 13 min read

dbaplus Community

Feb 13, 2019 · Big Data

How Zhihu Scaled Its Real-Time Analytics with Druid and Smart Redis Caching

Zhihu built a self‑service analytics platform on Druid, introduced a multi‑level Redis caching strategy, split long‑duration queries across multiple brokers, and added automatic cache invalidation to dramatically improve query latency and resource usage for massive daily request volumes.

Analytics · Big Data · Druid

0 likes · 13 min read

Ctrip Technology

Feb 13, 2019 · R&D Management

Ctrip’s Technology Evolution: From Call‑Center Era to Big Data and AI

The article outlines Ctrip’s three‑phase technology evolution—from a simple call‑center architecture to layered internet and mobile platforms, and finally to a cloud‑based big‑data and AI‑driven ecosystem—highlighting architectural changes, operational challenges, and strategic lessons for fast‑growing internet companies.

Big Data · Ctrip · R&D management

0 likes · 13 min read

Youzan Coder

Feb 1, 2019 · Big Data

Design and Implementation of Log Parsing for a Big Data Offline Task Platform

The article describes a log‑parsing feature for Youzan’s big‑data offline platform that captures runtime logs from Hive, Spark, DataX, MapReduce and HBase jobs, categorizes scheduling types, extracts metrics such as read/write bytes, shuffle volume and GC time, and processes them in real time via a Filebeat‑Logstash‑Kafka‑Spark‑Streaming pipeline storing results in Redis for monitoring, optimization and resource‑usage ranking.

Big Data · Resource Monitoring · YARN

0 likes · 7 min read

Didi Tech

Jan 31, 2019 · Big Data

Router-Based Federation in Hadoop: Architecture, Components, and Didi’s Deployment

Router‑Based Federation replaces Hadoop’s single‑point HDFS bottleneck with a server‑side global namespace managed by Routers and a State Store, enabling scalable, highly available sub‑clusters; Didi back‑ported the feature, deployed five Routers, fixed numerous bugs, and contributed patches to improve stability and functionality.

Big Data · HDFS · Hadoop

0 likes · 11 min read

DataFunTalk

Jan 30, 2019 · Artificial Intelligence

Real‑Time Metrics Processing Technology for Financial Risk Control and Anti‑Fraud

This article outlines the challenges of financial risk control in the internet era and presents a comprehensive real‑time metrics processing system, covering data leakage, fraud, big‑data opportunities, AI model deployment, and the technical architecture of the Bangsheng real‑time indicator platform.

AI · Big Data · anti‑fraud

0 likes · 17 min read

ITFLY8 Architecture Home

Jan 29, 2019 · Operations

How to Optimize Large-Scale Log Systems for Real-Time Monitoring and Scalability

This article examines the design, deployment, and optimization of massive log systems, comparing architectures, discussing real‑time versus near‑real‑time requirements, and presenting practical improvements such as memory, CPU, network tuning, data partitioning, storage reduction, and component upgrades using ELK, Kafka, Fluentd, and HBase.

Big Data · ELK · Fluentd

0 likes · 18 min read

Alibaba Cloud Developer

Jan 28, 2019 · Big Data

How Alibaba’s Blink Supercharges Flink for Massive Stream and Batch Processing

Alibaba’s Blink, an internal enhancement of Apache Flink, is now open‑sourced, bringing advanced runtime, SQL/TableAPI, Hive compatibility, Zeppelin integration, and a revamped Flink Web UI to dramatically boost performance and scalability for both streaming and batch workloads.

Batch Processing · Big Data · Flink

0 likes · 16 min read

21CTO

Jan 26, 2019 · Big Data

Data Lake vs Data Warehouse: Which One Powers Your Business?

This article explains the core differences between data lakes and data warehouses, their respective strengths, and how they complement each other to support both exploratory analytics and routine business reporting.

Analytics · Big Data · Data Lake

0 likes · 5 min read

NetEase Game Operations Platform

Jan 25, 2019 · Big Data

Understanding Exactly-Once Semantics in Apache Flink: Challenges and Implementation

This article analyzes the difficulties of achieving exactly-once delivery in Apache Flink, explains the distinction between state and end‑to‑end semantics, and details how idempotent and transactional sinks—illustrated with the Bucketing File Sink—realize exactly‑once guarantees through checkpoint‑based two‑phase commit.

Big Data · Exactly-Once · Flink

0 likes · 13 min read

dbaplus Community

Jan 23, 2019 · Big Data

How Zhihu Built a Scalable Data‑Sync Platform with Sqoop and DataX

This article explains Zhihu's journey from ad‑hoc MySQL‑Hive sync using Oozie + Sqoop to a unified, platform‑based data synchronization service that now handles thousands of tables, over 10 TB daily, with load‑aware scheduling, incremental pulls, schema change handling, and tight integration with their offline job scheduler.

Big Data · DataX · ETL

0 likes · 14 min read

21CTO

Jan 23, 2019 · Big Data

Can 1.4 Billion Users Fit Into One WeChat Group? A Technical Feasibility Study

This article analyzes whether the entire Chinese population could be added to a single WeChat group, examining user statistics, message volume, required bandwidth, CPU processing limits, Moore's law projections, supercomputer alternatives, hardware costs, storage demands, and practical challenges, concluding that it is theoretically possible but practically infeasible.

Big Data · Performance · Server

0 likes · 10 min read

MaGe Linux Operations

Jan 23, 2019 · Big Data

How Bloom Filters Power Fast Big Data Searches with Python

This tutorial walks through building a simple Python search engine for big data, covering Bloom filter basics, tokenization with major and minor segmentation, inverted index creation, and implementing both simple and complex (AND/OR) queries, complete with code examples and visual illustrations.

AND/OR queries · Big Data · Python

0 likes · 15 min read

Tencent Cloud Developer

Jan 17, 2019 · Artificial Intelligence

Deep Learning for Big Data Recommendation Systems: Tencent's Industrial Practice

Tencent’s industrial practice shows how a large‑scale offline‑nearline‑online “Shield” recommendation architecture, powered by the DeepR framework built on RCaffe, uses deep semantic embeddings, massive neural networks and reinforcement‑learning decisions to handle billions of daily requests, demonstrating that data richness and engineering capability, not model depth alone, drive performance in big‑data recommendation systems.

Big Data · Deep Learning · Neural Network

0 likes · 13 min read

JD Tech

Jan 17, 2019 · Operations

Technical Overview of JD's Archimedes Resource Scheduling System

The article presents a detailed technical analysis of JD's Archimedes project, describing its evolution from JDOS 2.0 to a large‑scale container scheduling platform that dramatically improves resource utilization, deployment speed, and cost efficiency across JD’s data centers.

AI · Big Data · JD

0 likes · 6 min read

Youzan Coder

Jan 16, 2019 · Big Data

How Youzan Scaled Real‑Time Analytics with Flink: Architecture, Pitfalls, and Lessons

This article walks through Youzan's real‑time platform architecture, explains why Flink was chosen over Spark Structured Streaming, details practical challenges such as container over‑provisioning and monitoring overhead, shares solutions for Spring integration and async caching, and outlines future directions for SQL‑based streaming and scheduler improvements.

Big Data · Flink · Real-time Streaming

0 likes · 19 min read

StarRing Big Data Open Lab

Jan 16, 2019 · Big Data

What’s New in Transwarp TDH 5.2.3? Key Performance and Stability Enhancements

TDH 5.2.3 introduces a series of stability and performance upgrades—including transaction and compaction optimizations, enhanced error handling, SQL length protection, improved Oracle‑compatible UDFs, default resource pool support, Guardian caching, TxSQL monitoring, and workflow and OLAP engine fixes—aimed at delivering a more reliable big‑data platform.

Big Data · Performance · database

0 likes · 10 min read

dbaplus Community

Jan 13, 2019 · Databases

January 2019 DB-Engines Newsletter: Latest Database Releases & Key Features

The January 2019 DB-Engines newsletter compiles the newest releases, feature highlights, and performance improvements across RDBMS, NoSQL, NewSQL, time‑series, big‑data, domestic, and cloud database families, while also explaining the ranking methodology and providing download links for the full issue.

Big Data · Cloud Computing · NewSQL

0 likes · 41 min read

Youzan Coder

Jan 9, 2019 · Big Data

How Youzan Scaled 5,000 Daily SparkSQL Jobs: Migration Lessons from Hive

This article details Youzan's transition from Hive to SparkSQL, covering platform architecture, usability and performance enhancements, migration strategies, automated engine selection, and future plans that together reduced resource consumption by up to 67% while handling thousands of daily jobs.

Availability · Big Data · Data Platform

0 likes · 13 min read

360 Quality & Efficiency

Jan 4, 2019 · Big Data

Overview of Big Data Processing Engines: MapReduce, Tez, Spark, and Flink

This article reviews the evolution and characteristics of major big‑data processing engines—from first‑generation Hadoop MapReduce to second‑generation DAG‑based Tez, third‑generation in‑memory Spark, and fourth‑generation real‑time Flink—highlighting their batch and streaming use cases.

Big Data · Flink · MapReduce

0 likes · 9 min read

dbaplus Community

Jan 3, 2019 · Backend Development

Supercharging Elasticsearch for Billion-Row Queries: Practical Tips

This guide details how to optimize Elasticsearch for handling billions of daily records, covering core Lucene concepts, index and shard configuration, performance‑tuning parameters, and practical testing methods to achieve sub‑second query responses and long‑term data retention.

Big Data · Elasticsearch · Performance Optimization

0 likes · 13 min read

Big Data Technology & Architecture

Jan 3, 2019 · Big Data

Deploying Apache Flink on YARN and Running Flink Jobs

This tutorial explains how to deploy Apache Flink on a Hadoop YARN cluster, covering both YARN session mode and direct job submission, and demonstrates running the built‑in WordCount example with command‑line options for input, output, and resource configuration.

Apache Flink · Big Data · Flink Deployment

0 likes · 8 min read

Big Data Technology & Architecture

Jan 3, 2019 · Big Data

Reading Kafka Topics with Flink: A Step‑by‑Step Guide

This tutorial demonstrates how to use Apache Flink's Kafka connector to read data from Kafka topics with exactly‑once semantics, covering Maven dependencies, consumer configuration, checkpointing for fault tolerance, and a complete Scala example that writes the streamed data to HDFS.

Big Data · Flink · KafkaConnector

0 likes · 5 min read

360 Quality & Efficiency

Jan 2, 2019 · Big Data

Understanding ETL and Data Warehouses: A Beginner’s Guide

This article introduces the fundamentals of Business Intelligence, explains what ETL and data warehouses are, compares them with traditional databases, and outlines the main characteristics and popular tools such as Hive used in modern big‑data environments.

BI · Big Data · Data Integration

0 likes · 5 min read

Big Data Technology & Architecture

Jan 2, 2019 · Big Data

Optimizing Spark Direct Kafka Consumption: Subpartition Concurrency and Repartition Strategies

To address the long processing time caused by uneven Spark partitions when reading Kafka via the Direct approach, this article explains the SPARK‑22056 solution that modifies KafkaRDD.getPartitions to support a configurable 'topic.partition.subconcurrency' parameter, discusses its trade‑offs, and presents alternative repartition and multithreading techniques.

Big Data · Partitioning · Scala

0 likes · 6 min read

Big Data Technology & Architecture

Jan 2, 2019 · Big Data

Understanding Spark Streaming Backpressure Mechanism

The article explains how Spark Streaming backpressure, introduced in version 1.5, automatically adjusts data ingestion rates based on processing delays, replaces manual rate limits, and details its architecture, configuration parameters, and usage for preventing data backlog and executor OOM.

Big Data · Rate Control · Spark

0 likes · 6 min read

Big Data Technology & Architecture

Jan 1, 2019 · Big Data

Insights from the Real-Time Big Data Meetup: Spark Structured Streaming Overview

The meetup on September 8, co‑hosted by InfoQ and Huawei Cloud, featured Databricks engineer Tathagata Das explaining Spark Structured Streaming’s concepts, fault‑tolerance, performance, event‑time handling, and real‑world use cases such as Apple’s security platform, highlighting its scalability and integration with various data sources.

Big Data · Spark · Structured Streaming

0 likes · 8 min read

Big Data Technology & Architecture

Dec 31, 2018 · Big Data

Overview of the Big Data Ecosystem and Core Technologies

This article provides a comprehensive overview of the big data ecosystem, explaining key components such as Hadoop, HDFS, Spark, Hive, Pig, HBase, and related tools, and describes how they work together to store, process, and analyze massive datasets efficiently.

Big Data · Hadoop · MapReduce

0 likes · 16 min read

Architects Research Society

Dec 30, 2018 · Big Data

Overview of Major Apache Big Data Processing Frameworks

This article provides a concise overview of numerous Apache open‑source projects—including Ignite, MapReduce, Pig, JAQL, Spark, Storm, Flink, Apex, REEF, Twill, and Beam—that enable distributed in‑memory storage, real‑time and batch processing, and advanced analytics for large‑scale data workloads.

Apache · Big Data · Flink

0 likes · 22 min read

Youzan Coder

Dec 28, 2018 · Big Data

Quantifying HBase Write Path: Disk and Network Costs for High‑Throughput Scenarios

This article analytically breaks down HBase's write pipeline, quantifies disk and network overheads for massive random writes, derives formulas for resource consumption under realistic assumptions, and offers concrete tuning recommendations to optimize throughput and reduce cost.

Big Data · HBase · Performance

0 likes · 16 min read

Tencent Cloud Developer

Dec 28, 2018 · Big Data

Intelligent Operations for Tencent Cloud Big Data Platform: Challenges, Practices, and Future Directions

Tencent Cloud’s big‑data platform tackles massive, multi‑component clusters by deploying an AIOps framework that aggregates logs and metrics, applies statistical and machine‑learning anomaly detection, uses regression and reinforcement‑learning for job‑parameter optimization, and integrates offline‑online pipelines, achieving over 88 % precision while planning automated root‑cause analysis, productized tools, platformized algorithm integration, and cross‑domain model reuse.

Big Data · Cloud Computing · Intelligent Operations

0 likes · 20 min read

Meituan Technology Team

Dec 27, 2018 · Artificial Intelligence

Meituan’s AI Initiatives: Large‑Scale Scheduling, Unmanned Delivery, and the Meituan Brain Knowledge Graph

Meituan’s AI division, now over 1,000 engineers with a 2 billion‑CNY quarterly budget, powers massive real‑time scheduling for 20 million daily orders, unmanned delivery pilots, and the “Meituan Brain” knowledge graph of billions of entities, delivering AI‑driven services across its entire platform.

AI · Big Data · Large-Scale Scheduling

0 likes · 16 min read

Xianyu Technology

Dec 27, 2018 · Big Data

Device Fingerprinting and User Growth Architecture in Alibaba's Xianyu Platform

Alibaba’s Xianyu platform uses a multi‑signal device fingerprinting system, UMID, to uniquely identify users across Android and iOS devices, storing the data in sharded MySQL, HiStore OLAP, and Tair caches, enabling precise ad bidding, conversion tracking, and scalable user‑growth strategies.

Big Data · Information Security · System Architecture

0 likes · 9 min read

Didi Tech

Dec 26, 2018 · Industry Insights

How Didi Implements Full‑Chain Data Tiered Protection for Reliable Operations

Facing growing data‑driven pressures, Didi designed a full‑link data tiered protection framework that defines classification standards, integrates data levels across the entire pipeline, and applies concrete safeguards and tooling to improve resource allocation, backup reliability, and overall data reliability.

Big Data · Data Governance · Didi

0 likes · 7 min read

Alibaba Cloud Developer

Dec 20, 2018 · Big Data

Unlocking Alibaba’s Massive Cluster Data V2018: A Treasure Trove for Big‑Data Research

Alibaba has released the comprehensive Cluster Data V2018 dataset, detailing eight days of operation for 4,000 servers and their mixed online and offline workloads, including DAG information, enabling researchers to study large‑scale data‑center performance, resource utilization, scheduling algorithms, and derive new insights.

Big Data · DAG · Dataset

0 likes · 7 min read

Didi Tech

Dec 18, 2018 · Big Data

Evolution and Architecture of Didi's Real-Time Computing Platform

From early self‑built Storm and Spark Streaming clusters to a unified YARN‑based Spark platform and finally a low‑latency Flink system with extended CEP and StreamSQL capabilities, Didi’s real‑time computing platform evolved through three stages, delivering multi‑tenant isolation, rich SQL processing, and dramatically reduced development costs.

Big Data · CEP · Flink

0 likes · 9 min read

Qunar Tech Salon

Dec 18, 2018 · Big Data

Practical Insights on Deploying and Operating Elasticsearch at Scale

This article shares extensive practical experience from Qunar's large‑scale Elasticsearch deployment, covering suitable use cases, index‑type design, document ID strategies, scaling considerations for index and data volume, hardware sizing, and storage architecture recommendations to help newcomers avoid common pitfalls.

Big Data · Elasticsearch · indexing

0 likes · 10 min read

JD Tech

Dec 17, 2018 · Operations

Improving JD Intelligent Supply Chain Efficiency and System Stability for Major Sales Events

The article details JD's intelligent supply chain enhancements—including machine‑learning demand forecasting, a new "explosive product warehouse" model, non‑stock fulfillment visualization, blockchain‑based product traceability, and comprehensive system‑stability measures such as data‑consistency checkpoints, throughput buffering, and 24/7 incident response—to boost efficiency and reliability during large‑scale promotions.

Big Data · Blockchain · Operations

0 likes · 7 min read

Youzan Coder

Dec 14, 2018 · Operations

Youzan Full‑Link Load Testing Architecture and Implementation

Youzan’s full‑link load‑testing architecture combines a traffic generator, a data‑factory pipeline, and the Maxim platform to replay realistic e‑commerce user actions, tag and isolate test traffic via unified headers, route reads/writes to shadow storage, and integrate Gatling for capacity planning, degradation, alarm, disaster‑recovery and throttling drills.

Big Data · Data Isolation · Distributed Systems

0 likes · 13 min read

JD Retail Technology

Dec 12, 2018 · Big Data

Construction and Architecture of JD Overseas Data Analysis Platform (Columbus Platform)

JD.com’s overseas data analysis platform, dubbed the Columbus platform, combines a lightweight data warehouse deployment with standardized, customizable BI tools to provide real‑time and offline analytics, visualization, KPI management, and future self‑service reporting and predictive capabilities for its global e‑commerce operations.

Analytics · BI · Big Data

0 likes · 9 min read

ITFLY8 Architecture Home

Dec 12, 2018 · Backend Development

Solr vs Elasticsearch: Which Open‑Source Search Engine Fits Your Needs?

This article compares the two leading open‑source search engines, Solr and Elasticsearch, examining their architectures, features, deployment ease, scalability, community support, and ideal use cases to help you decide which solution best matches your application requirements.

Big Data · Distributed Search · Elasticsearch

0 likes · 10 min read

Manbang Technology Team

Dec 12, 2018 · Big Data

Kafka Overview: Core Concepts, Architecture, Configuration, and Usage in Real-Time Computing

This article provides a comprehensive technical overview of Kafka, covering its core concepts, producer and consumer models, architecture, configuration parameters, replication mechanisms, performance optimizations, operational monitoring, tooling scripts, and related product implementations for real-time data processing.

Architecture · Big Data · Kafka

0 likes · 18 min read

JD Tech

Dec 11, 2018 · Big Data

Introduction to Graph Computing and the JoyGraph System

This article introduces graph computing, compares it with graph databases, surveys notable graph processing systems, and details the architecture, NUMA‑aware design, execution model, push/pull dual mode, and load‑balancing strategies of the JoyGraph framework while outlining its future development directions.

Big Data · JoyGraph · NUMA

0 likes · 9 min read

Python Crawling & Data Mining

Dec 5, 2018 · Big Data

Prepare Offline CDH 5.14 Installation Files on CentOS 6.7

This guide details the required system environment, download links, and offline file list for setting up Cloudera CDH 5.14 on a CentOS 6.7 server, including the JDK, parcels, manager package, and MySQL connector, and explains how to upload them via FileZilla.

Big Data · CDH · CentOS

0 likes · 4 min read

NetEase Game Operations Platform

Dec 5, 2018 · Big Data

Presto + Alluxio Architecture for Interactive Ad‑hoc Queries in NetEase Game Data Warehouse

This article describes how NetEase Games built a Presto‑based interactive ad‑hoc query platform backed by Alluxio caching to achieve sub‑10‑second query latency, outlines the architectural design, performance comparisons with other Hadoop‑based solutions, encountered issues, and future improvement plans.

Alluxio · Big Data · Performance

0 likes · 10 min read

AntTech

Dec 4, 2018 · Artificial Intelligence

Highlights from the 7th China Small‑and‑Medium Bank Development Summit on FinTech and Risk Management (Nov 29‑30 2018, Guangzhou)

The 7th China Small‑and‑Medium Bank Development Summit held in Guangzhou on November 29‑30 2018 gathered over 200 banking and fintech leaders to discuss the latest trends, challenges, and strategies in financial technology, digital transformation, risk control, and emerging technologies such as AI, big data, cloud and blockchain.

Artificial Intelligence · Big Data · Cloud Computing

0 likes · 14 min read

DataFunTalk

Dec 4, 2018 · Artificial Intelligence

Application and Exploration of Financial Knowledge Graphs

This article presents a comprehensive overview of financial knowledge graphs, covering their historical evolution, theoretical foundations, technical stack, implementation steps, and real‑world case studies in banking, regulatory technology, and securities, while highlighting community resources for AI and big‑data practitioners.

AI · Big Data · Financial AI

0 likes · 14 min read

dbaplus Community

Dec 2, 2018 · Big Data

Mastering Redis for Big Data: Architecture, Code Samples, and Performance Hacks

This article walks through the NewLife.Redis library’s two‑layer architecture, demonstrates basic and advanced usage with C# examples, shows pressure‑testing results, and shares practical tips such as GetAll/SetAll, pipelines, and serialization tricks for high‑throughput big‑data scenarios.

Big Data · Performance · Pipeline

0 likes · 13 min read

Alibaba Cloud Developer

Nov 29, 2018 · Big Data

Why Apache Flink Became the Fastest‑Growing Big Data Engine in 2018

This article introduces Apache Flink’s rapid rise as the leading open‑source big data engine, explains its role in batch, stream, and interactive analytics, showcases real‑world use cases from Alibaba, Didi, and ByteDance, and outlines how Flink powers both big data and AI workloads.

AI · Apache Flink · Batch Processing

0 likes · 8 min read

JD Tech

Nov 28, 2018 · Operations

Technical Systems Behind JD Logistics for the 11.11 Global Shopping Festival

The article details how JD Logistics’ extensive warehouse, routing, distribution, and fulfillment systems—leveraging big data, AI, GIS, IoT, and distributed architectures—were engineered and optimized to handle the massive order surge during the 11.11 Global Shopping Festival with high throughput, low latency, and zero incidents.

AI · Big Data · GIS

0 likes · 8 min read

DataFunTalk

Nov 24, 2018 · Big Data

The Evolution of iQIYI's Big Data Analytics Platform

This article chronicles iQIYI’s journey from a simple Hive‑based data pipeline to the sophisticated, multi‑engine “Tongtian Tower” platform, detailing the development of the Magic Mirror system, the Gear workflow manager, BabelBD, the Monet visual analytics tool, and the integrated BI ecosystem that now supports billions of daily users.

BI · Big Data · data engineering

0 likes · 18 min read

Tencent Cloud Developer

Nov 23, 2018 · Big Data

20 Free and Open-Source Data Visualization Tools

These 20 free and open‑source data visualization tools—from JavaScript libraries like D3.js and Chartist.js to user‑friendly platforms such as Datawrapper, Google Data Studio, and Tableau Public—enable businesses and analysts to transform raw data into interactive charts, maps, timelines, and dashboards, improving insight, decision‑making, and profitability.

Big Data · Data visualization · JavaScript libraries

0 likes · 12 min read

Alibaba Cloud Developer

Nov 22, 2018 · Big Data

How Alibaba’s Blink Testing Platform Guarantees Real‑Time Big Data Reliability

This article explains how Alibaba built a comprehensive Blink testing platform—including code‑quality checks, functional, performance, stability, and pre‑release testing—to ensure the reliability and scalability of its real‑time big‑data processing engine during massive workloads like Double 11.

Apache Flink · Big Data · Quality assurance

0 likes · 13 min read
