Tagged articles
3675 articles
21CTO
Nov 20, 2018 · Big Data

What Languages and Tools Do Big Data Experts Use? Insights from 31 IT Leaders

Based on interviews with 31 IT leaders from 28 organizations, this article reveals the most popular programming languages, frameworks, and platforms—such as Python, Scala, Spark, Kafka, TensorFlow, and Tableau—currently driving big‑data extraction, analysis, and reporting, and highlights emerging trends and tool preferences.

Big Data · Kafka · Python
0 likes · 12 min read
Architects' Tech Alliance
Nov 19, 2018 · Cloud Computing

Suning’s Cloud‑Era Digital Transformation: Architecture Evolution, Technology Roadmap, and Organizational Change

The article details Suning’s Internet‑plus transformation, describing its strategic “one body, two wings, three clouds, four ends” model, the evolution of its enterprise architecture across three generations, the adoption of cloud, SOA, micro‑services, big‑data and AI platforms, and the accompanying R&D and organizational reforms.

Big Data · Cloud Computing · Digital Transformation
0 likes · 13 min read
21CTO
Nov 7, 2018 · Big Data

Why Data Streams Are the Backbone of Real-Time Big Data Analytics

Data streams, akin to endless rivers, enable continuous, real-time processing of diverse sources such as IoT telemetry, web logs, and e-commerce events. They offer advantages over batch processing but raise challenges such as scalability and fault tolerance, and are supported by tools like Kinesis, Kafka, Flink, and Storm.

Amazon Kinesis · Apache Kafka · Big Data
0 likes · 6 min read
JD Retail Technology
Nov 7, 2018 · Cloud Computing

Technical Preparations for Double 11 Sales Event at JD.com

JD.com's commercial team conducted extensive technical preparations for the Double 11 sales event, including system optimizations, stress testing, and data platform enhancements to handle massive traffic and ensure system stability.

Big Data · Double 11 Preparation · Technical Readiness
0 likes · 6 min read
Programmer DD
Nov 7, 2018 · Big Data

Choosing the Right SQL Engine for Big Data: A Practical Guide

This article explores various SQL engines and storage options for big‑data workloads, compares their performance and capabilities, shows practical code examples, and offers guidance on writing efficient SQL in complex data environments.

Big Data · SQL Engines · data engineering
0 likes · 6 min read
Xianyu Technology
Nov 6, 2018 · Big Data

Technical Evolution of Xianyu Real-Time Selection System for Double Eleven

To meet Double‑Eleven’s sub‑second, billion‑item feed demands, Alibaba’s Xianyu selection system evolved from a Solr‑based search pipeline through offline batch and PostgreSQL attempts to a Blink‑powered real‑time stream platform using Niagara’s low‑latency LSM storage, delivering high‑throughput, personalized product feeds.

Alibaba · Big Data · Flink
0 likes · 23 min read
Architects' Tech Alliance
Nov 5, 2018 · Big Data

Alluxio as a Virtual Distributed File System for Data Lake Solutions

The article explains how Alluxio provides a virtual distributed file system that acts as a "virtual data lake," enabling unified, high‑performance access to structured and unstructured data across heterogeneous storage back‑ends while reducing storage costs through intelligent caching and eliminating the need for permanent data copies.

Alluxio · Big Data · Data Lake
0 likes · 16 min read
dbaplus Community
Nov 1, 2018 · Big Data

How Vipshop Scales Real‑Time Data with Flink on Kubernetes

This article details Vipshop's real‑time platform architecture, the migration from Storm and Spark to Flink, Flink's deployment on Kubernetes, and the latest Unified Data Management system that unifies data access across Kafka, Redis, Tair and HDFS.

Big Data · Flink · Kubernetes
0 likes · 12 min read
Tencent Cloud Developer
Oct 30, 2018 · Big Data

Big Data Technology Trends and Cloud Data Warehouse Architecture Practices

The article reviews recent big-data trends—from Hadoop’s evolution and Spark’s in-memory advances to emerging storage like Ozone—while detailing data-warehouse models, query-optimizer techniques, and cloud-native architectures that integrate diverse data sources, enabling scalable, AI-ready analytics and modern data-lake capabilities.

Big Data · Data Lake · Hadoop
0 likes · 30 min read
Qunar Tech Salon
Oct 25, 2018 · Big Data

Why Alibaba Chose Apache Flink: Architecture, Scale, and Future Directions

This article explains how Alibaba adopted Apache Flink as a unified, low‑latency, high‑throughput big‑data engine, detailing its stream‑first design, state management, checkpointing, massive production deployment, community contributions, and upcoming plans for a unified API, SQL layer, broader language support, and AI integration.

Alibaba · Apache Flink · Big Data
0 likes · 13 min read
DataFunTalk
Oct 24, 2018 · Artificial Intelligence

The Technical Growth Path of an Algorithm Engineer in the Big Data Era

This article summarizes Zeng Xianglin’s presentation on the stages of an algorithm engineer’s career—from academic Beta research and feature engineering through online deployment, model training, and deep‑learning applications—highlighting practical challenges and best practices in large‑scale advertising systems.

Big Data · algorithm engineering · online advertising
0 likes · 13 min read
Programmer DD
Oct 21, 2018 · Big Data

How to Choose the Right Number of Kafka Partitions for Optimal Throughput

This article explains how to determine the optimal Kafka partition count by balancing throughput gains, key‑based ordering requirements, file descriptor limits, and availability impacts, offering practical guidelines such as testing hardware limits and using broker‑count multiples for scalable deployments.

Big Data · Partitions · Throughput
0 likes · 8 min read
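As a rough illustration of the sizing guideline summarized above, the following Python sketch applies a commonly cited rule of thumb: measure per-partition producer and consumer throughput on your own hardware, take the larger of the two partition counts needed to reach the target, and round up to a multiple of the broker count. The function name and all parameters are hypothetical; the linked article may use a different procedure, and limits such as file descriptors and availability are not modeled here.

```python
import math


def suggest_partition_count(
    target_mb_s: float,
    producer_mb_s_per_partition: float,
    consumer_mb_s_per_partition: float,
    broker_count: int,
) -> int:
    """Estimate a partition count from measured per-partition throughput.

    All inputs are assumptions supplied by the caller, ideally taken from
    a benchmark on the actual hardware.
    """
    if min(target_mb_s, producer_mb_s_per_partition,
           consumer_mb_s_per_partition, broker_count) <= 0:
        raise ValueError("all inputs must be positive")

    # Partitions needed so producers can reach the target throughput.
    for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    # Partitions needed so consumers can keep up with the same target.
    for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    needed = max(for_producers, for_consumers)

    # Round up to a multiple of the broker count so partitions can be
    # spread evenly across brokers.
    return math.ceil(needed / broker_count) * broker_count
```

For example, a 1000 MB/s target with 50 MB/s per partition on the producer side, 25 MB/s on the consumer side, and 6 brokers would need 40 partitions, rounded up to 42.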
21CTO
Oct 19, 2018 · Big Data

How Meituan Scales Real‑Time Computing with Flink: Architecture, Challenges & Solutions

This article summarizes Meituan’s real‑time computing platform, detailing its layered architecture built on Kafka, Flink on YARN, state management, resource isolation, fault tolerance, monitoring, and the Petra metric aggregation system, while highlighting the challenges faced and the solutions implemented to achieve high‑throughput, low‑latency stream processing at massive scale.

Big Data · Flink · Real-time Streaming
0 likes · 18 min read
Tencent Cloud Developer
Oct 17, 2018 · Industry Insights

Why Graph Databases Are Redefining Enterprise Data Strategy

The article provides a detailed market and application analysis of graph databases, highlighting rapid growth, key use cases in finance and social networks, Tencent's StarGraph solution, advantages over relational databases, current limitations, and future industry adoption trends.

Big Data · Cloud Computing · Graph Database
0 likes · 6 min read
Xianyu Technology
Oct 16, 2018 · Big Data

Millisecond-Level Counting for Billion-Scale Data via Offline Batch and Online Incremental Statistics

To achieve millisecond‑level counting on billion‑scale data, the Xianyu team replaced slow MySQL count queries with an offline batch that snapshots relational tables and computes totals, then uses KV‑store incremental statistics for online updates, delivering sub‑10 ms responses with near‑100 % success.

Big Data · database · incremental counting
0 likes · 7 min read
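The pattern described in this summary (a periodic offline total plus online increments held in a key-value store) can be sketched in a few lines of Python. This is a minimal illustration, not the Xianyu implementation: the class and method names are hypothetical, plain dictionaries stand in for the KV store, and the sketch assumes no changes arrive while a batch snapshot is being computed.

```python
class HybridCounter:
    """Serve counts as: offline batch total + online increments."""

    def __init__(self) -> None:
        self.base = {}   # totals computed by the last offline batch
        self.delta = {}  # changes recorded online since that batch

    def load_batch_snapshot(self, totals: dict) -> None:
        # Replace the base totals and discard increments that the new
        # snapshot already includes (simplifying assumption: the snapshot
        # covers every change recorded so far).
        self.base = dict(totals)
        self.delta = {}

    def record_change(self, key: str, amount: int = 1) -> None:
        # Online path: a cheap in-memory update instead of a COUNT query.
        self.delta[key] = self.delta.get(key, 0) + amount

    def count(self, key: str) -> int:
        # Read path: two key lookups, independent of table size.
        return self.base.get(key, 0) + self.delta.get(key, 0)
```

A read then costs two lookups regardless of how many rows the underlying table holds, which is the property that makes low-latency responses plausible in this kind of design.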
Java Backend Technology
Oct 13, 2018 · Big Data

Check a New Integer Among 4 Billion Records in Seconds Using Bitmap & Distributed Methods

An interviewee faces the challenge of determining whether a newly given integer exists within a set of 4 billion numbers, and the article explores efficient solutions—from naive disk‑I/O approaches to distributed processing and the memory‑saving bitmap technique—highlighting their performance trade‑offs and implementation details.

Big Data · Bitmap · algorithm
0 likes · 6 min read
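To make the bitmap technique mentioned above concrete, here is a minimal Python sketch that stores one bit per possible integer value. It assumes non-negative integers with a known upper bound; the class name and interface are illustrative and not taken from the article. Covering all 2**32 unsigned 32-bit values this way takes 512 MiB, independent of how many numbers are actually stored.

```python
class Bitmap:
    """Fixed-size bit array: bit n is set if integer n is in the set."""

    def __init__(self, max_value: int) -> None:
        self.max_value = max_value
        # One bit per value in [0, max_value], packed eight per byte.
        self.bits = bytearray((max_value >> 3) + 1)

    def _check(self, n: int) -> None:
        if not 0 <= n <= self.max_value:
            raise ValueError(f"{n} is outside [0, {self.max_value}]")

    def add(self, n: int) -> None:
        self._check(n)
        # n >> 3 selects the byte, n & 7 selects the bit within it.
        self.bits[n >> 3] |= 1 << (n & 7)

    def __contains__(self, n: int) -> bool:
        self._check(n)
        return bool(self.bits[n >> 3] & (1 << (n & 7)))
```

After loading the existing numbers once with `add`, each membership check such as `n in bitmap` is a single byte read and a mask, with no disk access.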
Alibaba Cloud Developer
Oct 10, 2018 · Artificial Intelligence

How Alibaba’s Uni‑Marketing Boosted Brand Conversions with AI‑Driven Audience Selection

This article details Alibaba's Uni‑Marketing case study where a brand‑targeted audience selection algorithm, built on big‑data and AI techniques, improved the O→IPL deepening rate by 47% during the New‑Year Festival, outlining the technical pipeline, models, evaluation metrics, challenges, and future directions.

Big Data · Digital Marketing · brand optimization
0 likes · 20 min read
Programmer DD
Oct 6, 2018 · Big Data

Elastic Search IPO: What It Means for Search and Big Data

Elastic announced its IPO on the NYSE under ticker ESTC, highlighting its origins, rapid growth to over 5000 customers worldwide, a $160 million FY2018 revenue, and its Elastic Stack suite that powers search and analytics across industries, while investors celebrated the stock surge.

Big Data · Elasticsearch · IPO
0 likes · 6 min read
JD Tech
Sep 29, 2018 · Artificial Intelligence

JD.com Prediction Technology: Architecture, Applications, and Future Directions

The article outlines JD.com's evolution of prediction technology from early book‑category sales forecasting to a comprehensive AI‑driven platform that supports sales, order, and GMV forecasts, describes its modular architecture and core algorithm choices, and discusses future enhancements for smarter supply‑chain collaboration.

Big Data · Prediction · forecasting
0 likes · 6 min read
Architects' Tech Alliance
Sep 26, 2018 · Operations

How Goldeneye Enables Adaptive, Intelligent Business Monitoring at Scale

Goldeneye, Alimama's monitoring platform, uses big‑data pipelines, dynamic threshold prediction, mean‑shift change‑point detection, and automated metric discovery to replace manual alarm settings, reduce false alerts, and provide intelligent, scalable business monitoring across hundreds of services.

Big Data · Operations · business monitoring
0 likes · 19 min read
HomeTech
Sep 25, 2018 · Operations

Design and Implementation of an Integrated Log Collection, Analysis, and Monitoring System

This article describes how a rapidly growing technical team built a unified log system that consolidates program, web access, and slow logs, introduces host‑agent and process‑agent collection, leverages Kafka, Elasticsearch, and Storm for high‑throughput processing, and provides monitoring, alerting, and reporting features to improve reliability and operational efficiency.

Big Data · Elasticsearch · Log Management
0 likes · 20 min read
Tencent Cloud Developer
Sep 20, 2018 · Industry Insights

How Big Data Drives Intelligent Outbound Calls and AI Customer Service

This article explains how a data‑driven platform combines big‑data preprocessing, behavior‑prediction models, and AI‑powered voice and text services to improve pre‑sale lead scoring, targeted SMS campaigns, and post‑sale customer support, using Tencent Cloud's TI One platform as a case study.

AI Customer Service · Big Data · Industry Insights
0 likes · 17 min read
Tencent Cloud Developer
Sep 20, 2018 · Artificial Intelligence

What Everyone Should Know About Machine Learning

Machine learning lets computers learn patterns from examples instead of explicit code, enabling tasks like image and fraud detection, predictive maintenance, and personalized services, now feasible thanks to big data, cloud compute, and open-source tools, and increasingly discussed by executives for strategic automation.

Big Data · Neural Networks · Predictive Maintenance
0 likes · 11 min read
Big Data and Microservices
Sep 17, 2018 · Big Data

5 Essential Data Mining Techniques Every Analyst Should Know

This article outlines five widely used data‑mining methods—association rules, classification/tagging, clustering, decision trees, and sequential pattern mining—explaining their principles, real‑world examples, and how they help organizations extract actionable insights from massive datasets.

Big Data · Decision Trees · Sequential Pattern Mining
0 likes · 6 min read
Qunar Tech Salon
Sep 14, 2018 · Big Data

AIGOV Five‑Star Model for Data Asset Management: Framework, Capabilities, and Enterprise Practices

The article presents the AIGOV five‑star data asset management model, analyzes its five management domains and thirteen capability items, compares it with domestic and international frameworks, and illustrates its practical value through detailed enterprise case studies and references to maturity models.

Big Data · Data Asset Management · Maturity Model
0 likes · 19 min read
Programmer DD
Sep 13, 2018 · Big Data

How Deleting a Kafka Topic Removes Consumer Offsets and Why It Matters

This article examines a real‑world Kafka scenario where a topic is created, messages are produced and consumed, the topic is deleted, and then recreated, revealing that deleting the topic also removes its consumer offset metadata from the __consumer_offsets internal topic, causing new consumers to rely on their auto.offset.reset configuration.

Big Data · Consumer Offsets · GroupCoordinator
0 likes · 6 min read
JD Tech
Sep 7, 2018 · Information Security

Big Data and AI Security Insights from ISC 2018 Conference

The ISC 2018 conference highlighted the growing importance of big data and artificial intelligence security, presenting JD's research on anti‑scraping techniques, AI‑driven defenses against black‑market attacks, and a service‑oriented approach to protecting user data across enterprises.

AI security · Big Data · Information Security
0 likes · 5 min read
Tencent Cloud Developer
Sep 6, 2018 · Big Data

Real-Time Stream Computing: Concepts, Challenges, and Tencent Cloud Solutions

As mobile and IoT data surge, real-time stream computing—especially Flink’s low-latency, high-throughput, exactly-once engine—addresses challenges of latency, accuracy, and usability, and Tencent Cloud’s managed Flink service provides elastic, secure, integrated pipelines for applications ranging from online status monitoring to fraud detection and smart transportation.

Apache Storm · Big Data · Flink
0 likes · 30 min read
Big Data and Microservices
Sep 4, 2018 · Big Data

Exploring Five Big Data Architectures—from Traditional to Unified AI Designs

The article examines the evolution of big‑data processing by comparing five prevalent architectures (traditional Hadoop‑based stacks, streaming‑only designs, Kappa, Lambda, and Unified), highlighting their strengths, weaknesses, and suitable scenarios while discussing the limitations of classic BI systems and the role of distributed storage, computation, and machine‑learning integration.

Big Data · Data Architecture · Hadoop
0 likes · 14 min read
Big Data and Microservices
Aug 28, 2018 · Big Data

Turning Idle Hadoop Clusters into Valuable Data-Driven Products and Processes

The article examines how enterprises can transform big data from idle Hadoop clusters into valuable assets by adopting data-driven processes and products, outlining the distinction between technology-driven and business-driven approaches, describing data and service product models, and highlighting process optimization across various business functions.

Big Data · Enterprise Analytics · data-driven processes
0 likes · 7 min read
Big Data and Microservices
Aug 26, 2018 · Big Data

Why Data, Not Process, Is the New Core of Business: 10 Big‑Data Principles Explained

The article outlines ten core big‑data principles—shifting from process‑centric to data‑centric thinking, emphasizing data value, efficiency, relevance, full‑sample analysis, prediction, information‑finding, machine understanding, e‑commerce intelligence, and mass customization—illustrated with real‑world examples and their impact on modern industry.

Big Data · Industry Insights · correlation
0 likes · 26 min read
DataFunTalk
Aug 21, 2018 · Artificial Intelligence

iQIYI Traffic Anti-Cheat: Techniques, System Architecture, and Future Directions

This article provides a comprehensive overview of iQIYI's traffic anti‑cheat mechanisms, covering definitions of fraudulent traffic, industry challenges, data cleaning relationships, system design, rule‑based and machine‑learning solutions, feature engineering, model evaluation, monitoring, service applications, and future prospects.

Big Data · System Architecture · Traffic analysis
0 likes · 11 min read
Meitu Technology
Aug 17, 2018 · Big Data

Meitu Distributed Bitmap System (Naix): Architecture, Implementation, and Performance Evaluation

Meitu’s Naix distributed bitmap system accelerates massive user‑data analytics by using a three‑layer architecture, sharded RoaringBitmap storage, and PalDB, delivering over 600× faster queries than Hive, supporting fast generation plugins, fault‑tolerant replication, and millisecond‑level RPC query responses while reducing storage by 67%.

Big Data · Bitmap · Naix
0 likes · 16 min read
Big Data and Microservices
Aug 16, 2018 · Big Data

Mastering Big Data Analysis: 5 Core Aspects and 4 Key Methods

This article outlines the five fundamental aspects of big data analysis—visualization, data‑mining algorithms, predictive analytics, semantic engines, and data quality management—and explains four primary analytical approaches: descriptive, diagnostic, predictive, and prescriptive analysis.

Big Data · data analysis · data mining
0 likes · 6 min read
Meitu Technology
Aug 14, 2018 · Big Data

Meitu Data Platform Architecture and Practices

Meitu’s data platform, serving dozens of apps with 500 million monthly active users and billions of daily events, combines the Arachnia log‑collection system, Kafka ingestion, multi‑layer storage (HDFS, MongoDB, HBase, Elasticsearch), offline Hive/MapReduce processing and real‑time Storm/Flink/Naix pipelines, supported by data‑workshop tools, staged evolution for scalability, and robust security and query‑validation mechanisms.

Big Data · Data Platform · ETL
0 likes · 16 min read
Big Data and Microservices
Aug 13, 2018 · Big Data

8 Essential Principles for Effective Enterprise Big Data Implementation

The article outlines eight key principles that enterprises should follow to harness big data responsibly, covering goal definition, strategic partnership, source identification, continuous communication, agile iteration, technology evaluation, cloud alignment, and talent development with security considerations.

Big Data · Data Governance · Enterprise
0 likes · 10 min read
Alibaba Cloud Developer
Aug 13, 2018 · Big Data

How Ele.me Evolved Its Real‑Time Engine: From Storm to Flink

This article examines Ele.me’s big‑data platform evolution, comparing Storm, Spark Streaming, Structured Streaming, and Flink, detailing their architectures, consistency semantics, performance trade‑offs, and why Flink became the preferred real‑time computation engine for the company.

Big Data · Flink · Spark
0 likes · 15 min read
Meitu Technology
Aug 11, 2018 · Big Data

Meitu Technology Salon: Evolution of the Big Data Platform, Distributed Bitmap (Naix), and Apache Kylin

At Meitu’s Technology Salon, senior big‑data experts detailed the end‑to‑end architecture and stability measures of Meitu’s large‑scale data platform, introduced the high‑performance distributed bitmap solution Naix, showcased the evolution of Meizu’s user‑insight system, and highlighted Apache Kylin’s OLAP capabilities and Superset integration for scalable, real‑time analytics.

Apache Kylin · Big Data · Data Analytics
0 likes · 9 min read
Big Data and Microservices
Aug 10, 2018 · Big Data

5 Ways Big Data Empowers Modern Enterprises

Big data has become a critical asset for companies, enabling them to understand users, precisely locate resources, enhance marketing and operations, deliver refined services, and anticipate crises, thereby turning raw information into strategic advantage across multiple business functions.

Big Data · Enterprise Analytics · Resource Optimization
0 likes · 7 min read
iQIYI Technical Product Team
Aug 10, 2018 · Big Data

Data-Driven Entertainment: iQIYI’s Big Data Platform and AI Applications

iQIYI’s unified “Tongtian Tower” big‑data platform integrates analytics, AI and open APIs to turn viewer behavior and public sentiment into market insights, personalized recommendations, smart casting and churn‑prediction tools, embedding a data‑driven culture that fuels its rapid subscriber growth and revenue surge.

AI · Big Data · Data Platform
0 likes · 12 min read
Architects' Tech Alliance
Aug 8, 2018 · Big Data

High‑Performance Data Analytics (HPDA): Architecture, Market Trends, and Fujitsu Reference Model

The article provides a comprehensive overview of High‑Performance Data Analytics (HPDA), detailing its market drivers, technical classifications, integration of HPC with big‑data workloads, Fujitsu's reference architecture, hardware configurations, benchmark results, and the economic benefits of deploying HPDA on existing HPC infrastructures.

Big Data · Fujitsu · HPC
0 likes · 14 min read
Architecture Digest
Aug 7, 2018 · Big Data

Apache Kafka Overview, Architecture, and Sample Producer/Consumer Code

This article provides a comprehensive overview of Apache Kafka, comparing it with ActiveMQ, explaining its distributed architecture, topics, partitions, consumption models, high‑availability mechanisms, exactly‑once semantics, and includes detailed Java producer and consumer code examples for practical implementation.

Big Data · Consumer · Distributed Messaging
0 likes · 22 min read
Youzan Coder
Aug 3, 2018 · Big Data

Youzan Data Warehouse Metadata System: From Manual Tables to Metadata‑Driven Architecture

Youzan’s data‑warehouse metadata system evolved from manually maintained tables to an automated data dictionary and finally to a metadata‑driven architecture that automatically captures technical, business, and process metadata, visualizes lineage, tracks resource usage, manages synchronization rules and permissions, and now aims to improve novice usability with visual models and impact‑analysis tools.

Big Data · Lineage · Resource Monitoring
0 likes · 11 min read
Meituan Technology Team
Aug 2, 2018 · Big Data

R for Fine‑Grained Data Operations: Engineering Practices and Performance at Meituan

Meituan’s in‑store dining team demonstrates how R’s open‑source packages, powerful data manipulation, rich visualization libraries, and reproducible reporting can be engineered into scalable, parallelized workflows that turn secondary data processing into fast, interactive dashboards and analytics, proving R’s enterprise‑grade performance and adoption.

Big Data · Data visualization · R
0 likes · 18 min read
MaGe Linux Operations
Aug 2, 2018 · Big Data

Unlocking PUBG Victory: Data‑Driven Insights on Drop Zones, Final Circles, Weapons, and Kill Strategies

This article analyzes 18 million PUBG match records using Python to reveal optimal drop locations, high‑probability final‑circle spots, preferred weapons, and the relationship between kill distance, kill count, and winning chances, providing data‑driven strategies for players seeking more chicken dinners.

Big Data · Game Analytics · PUBG
0 likes · 13 min read
Efficient Ops
Aug 1, 2018 · Operations

How Tencent Revolutionized Monitoring: From IDC Crises to AI‑Driven AIOps

This talk by Tencent’s monitoring R&D lead outlines a decade of evolution in large‑scale monitoring, covering real‑world incident cases, the three drivers behind architectural upgrades, the implementation of a three‑dimensional monitoring framework, and the application of AI‑powered AIOps for precise, rapid anomaly detection.

Big Data · Cloud Computing · Operations
0 likes · 18 min read
Big Data and Microservices
Jul 29, 2018 · Industry Insights

Top 5 Big Data & AI Trends Shaping 2018 and Beyond

According to recent Forrester and Forbes reports, 2018 will see AI overtaking big-data hype, driving five key trends—from heightened cybersecurity in healthcare to expanded IoT, plug-and-play AI solutions, the rise of chief digital officers, and smarter community policing—each reshaping how organizations leverage data.

AI trends · Big Data · Industry Analysis
0 likes · 8 min read
Xianyu Technology
Jul 28, 2018 · Big Data

Real-Time Computation Architecture for Non-Timeline Feed Ranking

The paper presents a real‑time computation architecture on Alibaba Cloud Blink that scores and ranks non‑timeline feed items within a sliding 72‑hour window, updating rankings every few minutes, using Redis ZSET for fast retrieval, and discusses scaling optimizations such as interval tuning and external join‑and‑rank services.

Big Data · Real‑Time Computing · feed ranking
0 likes · 6 min read
Architects Research Society
Jul 27, 2018 · Big Data

Overview of Apache Hive Features, Usage, and Management

Apache Hive is an open‑source data‑warehouse system built on Hadoop that enables users to read, write, and manage large distributed datasets using SQL‑like queries, offering features such as ETL support, various file‑format connectors, extensible UDFs, and integration with tools like Tez, Spark, and MapReduce.

Apache Hive · Big Data · ETL
0 likes · 5 min read
58 Tech
Jul 27, 2018 · Big Data

Sun Dial: 58.com’s General‑Purpose AB Testing Platform – Architecture, Features, and Real‑Time Data Processing

The Sun Dial platform is a universal A/B testing system built for 58.com that supports single‑layer and multi‑layer experiments, provides uniform traffic splitting, real‑time OLAP analytics with Druid, and offers a web interface for easy configuration, enabling data‑driven product optimization across multiple business lines.

A/B testing · Big Data · Druid
0 likes · 14 min read
Big Data and Microservices
Jul 26, 2018 · Industry Insights

How Big Data is Transforming the Financial Industry: Applications and Challenges

This article examines how big data technologies are reshaping banking, insurance, and securities by enabling customer profiling, precision marketing, risk management, and operational optimization, while also outlining the key challenges such as data quality, integration complexity, standards, and governance that the sector must overcome.

Banking · Big Data · Data Analytics
0 likes · 19 min read
Meituan Technology Team
Jul 26, 2018 · Backend Development

Evolution of Meituan Delivery System Architecture and Practices

Meituan Delivery’s architecture has progressed from a rapid MVP with coarse services to a scalable, fine‑grained platform comprising fulfillment, operation, and master‑data subsystems, employing reliability engineering, capacity planning, AI‑driven simulation, and location services to ensure high availability, efficiency, and future‑ready scalability.

AI · Big Data · Microservices
0 likes · 16 min read
JD Tech
Jul 24, 2018 · Databases

Understanding Graph Databases: Concepts, History, Use Cases, and Comparative Overview

This article explains what graph databases are, traces their evolution from early navigational models to modern distributed systems, highlights their core concepts and advantages over relational databases, showcases typical application scenarios, and provides a comparative overview of popular open‑source graph database engines to guide technology selection.

Big Data · Graph Database · NoSQL
0 likes · 8 min read
ITPUB
Jul 23, 2018 · Big Data

What China's Vaccine Procurement Data Reveals: A Province‑Level Analysis

This article documents the collection, cleaning, and statistical analysis of publicly released second‑category vaccine procurement data from 28 Chinese provinces, highlighting data sources, processing steps with pandas, top manufacturers, regional market shares, and the challenges encountered during the effort.

Big Data · China · data analysis
0 likes · 9 min read
Tencent Cloud Developer
Jul 23, 2018 · Big Data

Analysis of Chinese Second-Class Vaccine Procurement Data

The study aggregates and cleans 2017‑2020 Chinese second‑class vaccine procurement data from 28 provinces into a 1,529‑record CSV, revealing a right‑skewed distribution where a handful of manufacturers—led by Beijing Kexing and Changchun Changsheng—account for the majority of entries, while noting gaps in several regions and encouraging further collaborative refinement.

Big Data · Chinese healthcare · data analysis
0 likes · 10 min read
Alibaba Cloud Developer
Jul 23, 2018 · Big Data

How Alibaba’s MaxCompute Became the Backbone of 99% Data Processing

This article reviews Alibaba's MaxCompute evolution from ODPS to a unified, multi‑cluster big‑data platform, detailing its architecture, development tools, large‑scale deployments, performance optimizations, typical workload scenarios, and why it is the preferred choice for enterprise data processing.

Alibaba Cloud · Big Data · Data Platform
0 likes · 22 min read
Youzan Coder
Jul 20, 2018 · Big Data

How Youzan Built a Scalable Big Data Development Platform (DP)

This article details the design, architecture, and operational experience of Youzan's Data Platform (DP), covering its scheduling, data‑sync, service, and monitoring modules, the custom Airflow‑based task scheduler, current production metrics, supported task types, and future improvement plans.

Airflow · Big Data · Data Platform
0 likes · 12 min read
Didi Tech
Jul 17, 2018 · Artificial Intelligence

Didi Showcases AI‑Driven Intelligent Transportation Research at ACM SIGIR 2018

At ACM SIGIR 2018, Didi presented AI‑driven intelligent‑transportation research—including a ride‑sharing preference prediction paper, keynote insights on smart dispatch, maps and traffic, collaborations with over twenty cities and numerous universities, open data initiatives, and plans for new thematic research programs.

Artificial Intelligence · Big Data · Industry-Academia Collaboration
0 likes · 9 min read
360 Tech Engineering
Jul 13, 2018 · Big Data

Titan 2.0 Big Data Processing Platform: Architecture Evolution and Practice

The article describes the evolution of 360's Titan big‑data processing platform through three architectural stages, details its functional modules, explains the DITTO component framework, context and rule‑engine abstractions, and shares practical case studies and personal insights on building a flexible, self‑service data platform.

Big Data · DITTO · ETL
0 likes · 12 min read
High Availability Architecture
Jul 12, 2018 · Information Security

Evolution of Zhihu’s Anti‑Cheat System “Wukong”: Architecture, Strategies, and Lessons Learned

This article chronicles the three‑generation evolution of Zhihu’s anti‑cheat platform Wukong, detailing its business context, spam taxonomy, multi‑layered control methods, architectural redesigns, strategy language improvements, graph‑based risk analysis, and the continuous integration of big‑data and machine‑learning techniques to combat content and behavior spam.

Big Data · Information Security · Risk management
0 likes · 23 min read
Ctrip Technology
Jul 3, 2018 · Big Data

Ctrip's Presto Engine: Challenges, Improvements, and Upgrade Roadmap

This article details Ctrip's experience with the Presto distributed SQL engine, outlining the initial performance and stability issues, the comprehensive enhancements made in security, resource control, compatibility, and monitoring, and the multi‑stage upgrade plan that guides its future evolution.

Big Data · Kerberos · Performance Optimization
0 likes · 11 min read
AntTech
Jul 3, 2018 · Backend Development

Evolution of Financial‑Grade Message Queues at Ant Financial

The article reviews the ten‑year evolution of Ant Financial's message queue, detailing its core reliability, consistency, availability and performance requirements, the architectural mechanisms built to meet them, the shift to pull‑mode and API‑mode designs, and the recent integration of compute capabilities to create a smart data transmission platform.

Big Data · Distributed Systems · Message Queue
0 likes · 13 min read
ITFLY8 Architecture Home
Jul 2, 2018 · Artificial Intelligence

How JD.com Built a Multi‑Screen Personalized Recommendation Engine

This article explains how JD.com evolved its recommendation system from simple product suggestions to a sophisticated, multi‑screen, multi‑type personalized engine using big‑data collection, real‑time behavior tracking, machine‑learning models, and a modular architecture that boosts conversion and user experience.

Big Data · e‑commerce · machine learning
0 likes · 14 min read
Baidu Intelligent Testing
Jun 29, 2018 · Product Management

Baidu Product Evaluation Framework and Common Assessment Methods

This article outlines Baidu's comprehensive product evaluation framework, describing its multi‑layer assessment system, the combination of subjective and objective metrics, and a suite of common evaluation methods such as indicator analysis, AB testing, user feedback, behavior analysis, big‑data profiling, and competitor comparison.

AB testing · Big Data · Metrics
0 likes · 16 min read
58 Tech
Jun 27, 2018 · Big Data

Overview of the 58 User Profile System Architecture and Data Processing

The article describes the design, data integration, ID mapping, tag generation, and application scenarios of the 58 user profiling platform, which aggregates billions of user IDs across multiple business lines to provide online and offline persona data for personalization, analytics, and AI modeling.

Big Data · Data Architecture · Data Integration
0 likes · 12 min read
DataFunTalk
Jun 24, 2018 · Big Data

OPPO Big Data Platform Operations and R&D Practices: Architecture, Scaling, and Monitoring

This article summarizes OPPO's rapid growth of its big‑data platform, detailing the three‑layer architecture, the evolution from Flume‑Kafka to NiFi for data ingestion, the upgrade of the OFlow task scheduler, comprehensive monitoring of data, resources and task SLA, and the development of a self‑service analytics tool called InnerEye to ensure stability, efficiency, and security.

Airflow · Big Data · NiFi
0 likes · 10 min read
Architecture Digest
Jun 18, 2018 · Operations

Design and Optimization of Large‑Scale Log Systems

This article examines the challenges of handling massive log data in high‑traffic e‑commerce platforms and presents a comprehensive architecture, optimization strategies, and practical implementations—including Rsyslog, Kafka, Fluentd, and the ELK stack—to improve scalability, performance, and reliability of log management systems.

Big Data · ELK · Fluentd
0 likes · 17 min read
Didi Tech
Jun 16, 2018 · Artificial Intelligence

AI and Big Data in Didi’s Mapping Services – Insights from WGDC 2018

At WGDC 2018, Didi’s mapping division revealed how its AI‑driven platform leverages massive real‑time travel data, machine‑learning and deep‑learning models—including a new ETA estimator, demand‑supply forecasting, and reinforcement‑learning order allocation—to deliver ultra‑accurate pick‑up points, route planning, and destination predictions, while opening de‑identified data and research topics to academia.

AI · Big Data · ETA
0 likes · 6 min read
Tencent Cloud Developer
Jun 11, 2018 · Cloud Computing

Tencent Cloud's Government Cloud Strategy and Digital Guangdong Practice

Tencent Cloud’s government‑cloud strategy, showcased by Guangdong’s “Yue Sheng Shi” (粤省事) platform, leverages WeChat as a single access point and a partner‑driven backend of AI, big‑data and IoT services to digitize certificates, streamline workflows for citizens, businesses and officials, and address low public‑service satisfaction by redesigning processes rather than merely automating them.

AI · Big Data · Digital Transformation
0 likes · 12 min read
Efficient Ops
Jun 6, 2018 · Big Data

How Tencent’s Multi‑Dimensional Monitoring Turns Big Data Into Real‑Time Business Insights

This article explains how Tencent’s ZhiYun multi‑dimensional monitoring system evolves from the Mobile Monitor platform, outlines its design principles, data‑factory capabilities, storage choices, and intelligent features, and demonstrates how it enables real‑time, multi‑dimensional analysis and alerting for large‑scale business operations.

Big Data · Druid · Storm
0 likes · 11 min read
ITPUB
Jun 4, 2018 · Big Data

Is Hadoop Really Declining? Expert Insights Show Why the Ecosystem Stays Strong

Despite Gartner's 2017 claim that Hadoop is nearing the end of its production maturity, a series of interviews with Chinese big‑data experts reveals that Hadoop's ecosystem remains robust, with core components like HDFS, YARN, Spark, and HBase continuing to dominate the market.

Big Data · Ecosystem · Gartner
0 likes · 9 min read
ITPUB
Jun 3, 2018 · Big Data

Spark vs Hadoop: Which Distributed System Fits Your Data Needs?

An in‑depth comparison of Hadoop and Spark examines their architectures, performance, cost, security, and machine‑learning capabilities, helping readers decide which open‑source distributed processing platform best matches their batch, streaming, and analytical workloads.

Big Data · Cost · Hadoop
0 likes · 13 min read
ITPUB
Jun 2, 2018 · Big Data

Mastering Spark: Core Concepts, Architecture, Streaming & Performance Tuning

This comprehensive guide explains Spark's ecosystem, execution principles, key features, deployment architectures, core concepts like RDD, Transformations, Actions, Jobs, Stages, Shuffle and Cache, as well as Spark Streaming mechanics and practical resource‑tuning tips for optimal big‑data processing.

Big Data · Cluster · Performance Tuning
0 likes · 15 min read
Tencent Cloud Developer
Jun 1, 2018 · Backend Development

Building Tencent Xinge: Architecture and Practices for Massive Mobile Push Service

The talk details Tencent Xinge’s architecture and cloud‑native practices that enable hundred‑billion‑level mobile push, combining terminal integration, real‑time backend filtering, distributed bitmap selection, precise‑push AI models, and DevOps pipelines to deliver fast, scalable, data‑driven notifications with effect tracking.

Backend Architecture · Big Data · Distributed Systems
0 likes · 18 min read
ITPUB
May 31, 2018 · Big Data

Mastering Spark on DataMagic: Fast‑Track Your Big Data Skills

This article explains Spark's role in the DataMagic platform, outlines four practical steps to quickly master Spark, details key configuration and parallelism settings, shows how to modify Spark code, and provides operational tips for cluster management and job troubleshooting.

Big Data · Cluster Management · DataMagic
0 likes · 10 min read
dbaplus Community
May 30, 2018 · Big Data

Understanding Spark Executor Memory Management: On‑Heap, Off‑Heap, and Unified Strategies

This article explains Spark's executor memory architecture, covering on‑heap and off‑heap allocation, static versus unified memory managers, storage and execution memory handling, RDD persistence levels, eviction policies, and shuffle memory usage, providing practical formulas and configuration tips for optimal performance.

Big Data · Executor · Memory Management
0 likes · 23 min read
Architecture Digest
May 27, 2018 · Big Data

Installing Elasticsearch and Performing Data Aggregation Queries

This article walks through installing Elasticsearch 5.6.9, configuring system limits, creating indices, inserting and deleting documents, executing complex aggregation queries, and integrating Elasticsearch with Java using the TransportClient, providing a practical guide for building analytics on large‑scale data.

Analytics · Big Data · Elasticsearch
0 likes · 12 min read