Tagged articles

3675 articles

May 22, 2018 · Industry Insights

How AI and Cloud Are Redefining Knowledge Sharing – Insights from Zhihu’s VP

In an interview, Zhihu senior vice‑president Li Dahai explains how AI, big data and Tencent Cloud enable Zhihu’s knowledge‑matching algorithms, boost answer volume by 300%, and shape the platform’s vision of connecting expert users while emphasizing principles over pure algorithmic tricks.

AI · Big Data · Tencent Cloud

6 min read

DataFunTalk

May 22, 2018 · Information Security

Designing a Credit-Based Content Management System: Strategies, Risk Assessment, and AI Techniques

The article outlines how to build a credit‑based content management platform by describing the evolution of security practices, defining user‑generated, professional‑generated, and occupational content models, proposing a credit‑audit workflow with risk assessment, and presenting AI‑driven text classification and anti‑cheat methods to balance traffic, quality, and trust.

Artificial Intelligence · Big Data · Information Security

12 min read

ITPUB

May 16, 2018 · Databases

How a Chinese Startup Cracked the 12‑Year TPC‑DS Benchmark: Inside the Database Performance Breakthrough

StarRing Technology, a Chinese company, became the first to pass the notoriously difficult TPC‑DS benchmark. The article reviews the test's history and methodology and explains why the achievement marks a major milestone for database performance in the big‑data era.

Benchmark · Big Data · Performance Testing

7 min read

Architects' Tech Alliance

May 14, 2018 · Big Data

Understanding Hadoop MapReduce Architecture and YARN: Components, Workflow, and Optimization

This article explains Hadoop's distributed storage and processing framework, details the MapReduce programming model, describes the classic JobTracker/TaskTracker architecture, outlines the shuffle and combine phases, and introduces YARN as a scalable replacement with its ResourceManager, ApplicationMaster, and NodeManager components.

Big Data · Hadoop · MapReduce

13 min read

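The map, combine, shuffle and reduce phases summarized above can be illustrated with a single-process word count. This is a minimal sketch, not Hadoop's Java API: every function name is hypothetical, and simple hash partitioning by key stands in for the framework's partitioner.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]

def map_phase(line: str) -> List[Pair]:
    # Mapper: emit (word, 1) for every token in one input record.
    return [(word, 1) for word in line.split()]

def combine(pairs: Iterable[Pair]) -> List[Pair]:
    # Combiner: pre-aggregate one mapper's output locally to shrink shuffle traffic.
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def shuffle(mapper_outputs: Iterable[List[Pair]], num_reducers: int) -> List[Dict[str, List[int]]]:
    # Shuffle: route each key to one reducer by hash, grouping all values per key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:
        for word, count in output:
            partitions[hash(word) % num_reducers][word].append(count)
    return partitions

def reduce_phase(partition: Dict[str, List[int]]) -> Dict[str, int]:
    # Reducer: sum the grouped counts for each key.
    return {word: sum(counts) for word, counts in sorted(partition.items())}

def word_count(splits: List[List[str]], num_reducers: int = 2) -> Dict[str, int]:
    # Each split plays the role of one map task's input.
    mapped = [combine(pair for line in split for pair in map_phase(line)) for split in splits]
    result: Dict[str, int] = {}
    for partition in shuffle(mapped, num_reducers):
        result.update(reduce_phase(partition))
    return result
```

For example, `word_count([["a b a"], ["b c"]])` yields counts of 2 for `a`, 2 for `b` and 1 for `c`; the combiner collapses the first split's two `a` records into one before the shuffle.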
Alibaba Cloud Developer

May 3, 2018 · Artificial Intelligence

How Alibaba’s City Brain Uses AI to Transform Urban Management

At a recent Cloud Xi conference, Hua Xiansheng, deputy director of Alibaba’s DAMO Academy Machine Intelligence Lab, presented the City Brain initiative and unveiled three AI-powered products (Tianyao, Tianying, and Tianji) that use massive video and sensor data to achieve real‑time perception, decision‑making, prediction, and intervention for smarter urban governance.

AI · Big Data · Smart City

10 min read

Qunar Tech Salon

May 3, 2018 · Big Data

Understanding Kafka Message Formats Across Versions 0.7.x, 0.8.x, and 0.10.x

This article explains the evolution of Kafka message formats from version 0.7.x through 0.8.x (including 0.9.x) to 0.10.x, detailing each field, compression handling, and the design motivations behind the changes.

Big Data · Kafka · Message Format

9 min read

Suning Technology

Apr 28, 2018 · Artificial Intelligence

How AI, Cloud, and Big Data Power Smart Retail: Insights from Suning’s Digital Strategy

Suning’s executive vice president outlines how digital empowerment—through AI, cloud computing, and big‑data analytics—creates a new smart‑retail ecosystem, emphasizing user‑centric services, data‑driven operations, and technology platforms that blend efficiency with a human touch.

AI · Big Data · Cloud Computing

9 min read

Beike Product & Technology

Apr 26, 2018 · Big Data

Chain Home's OLAP Platform and Kylin Usage

This article details Chain Home's OLAP platform architecture and Kylin usage, covering the evolution from an early ROLAP setup to a MOLAP multidimensional engine, Kylin's basic principles, platform structure, application scenarios, usage specifications, capability extensions, and middleware development.

Apache Kylin · Big Data · Chain Home

11 min read

dbaplus Community

Apr 24, 2018 · Databases

Scaling Baidu’s TSDB to Trillions of Points: Elastic, High‑Performance Architecture

Baidu’s TSDB processes over 20 million data points per second per node and tens of thousands of queries per second cluster‑wide by employing a stateless read/write‑separated elastic architecture, multi‑layer storage across Redis, HBase and Hadoop, minute‑level geo‑redundant self‑healing, and a modified Gorilla compression that cuts storage by 80% with minimal CPU overhead.

Big Data · TSDB · Time Series Database

8 min read

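The Gorilla compression that Baidu's TSDB reportedly modifies rests on two ideas: timestamps are stored as delta-of-deltas (mostly zero for regularly spaced points), and each value is stored as the XOR of its IEEE 754 bits with the previous value (zero when the value repeats). The sketch below shows only those transforms, assuming plain Python lists; the variable-length bit packing that yields the actual savings is omitted, and nothing here reflects Baidu's specific modifications.

```python
import struct
from typing import List, Tuple

def encode_timestamps(ts: List[int]) -> Tuple[int, int, List[int]]:
    """Return (first timestamp, first delta, delta-of-deltas for the remaining points)."""
    if len(ts) < 2:
        raise ValueError("need at least two timestamps")
    first_delta = ts[1] - ts[0]
    prev_delta = first_delta
    dods = []
    for prev, cur in zip(ts[1:], ts[2:]):
        delta = cur - prev
        dods.append(delta - prev_delta)  # 0 whenever the sampling interval is unchanged
        prev_delta = delta
    return ts[0], first_delta, dods

def decode_timestamps(first: int, first_delta: int, dods: List[int]) -> List[int]:
    # Rebuild each delta from the running delta-of-delta, then each timestamp.
    ts = [first, first + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts

def xor_with_previous(values: List[float]) -> List[int]:
    # Reinterpret each float as a 64-bit integer and XOR it with its predecessor.
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    return [bits[0]] + [cur ^ prev for prev, cur in zip(bits, bits[1:])]
```

For timestamps `[1000, 1060, 1120, 1180, 1241]` the encoder keeps `1000` and `60` and emits delta-of-deltas `[0, 0, 1]`; a real encoder would then spend very few bits on each zero.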
dbaplus Community

Apr 23, 2018 · Operations

Insights and Highlights from the 2018 Gdevops Global Agile Ops Summit

The 2018 Gdevops Global Agile Operations Summit in Chengdu gathered industry experts who shared practical insights on AIOps implementation, sharding database ecosystems, DevOps adoption in traditional enterprises, large‑scale data management, ElasticSearch clustering, AWS blue‑green deployments, cloud database operations, Alibaba's double‑11 ops platform, 58 delivery mini‑program architecture, and scalable game service design.

Big Data · DevOps · aiops

13 min read

Python Crawling & Data Mining

Apr 22, 2018 · Big Data

How to Set Up CDH 5.14 on CentOS 6.7: Complete Offline Installation Guide

This guide details the required system environment, download sources, and step‑by‑step offline file preparation for installing Cloudera's CDH 5.14 on a CentOS 6.7 server, including JDK, parcels, and MySQL connector setup.

Big Data · CDH · CentOS

4 min read

ITFLY8 Architecture Home

Apr 19, 2018 · Information Security

How Suning Built a Comprehensive Information Security Architecture

This article outlines Suning's evolution from a basic network operations unit to a sophisticated, multi‑layered security architecture that integrates organizational structure, protection platforms, risk management, big‑data threat perception, and continuous improvement to safeguard e‑commerce operations.

Big Data · Information Security · Security Architecture

10 min read

Architecture Digest

Apr 19, 2018 · Cloud Computing

Understanding the Relationship Between Cloud Computing, Big Data, and Artificial Intelligence

This article explains how cloud computing, big data, and artificial intelligence are interrelated, describing the evolution from physical resource management to virtualized, elastic services, the roles of IaaS, PaaS, and SaaS, and how each technology benefits the others in modern applications.

Artificial Intelligence · Big Data · Cloud Computing

36 min read

UCloud Tech

Apr 18, 2018 · Big Data

How Elasticsearch Powers Billion‑Record Log Analysis and Full‑Text Search

This article explains how Elasticsearch and the ELK stack address challenges of storing, securing, retrieving, and analyzing massive data volumes by providing distributed real‑time search, log collection, visualization, and even serving as a NoSQL alternative for large‑scale applications.

Big Data · ELK · Elasticsearch

7 min read

Architecture Digest

Apr 18, 2018 · Databases

Understanding Distributed Architecture and Its Applications in MySQL and Large‑Scale Systems

The article explains the concept of distributed architecture, its key characteristics such as cohesion and transparency, showcases how MySQL and middleware like Mycat are used in e‑commerce platforms, and outlines the evolution, practical implementations, and challenges of building scalable distributed database systems.

Big Data · Database Architecture · Distributed Systems

15 min read

Huawei Cloud Developer Alliance

Apr 10, 2018 · Big Data

Unlock Big Data Expertise with Huawei’s HCDA‑BigData V1.0 Certification

Huawei’s HCDA‑BigData V1.0 certification, launched in April 2018, offers engineer‑level credentials, comprehensive training materials, a four‑day hands‑on program, and a Prometric exam to equip developers, partners, students, and ICT professionals with advanced big‑data development skills on FusionInsight.

Big Data · FusionInsight · HCDA

5 min read

Qunar Tech Salon

Apr 10, 2018 · Big Data

Design and Implementation of Meituan's Traffic Compass Data Warehouse for Hotel‑Travel Business

The article presents Meituan's Traffic Compass—a data‑warehouse‑driven traffic analysis platform for the hotel‑travel business—detailing its background, challenges, architectural layers, dimensional modeling, Kylin‑based query engine, configuration mechanisms, performance metrics, and future optimization plans.

Analytics · Big Data · Kylin

14 min read

Qunar Tech Salon

Apr 9, 2018 · Big Data

Analysis of Apache Spark 2.2.1 Memory Management Model

This article examines Spark's unified memory manager in version 2.2.1, detailing on‑heap and off‑heap memory regions, the four on‑heap memory pools, dynamic execution‑storage memory sharing, task memory accounting, and provides concrete calculation examples to explain UI discrepancies and runtime memory limits.

Big Data · Executor · Memory Management

13 min read

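The on-heap split that the Spark 2.2.1 article calculates can be reproduced with a few lines of arithmetic. This sketch assumes Spark 2.x defaults: 300 MB of reserved system memory, `spark.memory.fraction = 0.6`, and `spark.memory.storageFraction = 0.5`. The helper name is hypothetical, and `heap_mb` should be the heap size the JVM actually reports (`Runtime.getRuntime.maxMemory`), which is usually somewhat smaller than `-Xmx` and is one plausible source of the UI discrepancies the article discusses.

```python
RESERVED_MB = 300  # reserved system memory in Spark 2.x's unified memory manager

def unified_memory_breakdown(heap_mb: float,
                             memory_fraction: float = 0.6,   # spark.memory.fraction (2.x default)
                             storage_fraction: float = 0.5,  # spark.memory.storageFraction default
                             ) -> dict:
    """Split an executor heap (in MB) into the on-heap regions of unified memory management."""
    if heap_mb < 1.5 * RESERVED_MB:
        # Spark refuses to start an executor with less than 1.5x the reserved memory.
        raise ValueError("heap must be at least 450 MB")
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction       # shared by execution and storage
    storage = unified * storage_fraction     # initial storage share; a soft boundary
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,            # user data structures and Spark internals
        "unified": unified,
        "storage": storage,
        "execution": unified - storage,
    }
```

For a 4096 MB heap this gives about 2277.6 MB of unified memory, split evenly into roughly 1138.8 MB each for storage and execution, plus about 1518.4 MB of user memory. Because the storage/execution boundary is dynamic, execution can borrow free storage memory and evict cached blocks back down to the storage share, while storage can borrow idle execution memory but cannot evict it.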
dbaplus Community

Apr 3, 2018 · Big Data

How Meituan Built DataMan: A Scalable Data Quality Monitoring Platform for Big Data

This article details Meituan's DataMan platform, describing the background of data quality challenges, the eight-step PDCA-driven solution, architectural design, technical stack, monitoring standards, and the resulting improvements in data governance and operational efficiency across their massive data warehouse ecosystem.

Big Data · Data Governance · Data Quality

20 min read

StarRing Big Data Open Lab

Mar 30, 2018 · Operations

How Milano Transforms Large-Scale Cluster Log Analysis with ELK and Kafka

Milano, a distributed log collection and analysis platform built on the ELK stack, leverages Filebeat, Kafka, Logstash, Elasticsearch, and Kibana to provide high‑throughput, low‑latency, secure, and visual log management for massive clusters, addressing the challenges of traditional manual log inspection.

Big Data · Distributed Systems · ELK

8 min read

ITPUB

Mar 29, 2018 · Big Data

Demystifying Hadoop: MapReduce, Shuffle, and YARN Architecture

This article explains Hadoop’s core components, the MapReduce programming model, the detailed shuffle and merge processes, and how YARN replaces the classic JobTracker/TaskTracker design to improve scalability and resource utilization in large‑scale data processing clusters.

Big Data · Hadoop · MapReduce

15 min read

Architecture Digest

Mar 26, 2018 · Operations

Alipay’s Double 11 Architecture: Logical Data Centers, Distributed Transactions, and High‑Availability Strategies

The article details Alipay’s comprehensive architecture for the Double 11 shopping festival, covering its three‑layer IaaS/PaaS/SaaS model, logical data‑center design, multi‑active disaster recovery, blue‑green deployment, distributed data sharding, transaction processing, and the Ant Credit Pay service’s performance and risk‑control mechanisms.

Alipay · Architecture · Big Data

16 min read

StarRing Big Data Open Lab

Mar 23, 2018 · Databases

Mastering HBase Ops: Essential Tools and Commands for Cluster Management

This guide introduces the most commonly used HBase operational tools—including Canary, hbck, HFile viewer, CopyTable, Export/Import, ImportTsv, CompleteBulkload, RowCounter, CellCounter, and clean utilities—explaining their purposes, typical use‑cases, and exact command syntax for effective cluster administration.

Big Data · HBase · commands

12 min read

MaGe Linux Operations

Mar 23, 2018 · Cloud Computing

Why Cloud Computing, Big Data, and AI Are Inseparable: A Beginner’s Guide

This article explains the origins and goals of cloud computing, how virtualization adds flexibility in time and space, the evolution from physical servers to public and private clouds, the role of IaaS, PaaS, and SaaS, and how big data and artificial intelligence intertwine with cloud services to enable modern intelligent applications.

Artificial Intelligence · Big Data · Cloud Computing

37 min read

Meituan Technology Team

Mar 22, 2018 · Big Data

High-Performance User Behavior Analysis Solution for Massive Data

The paper describes a high‑performance user‑behavior analysis system that processes hundreds of billions of daily logs for Meituan‑Dianping, using an inverted‑index structure with bitmap UUID sets and timestamp sequences, combined with Spark, Spring and Alluxio optimizations to cut query times from hours to under five seconds.

Big Data · OLAP analysis · distributed computing

14 min read

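The inverted index with bitmap UUID sets described in the Meituan entry maps each event to the set of users who triggered it, so "users who did A and B" becomes a bitwise AND. This minimal sketch assumes each UUID has already been assigned a dense integer index and uses Python integers as uncompressed bitmaps; a production system would more likely use compressed bitmaps (Roaring, for example), and the timestamp sequences needed for ordered funnels are deliberately left out. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

def build_event_bitmaps(logs: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    # Inverted index: event name -> bitmap whose bit i is set if user i triggered it.
    bitmaps: Dict[str, int] = {}
    for user_index, event in logs:
        bitmaps[event] = bitmaps.get(event, 0) | (1 << user_index)
    return bitmaps

def users_with_all(bitmaps: Dict[str, int], events: List[str]) -> int:
    # Intersect the bitmaps of every requested event (unordered funnel step).
    result: Optional[int] = None
    for event in events:
        bm = bitmaps.get(event, 0)
        result = bm if result is None else result & bm
    return result or 0

def count_users(bitmap: int) -> int:
    # Cardinality of the user set = number of set bits.
    return bin(bitmap).count("1")

def to_user_indexes(bitmap: int) -> List[int]:
    # Decode the bitmap back into sorted user indexes.
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]
```

With logs where users 0, 1 and 2 view, users 0 and 2 order, and only user 2 pays, `users_with_all(bm, ["view", "order"])` selects users 0 and 2, and adding `"pay"` narrows the set to user 2.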
dbaplus Community

Mar 20, 2018 · Big Data

How to Upgrade Hive from 0.13 to 2.1 Without Downtime: Tips, Pitfalls, and Best Practices

This article walks through a phased (gray‑release), controlled upgrade of Hive from version 0.13 to 2.1, covering metadata schema analysis, syntax compatibility, new Hive 2.1 features, UDF adjustments, performance tweaks, and a step‑by‑step procedure that keeps the service stable with zero interruption.

Big Data · Performance · SQL Compatibility

20 min read

dbaplus Community

Mar 18, 2018 · Databases

What’s New in the Database World? March 2018 Release Roundup

The March 2018 DBAplus Newsletter compiles the latest releases across RDBMS, NoSQL, NewSQL, time‑series and big‑data ecosystems, highlighting new features, performance improvements, compatibility updates and key technical links for Oracle, MySQL, MariaDB, SQL Server, DB2, PostgreSQL, TiDB, CockroachDB, InfluxDB, Hadoop and several Chinese‑made databases.

Big Data · Database Releases · HTAP

21 min read

MaGe Linux Operations

Mar 17, 2018 · Operations

From Manual Ops to Automated Cloud: A 7‑Year Journey of a Game Ops Team

This article chronicles a game company's operations team evolution over seven years, detailing how it grew from a tiny manual crew to a large, automated, cloud‑native organization that built its own CDN, monitoring, and platform solutions while tackling scaling, reliability, and service‑orientation challenges.

Big Data · CDN · Cloud Computing

21 min read

Architecture Digest

Mar 14, 2018 · Big Data

Attributes Matrix and Data Flow Models of Apache Streaming Platforms

This article presents a comprehensive attributes matrix and data‑flow model overview for major Apache streaming platforms, comparing versions, sponsors, event handling, fault tolerance, processing order, latency, resource management, APIs, and supported connectors to aid practical technology selection.

Apache · Big Data · attributes matrix

16 min read

Beike Product & Technology

Mar 9, 2018 · Big Data

How Lianjia Built a Low‑Latency Real‑Time Data Platform with Spark Streaming

This article details Lianjia's journey of designing and implementing a low‑latency, stable real‑time computing platform using Spark Streaming on YARN, covering technical selection, architecture components, version compatibility challenges, exactly‑once semantics, graceful shutdown, Kafka tuning, and future enhancements.

Big Data · Exactly-Once · Kafka

11 min read

Beike Product & Technology

Mar 9, 2018 · Big Data

Design and Implementation of Transparent Compression for Hadoop Using ZFS

The article presents a comprehensive solution for reducing Hadoop cluster storage consumption by applying ZFS‑based transparent compression and data‑governance techniques, detailing the technical background, design choices, implementation steps, performance optimizations, and observed storage savings.

Big Data · Data Governance · Hadoop

12 min read

Suning Technology

Mar 9, 2018 · Big Data

How Suning Built a Scalable Real-Time Log Analysis Platform with Spark Streaming

Suning’s real‑time log analysis system integrates Flume, Kafka, Storm and Spark Streaming to collect, cleanse, and compute metrics such as NDCG, delivering low latency, high throughput, exactly‑once processing, and robust data safety while supporting multi‑dimensional analytics on massive online and offline traffic.

Big Data · Data Quality · NDCG

12 min read

Qunar Tech Salon

Mar 9, 2018 · Big Data

New Features in Apache Spark 2.3: Continuous Streaming, Kubernetes Scheduler, Pandas UDFs, and MLlib Enhancements

Apache Spark 2.3 introduces major upgrades such as millisecond‑latency continuous streaming, stream‑to‑stream joins, a native Kubernetes scheduler backend, accelerated Pandas UDFs, and several MLlib improvements, all aimed at making big‑data processing faster, easier, and smarter.

Apache Spark · Big Data · Continuous Processing

7 min read

Ctrip Technology

Mar 8, 2018 · Big Data

Ctrip Wireless APM Platform: Architecture, Metrics, and Technical Details

The article describes the evolution of Ctrip's wireless APM platform from the early UBT-based monitoring to a globally‑oriented, metric‑rich system that processes over 100 billion data points daily using Storm and Elasticsearch, detailing its design, key performance dimensions, data‑volume trade‑offs, and implementation choices.

APM · Big Data · Ctrip

12 min read

dbaplus Community

Mar 7, 2018 · Big Data

Taming Massive HDFS Data Growth: Monitoring, Capacity Planning & Hive Optimization

The article outlines a systematic approach for large‑scale Hadoop clusters to monitor daily data growth, identify abnormal paths, manage rapid expansion, clean unused cold data, and implement capacity forecasts, while providing concrete daily and quarterly actions, Hive‑specific strategies, and practical examples to keep storage under control.

Big Data · Data Growth · HDFS

17 min read

Suning Technology

Mar 6, 2018 · Big Data

Why China Needs Open Big Data Platforms: Insights from Representative Zhang Jindong

The article examines China’s push for big data sharing, highlighting Representative Zhang Jindong’s calls for a national open data platform, integration of AI and blockchain, and urgent legislation to address data fragmentation, security risks, and the need to empower SMEs against data monopolies.

Artificial Intelligence · Big Data · data security

7 min read

StarRing Big Data Open Lab

Mar 2, 2018 · Cloud Computing

How Cloud, Big Data, and AI Converge to Transform Enterprise Data Strategies

The article explores how the integration of cloud computing, big data, and artificial intelligence is reshaping enterprise data platforms, outlining a multi‑stage evolution from data unification to ecosystem building and forecasting the strategic importance of data in future business transformation.

Artificial Intelligence · Big Data · Enterprise Data

9 min read

Ctrip Technology

Feb 28, 2018 · Big Data

Using Alluxio to Mitigate HDFS Maintenance Impact on Real-Time Jobs in Ctrip's Big Data Platform

The article explains how Ctrip's big‑data platform introduced Alluxio to isolate real‑time Spark Streaming jobs from HDFS NameNode maintenance, reduce NameNode pressure, improve Spark SQL performance, and provide a unified storage layer across multiple HDFS clusters.

Alluxio · Big Data · Data Lake

9 min read

Hulu Beijing

Feb 28, 2018 · Big Data

How Hulu’s Nesto Engine Delivers Near‑Real‑Time OLAP on TB‑Scale Data

This article introduces Hulu's in‑house OLAP engine Nesto, detailing its near‑real‑time data ingestion, nested data model, TB‑level storage using HBase and Parquet, MPP query execution, custom predicate library, and the overall architecture that enables sub‑second ad‑hoc queries for user analytics.

Big Data · Columnar Storage · Distributed Systems

22 min read

JD Tech

Feb 28, 2018 · Operations

CallGraph: JD.com's Distributed Tracing and Service Governance Platform

CallGraph is JD.com's internally developed distributed tracing and service governance platform that addresses the challenges of monitoring complex microservice architectures by providing low‑intrusion, low‑latency tracing, real‑time analytics, configurable sampling, and integration with JMQ, Storm, Spark, HBase, and JimDB for both operational insight and performance optimization.

Big Data · Distributed Tracing · Microservices

12 min read

21CTO

Feb 20, 2018 · Big Data

Why Real-Time Streaming Is the Next Big Data Revolution for Developers

This article explains how real-time streaming has evolved from batch Hadoop systems through Lambda architecture to modern Kappa-style pipelines, highlighting its growing importance for developers, enterprises, and the integration of streaming with microservices, AI, and cloud-native technologies.

AI integration · Big Data · Kappa architecture

8 min read

Architecture Digest

Feb 11, 2018 · Artificial Intelligence

Recent Advances in Bayesian Machine Learning: Foundations, Non‑Parametric Methods, and Large‑Scale Applications

This article reviews recent progress in Bayesian machine learning, covering foundational theory, non‑parametric approaches such as Dirichlet and Indian buffet processes, regularized Bayesian inference, and scalable techniques for big‑data environments including stochastic variational methods, distributed algorithms, and hardware acceleration.

Big Data · Monte Carlo · Variational Inference

23 min read

Java Backend Technology

Feb 6, 2018 · Artificial Intelligence

How JD Built a Scalable AI-Powered Recommendation Engine for E‑Commerce

This article details JD's evolution from rule‑based recommendations to a multi‑screen, AI‑driven personalization platform, describing its system architecture, data pipelines, feature services, and key technologies that enable real‑time, user‑centric product suggestions across the e‑commerce ecosystem.

Artificial Intelligence · Big Data · e‑commerce

20 min read

Meituan Technology Team

Feb 2, 2018 · Big Data

How Meituan’s “Flow Compass” Turns Massive User Data into Actionable Insights

This article details the design, challenges, and implementation of Meituan’s Flow Compass—a data‑driven product that combines user, scene, and traffic source dimensions using a Kylin‑based warehouse to enable rapid, flexible traffic‑source analysis for hotel‑travel growth.

Big Data · ETL · Kylin

19 min read

Architecture Digest

Feb 1, 2018 · Fundamentals

How Search Engines Work: Building Inverted Indexes

This article explains the core of search engine technology by describing what an inverted index is, how it is built using single‑pass memory and multi‑way merge methods, how indexes can be partitioned and incrementally updated, and how Hadoop can be used for large‑scale indexing.

Big Data · Hadoop · indexing

10 min read

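The two construction steps named in the search-engine entry, single-pass in-memory inversion of one block of documents and a multi-way merge of the sorted block indexes, can be sketched as follows. This is a minimal illustration assuming whitespace tokenization and in-memory lists standing in for the on-disk block files a real indexer would write; the function names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Posting = Tuple[str, List[int]]  # (term, sorted document ids)

def build_block_index(docs: Iterable[Tuple[int, str]]) -> List[Posting]:
    # Single-pass in-memory inversion of one block of (doc_id, text) pairs.
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in docs:
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    # Sort by term so that blocks can later be merged sequentially.
    return sorted((term, sorted(ids)) for term, ids in index.items())

def merge_blocks(blocks: List[List[Posting]]) -> Iterator[Posting]:
    # Multi-way merge: heapq.merge streams all blocks in term order,
    # and consecutive entries for the same term are combined into one postings list.
    current_term, current_ids = None, []
    for term, ids in heapq.merge(*blocks, key=lambda posting: posting[0]):
        if term != current_term:
            if current_term is not None:
                yield current_term, sorted(current_ids)
            current_term, current_ids = term, []
        current_ids.extend(ids)
    if current_term is not None:
        yield current_term, sorted(current_ids)
```

Because every block is already sorted by term, the merge reads each block only once, which is what makes the approach attractive when blocks live on disk; on Hadoop the same grouping by term is naturally performed by the shuffle between mappers and reducers.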
iQIYI Technical Product Team

Jan 31, 2018 · Big Data

Evolution of iQIYI Real-Time Big Data Collection System

iQIYI’s big‑data collection system has progressed from simple HTTP log uploads to a Flume‑Kafka pipeline and finally to a custom Venus‑Agent architecture with centralized configuration, persistent offsets, dual‑Kafka streams and Flink processing, now handling tens of millions of queries per second and over three hundred billion records daily to power its AI‑driven services.

Big Data · Flink · Flume

15 min read

Meituan Technology Team

Jan 26, 2018 · Big Data

Design and Implementation of a Real-Time Data Processing System at Meituan

Meituan designed a Storm‑based real‑time data processing platform that guarantees at‑least‑once delivery and high availability, employs a custom spout, regression‑driven traffic smoothing, and a low‑latency KV store with atomic operations, persisting results in Kafka, MySQL and Cellar to power merchant dashboards and heat‑tag analytics, while planning broader real‑time analytics expansion.

Big Data · Distributed Systems · Storm

10 min read

Java Backend Technology

Jan 23, 2018 · Big Data

Which Tech Fields Paid the Most in 2017? Insights from China’s Internet Talent Market

A 2017 analysis of China’s high‑end internet talent market reveals that cloud computing, big data and gaming lead salary rankings, while company financing stage, job role demand, city location and candidate age all shape hiring trends and compensation levels.

Big Data · China · Cloud Computing

9 min read

21CTO

Jan 22, 2018 · Big Data

Solr vs Elasticsearch: Which Open‑Source Search Engine Wins for Scalable Data Retrieval?

This article introduces Apache Solr and Elasticsearch, explains their shared Lucene foundation, compares their scalability, deployment ease, and key features, and highlights why Elasticsearch is often considered more efficient for large‑scale, multi‑tenant search applications.

Big Data · Elasticsearch · Solr

5 min read

Huawei Cloud Developer Alliance

Jan 18, 2018 · Big Data

Smart Flood Control: Donghua Software’s IoT, Big Data & Cloud Solution

The case study details Donghua Software’s smart flood‑control and drainage solution, which integrates IoT sensors, NB‑IoT/eLTE networks, Huawei’s FusionSphere cloud platform, big‑data analytics, and GIS to provide real‑time monitoring, predictive warnings, automated gate control, and efficient emergency dispatch for urban water management.

Big Data · Cloud Computing · Flood Management

12 min read

Efficient Ops

Jan 16, 2018 · Operations

How Tencent Secures Game Operations: Real Cases, Challenges, and Data‑Driven Solutions

This article gives a comprehensive overview of game operation security at Tencent, covering the author's background, real‑world incident cases, the inherent challenges of large‑scale game services, past monitoring efforts, and a new data‑driven alerting framework that dramatically reduces false alarms while protecting game economies.

Alerting · Big Data · Game Security

25 min read

Huawei Cloud Developer Alliance

Jan 16, 2018 · Artificial Intelligence

How to Build a Scalable Spark-Based Text Sentiment Analysis System

This article walks through constructing a Spark-powered text sentiment analysis pipeline—from crawling movie reviews, preprocessing and feature extraction with jieba and TF‑IDF, to training Naive Bayes and SVM classifiers—while discussing Spark's advantages and ways to improve model accuracy.

Big Data · NLP · Python

19 min read

ITFLY8 Architecture Home

Jan 15, 2018 · Backend Development

Inside the Architecture of the World’s Biggest Websites: Wikipedia, Facebook, YouTube, and More

This article surveys the technical architectures of major web platforms—including Wikipedia, Facebook, Yahoo! Mail, Twitter, Google App Engine, Amazon, and Youku—highlighting their design patterns, scaling techniques, storage solutions, and caching strategies to reveal how massive online services are built and operated.

Architecture · Backend · Big Data

10 min read

JD Retail Technology

Jan 15, 2018 · R&D Management

Comprehensive System Quality Assurance Practices and Team Development in Rapid Iteration Environments

The article presents Zhang Qi's insights on end‑to‑end quality assurance for JD's POP platform and 7FRESH stores, covering pre‑, during‑, and post‑release testing measures, big‑data‑driven automation, tester roles, team growth strategies, and knowledge‑sharing initiatives.

Big Data · Quality assurance · automation

10 min read

StarRing Big Data Open Lab

Jan 5, 2018 · Big Data

What Drove Big Data’s 2017 Surge and What’s Next? Insights & Predictions

Analyzing 2017’s big data boom, the article explores how the 4V characteristics—volume, variety, velocity, and value—spurred innovations like distributed storage, NoSQL, real‑time stream processing, and AI integration, and predicts future hotspots such as SQL resurgence, cloud‑based platforms, and AI‑driven analytics.

Artificial Intelligence · Big Data · Real-time Processing

11 min read

AntTech

Jan 4, 2018 · Databases

Report on VLDB 2017 Conference: Insights and Highlights from Database Research

After attending VLDB 2017 in Munich, the author summarizes the conference’s broad coverage of database research, from new hardware‑accelerated prototypes and Spark‑based big‑data processing to Oracle and SAP HANA case studies, keynotes, notable papers, and reflections on industry trends and Chinese contributions.

Big Data · Hardware acceleration · VLDB

0 likes · 22 min read

dbaplus Community

Jan 1, 2018 · Big Data

How Vipshop Leverages Data Processing, Analytics, and Mining for Smarter Ops

This article summarizes Wu Xiaoguang's talk at Gdevops 2017, detailing how Vipshop integrates data processing, analysis, and mining technologies—such as Flume, Kafka, Spark, and custom scheduling—to improve operational decision‑making, performance monitoring, root‑cause analysis, and predictive modeling across its e‑commerce platform.

Big Data · Data Analytics · Operations

0 likes · 23 min read

Tencent Architect

Dec 30, 2017 · Databases

An Overview of Time Series Databases and Tencent CTSDB

This article introduces the concept, characteristics, and use cases of time series databases, explains the data model and challenges of traditional solutions, and provides a detailed overview of Tencent's Cloud Time Series Database (CTSDB) along with performance comparisons against InfluxDB.

Big Data · CTSDB · Time Series Database

0 likes · 12 min read

Architects' Tech Alliance

Dec 28, 2017 · Operations

Intelligent Operations: Machine‑Learning‑Based AIOps – Lecture Summary by Prof. Pei Dan

In this lecture, Prof. Pei Dan of Tsinghua University outlines the evolution of intelligent operations from rule‑based automation to machine‑learning‑driven AIOps, discusses data, feedback loops, and practical challenges, and calls for stronger collaboration between industry and academia to accelerate research and deployment.

Big Data · Cloud Computing · aiops

0 likes · 10 min read

Meituan Technology Team

Dec 28, 2017 · Big Data

Design and Implementation of a Scalable Scenario Query System for Meituan

Meituan built a scalable scenario‑query platform that unifies traffic, activity and investment data by layering RPC services, a Storm‑driven pre‑computation tree stored in Redis/Tair, and a middle‑platform API with circuit‑breaker logic, cutting response times from seconds to under one second while dramatically reducing code coupling and simplifying future feature development.

Apache Storm · Big Data · NoSQL

0 likes · 12 min read

Architecture Digest

Dec 27, 2017 · Backend Development

Handling Transactions, Failover, and Exactly‑Once Semantics in Distributed Systems

This article explores how distributed systems determine node liveness, manage failover and recovery, and implement at‑most‑once, at‑least‑once, and exactly‑once processing guarantees—including opaque transactions and two‑phase commit—using examples from Kafka, Zookeeper, and big‑data pipelines.

Big Data · Distributed Systems · Exactly-Once

0 likes · 15 min read

dbaplus Community

Dec 26, 2017 · Big Data

Turning Raw Logs into Structured Data with DBus Visual Rule Operators

This article explains how the open‑source DBus platform, combined with the Wormhole streaming engine, captures raw application logs, lets users configure visual rule operators, and transforms the unstructured message part into schema‑driven, Kafka‑ready data for downstream analytics.

Big Data · DBus · Log Processing

0 likes · 15 min read

Architecture Digest

Dec 22, 2017 · Big Data

Redesign and Optimization of the WeChat Pay Transaction Record System

This article presents a comprehensive case study of how WeChat Pay rebuilt its transaction record storage system to handle massive data volumes, improve performance, ensure data completeness, support flexible queries, and strengthen security through distributed key‑value storage, data partitioning, and operational safeguards.

Big Data · Data Partitioning · WeChat Pay

0 likes · 11 min read

Qunar Tech Salon

Dec 21, 2017 · Big Data

Experience and Optimization Strategies for Apache Kylin in Real-Time OLAP

This article shares a data engineer's three‑year experience using Apache Kylin for real‑time OLAP on petabyte‑scale data, describing the business background, challenges of pre‑computation, cube modeling, dimension reduction, and various optimization techniques such as hierarchy, mandatory, and joint dimensions, as well as precise count‑distinct handling.

Apache Kylin · Big Data · Cube Optimization

0 likes · 13 min read

Meitu Technology

Dec 19, 2017 · Big Data

How Meitu Built a Scalable Distributed Bitmap System for Massive Data Processing

This article explains Meitu's development of a distributed bitmap system that leverages the speed and storage efficiency of bitmap structures to handle massive user data, detailing its evolution, architectural choices, implementation practices, and lessons learned to inspire similar big‑data solutions.

Big Data · Meitu · System Design

0 likes · 3 min read

Meitu Technology

Dec 19, 2017 · Industry Insights

Inside Meitu’s In‑House Log Collection System Arachnia: Design, Challenges, and Core Mechanisms

This article introduces Meitu’s self‑developed log collection system Arachnia, explaining why a custom solution was needed for massive server‑side user‑behavior logs, the key requirements such as reliability and real‑time throughput, and the core architectural mechanisms that address those challenges.

Arachnia · Big Data · Meitu

0 likes · 2 min read

Meitu Technology

Dec 19, 2017 · Big Data

Meitu Internet Technology Salon Session 7: Practices in Recommendation Algorithms, Big Data, and Personalized Recommendation

At Meitu’s seventh Internet Technology Salon in Xiamen, over a hundred experts discussed recommendation algorithms and big‑data solutions, with talks on the Arachnia log‑collection system, the Naix distributed bitmap service, Meitu’s personalized recommendation pipeline challenges, and novel data‑missing‑theory models for improved performance.

Big Data · data collection · distributed bitmap

0 likes · 8 min read

Suning Technology

Dec 19, 2017 · Big Data

How Leading Tech Giants Leverage Big Data: Highlights from Suning’s Data‑Driven Forum

The Suning Cloud Commerce Data‑Driven Forum gathered top experts from Intel, Mobike, Google, and others to share deep insights on big data, AI, and data‑driven retail, offering practical case studies, emerging technologies, and strategic perspectives for the industry’s digital transformation.

AI · Big Data · Data-driven

0 likes · 8 min read

Architecture Digest

Dec 16, 2017 · Big Data

Performance Comparison of Apache Flink and Apache Storm for Real‑Time Stream Processing

This report presents a systematic performance evaluation of Apache Flink and Apache Storm across multiple real‑time processing scenarios, measuring throughput, latency, message‑delivery semantics, and state‑backend effects, and provides recommendations for selecting the most suitable engine based on the observed results.

Big Data · Flink · Real-time analytics

0 likes · 21 min read

58 Tech

Dec 15, 2017 · Big Data

Design and Architecture of WMDA: A Comprehensive User Behavior Analysis Platform

The article details WMDA, a user behavior data collection platform for PC, mobile, and app clients that supports both codeless and manual (code‑based) event tracking and enables real‑time and offline user behavior analysis. It describes the platform's functional model, behavior taxonomy, five‑layer architecture, tracking techniques, visual element selection ("circle‑selection"), data services, streaming and batch processing pipelines, and related technologies such as Storm, Spark, Druid, and Roaring Bitmap.

Big Data · Druid · Real-time Streaming

0 likes · 18 min read

Alibaba Cloud Infrastructure

Dec 15, 2017 · Operations

Automated Fault Recovery Architecture for Alibaba's Network during Double Eleven

The article describes Alibaba's end‑to‑end automated fault recovery system for its massive network, covering extensive data collection, Spark‑based event processing, flexible alerting with Siddhi, alert convergence using PageRank, and scripted recovery actions to achieve high availability during the Double Eleven traffic surge.

Big Data · Network Monitoring · Operations

0 likes · 9 min read

dbaplus Community

Dec 14, 2017 · Big Data

Scaling Vipshop’s Big Data Platform: Monitoring, Multi‑HDFS, Yarn Optimization & Capping

In 2017, Vipshop’s senior big‑data architect shared how the company grew its Hadoop‑based platform from zero to a thousand‑node cluster, detailing cluster health monitoring, multi‑HDFS deployment via Hive, Yarn container allocation improvements, and a hook‑driven Capping resource‑control system that improves stability and efficiency.

Big Data · HDFS · capping

0 likes · 15 min read

Qunar Tech Salon

Dec 14, 2017 · Databases

TiDB Architecture, Deployment, and Monitoring Practices at Qunar

This article explains Qunar's transition from MySQL, Redis, and HBase to TiDB, detailing the background of distributed databases, TiDB's architecture, hardware selection, deployment automation, monitoring setup, and real‑world usage scenarios to address scalability and high‑availability challenges.

Big Data · Database Architecture · Deployment

0 likes · 14 min read

Huawei Cloud Developer Alliance

Dec 11, 2017 · Artificial Intelligence

How AI and Big Data Are Transforming Urban Traffic Management

The 12th China Intelligent Transportation Conference (2017) highlighted system thinking, AI, and innovation as key drivers of smarter city traffic, outlining a three‑step top‑level design, AI‑powered applications, and intersection innovations that together promise safer, more efficient, and fully automated urban mobility.

AI · Big Data · Intelligent Transportation

0 likes · 8 min read

AntTech

Dec 11, 2017 · Artificial Intelligence

How AI and Big Data Transform the Insurance Industry: Differentiated Pricing, Smart Claims, Risk Control, and Operations

The article examines how emerging AI and big‑data technologies are reshaping insurance by enabling differentiated pricing, automating claims and customer service, strengthening fraud detection, and improving personalized product recommendation and operational efficiency across the sector.

Artificial Intelligence · Big Data · Blockchain

0 likes · 13 min read

Efficient Ops

Dec 7, 2017 · Operations

How Multi-Dimensional Root Cause Analysis Boosts Monitoring Efficiency with AI

This article introduces the challenges of multi-dimensional monitoring, explains the limitations of traditional alerting, and presents the MDRCA algorithm—combining K‑means clustering, Explanatory Power, and Surprise metrics—to pinpoint root causes efficiently, while sharing practical AI integration experiences for large‑scale monitoring platforms.

AI · Big Data · KMeans

0 likes · 15 min read

MaGe Linux Operations

Dec 3, 2017 · Big Data

Build a Simple Big Data Search Engine with Bloom Filters and Tokenization in Python

This article walks through implementing a basic big‑data search system in Python, covering Bloom filter basics, tokenization of text, inverted index construction, and how to combine these techniques to support fast AND/OR queries.

Big Data · Python · bloom-filter

0 likes · 13 min read

Meituan Technology Team

Dec 1, 2017 · Big Data

Metric Logic Tree: Automated Anomaly Analysis for Business Metrics

The Metric Logic Tree automates business metric anomaly analysis by integrating heterogeneous data sources (Kylin, MySQL, Elasticsearch, Druid) with a three‑layer architecture—metric calculation, algorithmic analysis (waterfall and Gini‑coefficient methods), and a master‑worker computation service—that parallelizes queries, delivers immediate conclusions, and shortens decision cycles, as demonstrated in Meituan‑Dianping’s hotel‑travel operations.

Big Data · algorithm · anomaly detection

0 likes · 7 min read

AntTech

Dec 1, 2017 · Big Data

Insights and Paper Summaries from KDD 2017 Conference

The article provides a comprehensive overview of KDD 2017, including acceptance statistics, best paper awards, Ant Group's contributions, detailed discussions on AB testing, graph mining, and selected research papers across data mining, machine learning, and anomaly detection, offering valuable insights for practitioners and researchers.

AB testing · Big Data · KDD

0 likes · 30 min read

Efficient Ops

Nov 27, 2017 · Operations

How Facebook Scales to Billions: Disaggregated Networks, Storage, and Warm Spark

Facebook’s journey from early startup ops to supporting over 2 billion monthly users reveals how disaggregated network, storage, and warm‑storage‑enabled Spark architectures overcome scalability bottlenecks, illustrating the operational strategies and design principles that power massive, reliable data‑center services.

Big Data · Distributed Systems · Operations

0 likes · 12 min read

Java Backend Technology

Nov 24, 2017 · Big Data

Top 6 Data Ingestion Platforms: Flume, Fluentd, Logstash, and More

This article reviews six popular data collection platforms—Apache Flume, Fluentd, Logstash, Chukwa, Scribe, and Splunk Forwarder—explaining their architectures, strengths, and typical use cases within modern big‑data pipelines.

Apache Flume · Big Data · Fluentd

0 likes · 10 min read

iQIYI Technical Product Team

Nov 24, 2017 · Information Security

Risk Control System for Live Streaming: Real‑time Interception (Pluto) and Big Data Analysis (Mars)

iQIYI’s live‑stream risk‑control platform combines the real‑time interception engine Pluto with the big‑data analytics system Mars to curb black‑market registration fraud and red‑packet abuse, processing over a billion daily requests through adaptive filters, Kafka‑Spark pipelines, and clustering algorithms that now limit fake popularity to 10‑30% and red‑packet capture to under 3%.

Big Data · Mars · Pluto

0 likes · 11 min read

Alibaba Cloud Developer

Nov 21, 2017 · Big Data

Inside Alibaba’s Stream Computing: 472 M Events/sec & 256 K Payments/sec on Double 11

Alibaba’s Double 11 showcase reveals how its upgraded stream computing platform handled a 100% year‑over‑year data surge, achieving 256 K successful payments per second and processing 472 million events per second in real time through a highly optimized Flink‑based architecture.

Alibaba · Big Data · Flink

0 likes · 10 min read

Suning Technology

Nov 20, 2017 · Big Data

How ZEUS Turns Monitoring Data into Automated Decisions for Enterprise Systems

ZEUS, Suning’s decision analysis platform, integrates monitoring data from tools like Baymax and HIRO, applies CEP aggregation and Drools rule evaluation, and leverages big‑data storage and machine‑learning models to automatically identify root causes, provide real‑time alerts, and enable self‑healing in large‑scale distributed systems.

Big Data · decision analysis · rule engine

0 likes · 14 min read

StarRing Big Data Open Lab

Nov 17, 2017 · Operations

How to Seamlessly Upgrade Transwarp Data Hub Community Edition: Step‑by‑Step Guide

This guide walks you through upgrading Transwarp Data Hub Community Edition—covering application market checks, version selection, upgrade modes, required scripts, configuration parameters, pre‑upgrade checks, execution commands, and rollback procedures—to ensure a smooth, automated upgrade with minimal downtime.

Big Data · TDH · Transwarp

0 likes · 8 min read

Architects' Tech Alliance

Nov 16, 2017 · Operations

Understanding AIOps: How AI‑Driven Operations Transform IT Management

The article explains how AIOps—an AI‑powered IT operations platform that combines big‑data analytics, machine learning, and automation—revolutionizes traditional IT Ops by enabling rapid, accurate incident detection, root‑cause analysis, and self‑healing, thereby freeing CIOs to focus on strategic business value.

Big Data · Digital Transformation · aiops

0 likes · 8 min read

Efficient Ops

Nov 15, 2017 · Big Data

How Tencent Built a 10 TB‑Per‑Day Full‑Link Log Monitoring Platform

This article explains how Tencent's ZhiYun full‑link log monitoring platform handles massive daily logs, overcomes challenges of diverse log formats, high throughput, fault‑tolerant design, and provides scalable storage, query, and alerting capabilities for distributed micro‑service environments.

Big Data · Distributed Systems · Log Monitoring

0 likes · 10 min read

Suning Technology

Nov 13, 2017 · Backend Development

How Suning Scaled Its Membership System for Double‑11: From Legacy POS to Multi‑Active Architecture

This article examines Suning's evolution of its membership platform—from an early offline POS system to a vertically split, cloud‑native architecture—detailing capacity planning, performance testing, data migration with Spark, multi‑active deployment, and future plans for cross‑region high availability.

Big Data · Cloud Native · Data Migration

0 likes · 15 min read

21CTO

Nov 11, 2017 · Big Data

How We Built a Scalable Seller Log System with Kafka, Storm, ES & HBase

This article explains the design and implementation of a unified seller‑operation logging platform that uses Kafka for ingestion, Storm for real‑time processing, Elasticsearch for hot‑data search, and HBase for cold‑data storage, detailing the challenges faced and the optimizations applied.

Big Data · Elasticsearch · HBase

0 likes · 12 min read

ITFLY8 Architecture Home

Nov 8, 2017 · Operations

Inside Ctrip’s Evolving Architecture: Ops, Frameworks, and Big Data Insights

This article explores Ctrip’s continuously evolving architecture, detailing its three-layer composition of operations, frameworks, and applications, and examines real-world case studies of its release system, configuration management, SOA, and a massive User Profile big‑data project, highlighting key innovations and lessons learned.

Big Data · Ctrip · Deployment

0 likes · 11 min read

Tencent Cloud Developer

Nov 3, 2017 · Industry Insights

How Tencent Cloud’s Big Data Platform Ranked in China’s Fifth Evaluation

China’s Data Center Alliance released its fifth big‑data product evaluation, testing 17 solutions from 16 vendors across SQL, NoSQL, and machine‑learning workloads, with Tencent Cloud’s platform achieving top rankings in NoSQL tests and highlighting the nation’s push toward standardized, high‑performance big‑data infrastructure.

Big Data · Data Platforms · Industry Benchmark

0 likes · 5 min read

Alibaba Cloud Developer

Nov 3, 2017 · Big Data

How Alibaba Built an EB-Scale, Real-Time Big Data Platform

Alibaba’s senior data expert Yao Bin Hui explains how the company constructed a standardized, end-to-end big-data ecosystem—from low-level data collection and AI algorithms to data services and product platforms—enabling petabyte-scale integration and second-level response times that power both internal operations and millions of external users.

Alibaba · Big Data · Data Architecture

0 likes · 10 min read

dbaplus Community

Oct 30, 2017 · Big Data

How to Build a Real‑Time Spam Monitoring System with Apache Storm

This article walks through the design, deployment, and code implementation of a real‑time spam detection pipeline using Apache Storm, comparing it with Hadoop, detailing cluster setup, topology components, data flow, and how to package and run the solution on a distributed Storm cluster.

Apache Storm · Big Data · Hibernate

0 likes · 13 min read

Didi Tech

Oct 30, 2017 · Big Data

Didi Launches Gaia Data Open Plan and Smart Traffic Signal Challenge to Advance Big Data and AI Research

Didi launched the Gaia Data Open Plan and the Smart Signal Challenge to share anonymized trajectory and origin‑destination (OD) data, provide computing resources and funding, and invite researchers and algorithm enthusiasts to develop AI‑driven traffic signal optimization, fostering academic collaboration and smart‑city mobility solutions.

AI · Big Data · Open Data

0 likes · 6 min read

21CTO

Oct 26, 2017 · Backend Development

From Data Platform Battles to AI Dreams: A Senior Engineer’s 3‑Year Journey at Alibaba

A senior Alibaba engineer reflects on three years of building a large‑scale data platform, tackling distributed rate‑limiting challenges, leading cross‑regional projects, and pursuing AI research, while sharing personal insights on career growth, technical problem‑solving, and the value of continuous learning.

AI learning · Big Data · Distributed Systems

0 likes · 11 min read

Huawei Cloud Developer Alliance

Oct 23, 2017 · Artificial Intelligence

Is AI the Missing Fourth Dimension for 5G? Exploring the AI+ Revolution

The article reflects on the evolution from traditional to AI‑enhanced devices, recounts recent AI‑focused conferences, and argues that integrating AI and big data as a fourth dimension of 5G is essential for industry digital transformation and future network competitiveness.

5G · AI · AI+

0 likes · 9 min read

Liulishuo Tech Team

Oct 22, 2017 · Big Data

Data-CI: A SQL-Based Data Unit Testing Framework for ETL

The article introduces data-ci, a SQL‑driven unit testing framework that lets engineers write, organize, and automate data validation tests for ETL pipelines, providing assertions, failure callbacks, coverage reporting, and CI integration to improve data quality and reliability.

Big Data · Data Quality · ETL

0 likes · 9 min read

Full-Stack DevOps & Kubernetes

Oct 21, 2017 · Big Data

Deploy Hadoop CDH5.4 on CentOS 6: Install HDFS, YARN, and WebHDFS

This guide walks through preparing three CentOS 6.9 nodes, configuring hostnames, time sync, password‑less SSH, disabling IPv6, installing JDK, downloading CDH 5.4, setting up core‑site and hdfs‑site XML files, formatting the NameNode, starting HDFS services, configuring YARN and MapReduce, and verifying the installations via the Web UI.

Big Data · CDH · CentOS

0 likes · 18 min read

Suning Technology

Oct 20, 2017 · Product Management

How Suning’s Biu Unmanned Store Redefines Retail Experience with AI and Design

The article details Suning’s Biu unmanned store concept, describing its AI‑driven facial recognition, cloud POS, open‑space design, big‑data‑guided shelf management and the design challenges of creating a seamless, phone‑free shopping experience that blends physical and digital realms.

AI · Big Data · User experience

0 likes · 9 min read

Efficient Ops

Oct 18, 2017 · Operations

How Bilibili Scaled Its Log System to 10TB Daily with Elastic Stack

This article details Bilibili's Billions log platform, from its fragmented origins and design goals to its Elastic Stack‑based architecture, shard management, log sampling, custom Go splitters, and monitoring enhancements, highlighting the challenges faced and the roadmap for future improvements.

Big Data · Elastic Stack · Log Management

0 likes · 17 min read
