Tagged articles

3675 articles

Nov 20, 2018 · Big Data

What Languages and Tools Do Big Data Experts Use? Insights from 31 IT Leaders

Based on interviews with 31 IT leaders from 28 organizations, this article reveals the most popular programming languages, frameworks, and platforms—such as Python, Scala, Spark, Kafka, TensorFlow, and Tableau—currently driving big‑data extraction, analysis, and reporting, and highlights emerging trends and tool preferences.

Big Data · Kafka · Python

0 likes · 12 min read

Architects' Tech Alliance

Nov 19, 2018 · Cloud Computing

Suning’s Cloud‑Era Digital Transformation: Architecture Evolution, Technology Roadmap, and Organizational Change

The article details Suning’s Internet‑plus transformation, describing its strategic “one body, two wings, three clouds, four ends” model, the evolution of its enterprise architecture across three generations, the adoption of cloud, SOA, micro‑services, big‑data and AI platforms, and the accompanying R&D and organizational reforms.

Big Data · Cloud Computing · Digital Transformation

0 likes · 13 min read

21CTO

Nov 7, 2018 · Big Data

Why Data Streams Are the Backbone of Real-Time Big Data Analytics

Data streams, like endless rivers, enable continuous, real-time processing of diverse sources such as IoT telemetry, web logs, and e-commerce events. The article contrasts them with batch processing, notes challenges such as scalability and fault tolerance, and surveys supporting tools including Kinesis, Kafka, Flink, and Storm.

Amazon Kinesis · Apache Kafka · Big Data

0 likes · 6 min read

JD Retail Technology

Nov 7, 2018 · Cloud Computing

Technical Preparations for Double 11 Sales Event at JD.com

JD.com's commercial team conducted extensive technical preparations for the Double 11 sales event, including system optimizations, stress testing, and data platform enhancements to handle massive traffic and ensure system stability.

Big Data · Double 11 Preparation · Technical Readiness

0 likes · 6 min read

Programmer DD

Nov 7, 2018 · Big Data

Choosing the Right SQL Engine for Big Data: A Practical Guide

This article explores various SQL engines and storage options for big‑data workloads, compares their performance and capabilities, shows practical code examples, and offers guidance on writing efficient SQL in complex data environments.

Big Data · SQL Engines · data engineering

0 likes · 6 min read

Xianyu Technology

Nov 6, 2018 · Big Data

Technical Evolution of Xianyu Real-Time Selection System for Double Eleven

To meet Double‑Eleven’s sub‑second, billion‑item feed demands, Alibaba’s Xianyu selection system evolved from a Solr‑based search pipeline through offline batch and PostgreSQL attempts to a Blink‑powered real‑time stream platform using Niagara’s low‑latency LSM storage, delivering high‑throughput, personalized product feeds.

Alibaba · Big Data · Flink

0 likes · 23 min read

Architects' Tech Alliance

Nov 5, 2018 · Big Data

Alluxio as a Virtual Distributed File System for Data Lake Solutions

The article explains how Alluxio provides a virtual distributed file system that acts as a "virtual data lake," enabling unified, high‑performance access to structured and unstructured data across heterogeneous storage back‑ends while reducing storage costs through intelligent caching and eliminating the need for permanent data copies.

Alluxio · Big Data · Data Lake

0 likes · 16 min read

dbaplus Community

Nov 1, 2018 · Big Data

How Vipshop Scales Real‑Time Data with Flink on Kubernetes

This article details Vipshop's real‑time platform architecture, the migration from Storm and Spark to Flink, Flink's deployment on Kubernetes, and the latest Unified Data Management system that unifies data access across Kafka, Redis, Tair and HDFS.

Big Data · Flink · Kubernetes

0 likes · 12 min read

Tencent Cloud Developer

Oct 30, 2018 · Big Data

Big Data Technology Trends and Cloud Data Warehouse Architecture Practices

The article reviews recent big-data trends—from Hadoop’s evolution and Spark’s in-memory advances to emerging storage like Ozone—while detailing data-warehouse models, query-optimizer techniques, and cloud-native architectures that integrate diverse data sources, enabling scalable, AI-ready analytics and modern data-lake capabilities.

Big Data · Data Lake · Hadoop

0 likes · 30 min read

Alibaba Cloud Developer

Oct 26, 2018 · Artificial Intelligence

How Alibaba’s Digital Vision Is Shaping AI‑Powered Smart Cities and New Retail

At the 2018 China Computer Conference, Alibaba CTO Zhang Jianfeng highlighted how digitalization underpins AI innovations across smart city infrastructure, new manufacturing, and retail, emphasizing massive data utilization, cognitive graph research, and immersive 3D store experiences.

Alibaba · Artificial Intelligence · Big Data

0 likes · 4 min read

Qunar Tech Salon

Oct 25, 2018 · Big Data

Why Alibaba Chose Apache Flink: Architecture, Scale, and Future Directions

This article explains how Alibaba adopted Apache Flink as a unified, low‑latency, high‑throughput big‑data engine, detailing its stream‑first design, state management, checkpointing, massive production deployment, community contributions, and upcoming plans for a unified API, SQL layer, broader language support, and AI integration.

Alibaba · Apache Flink · Big Data

0 likes · 13 min read

DataFunTalk

Oct 24, 2018 · Artificial Intelligence

The Technical Growth Path of an Algorithm Engineer in the Big Data Era

This article summarizes Zeng Xianglin’s presentation on the stages of an algorithm engineer’s career—from academic Beta research and feature engineering through online deployment, model training, and deep‑learning applications—highlighting practical challenges and best practices in large‑scale advertising systems.

Big Data · algorithm engineering · online advertising

0 likes · 13 min read

Alibaba Cloud Developer

Oct 22, 2018 · Big Data

Transforming Massive Trajectory Data into Flow Fields: A Scalable Visualization Approach

This article explains how traditional trajectory visualizations struggle with massive data and introduces a flow‑field generation algorithm that aggregates and visualizes large‑scale movement patterns efficiently, reducing visual clutter and rendering load while preserving key mobility insights.

Big Data · DataV · flow field

0 likes · 9 min read

Programmer DD

Oct 21, 2018 · Big Data

How to Choose the Right Number of Kafka Partitions for Optimal Throughput

This article explains how to determine the optimal Kafka partition count by balancing throughput gains, key‑based ordering requirements, file descriptor limits, and availability impacts, offering practical guidelines such as testing hardware limits and using broker‑count multiples for scalable deployments.

Big Data · Partitions · Throughput

0 likes · 8 min read

Programmer DD

Oct 20, 2018 · Big Data

How to Increase Kafka Topic Partitions Safely and Why You Can't Decrease Them

This guide explains how to use the kafka‑topics.sh script to increase a Kafka topic's partition count, warns about the impact on keyed messages and ordering, and details why Kafka does not support decreasing partitions, offering alternative strategies for replication changes.

Big Data · CLI · Partitions

0 likes · 6 min read
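
The warning about keyed messages can be illustrated with a small sketch. It assumes a partitioner that maps a key to `hash(key) % partition_count`; the function `partition_for` and the CRC32 hash are hypothetical stand-ins (Kafka's default partitioner uses a murmur2 hash), chosen only so the snippet runs without a Kafka client.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a hash-modulo rule.

    CRC32 is an illustrative stand-in, not the hash Kafka itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical message keys.
keys = [f"user-{i}" for i in range(1000)]

# Keys whose target partition changes when the topic grows from 6 to 8 partitions.
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 8)]

print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Any key in `moved` would have its new messages written to a different partition than its older ones, so per-key ordering across the resize is no longer guaranteed.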

21CTO

Oct 19, 2018 · Big Data

How Meituan Scales Real‑Time Computing with Flink: Architecture, Challenges & Solutions

This article summarizes Meituan’s real‑time computing platform, detailing its layered architecture built on Kafka, Flink on YARN, state management, resource isolation, fault tolerance, monitoring, and the Petra metric aggregation system, while highlighting the challenges faced and the solutions implemented to achieve high‑throughput, low‑latency stream processing at massive scale.

Big Data · Flink · Real-time Streaming

0 likes · 18 min read

360 Quality & Efficiency

Oct 19, 2018 · Big Data

Information Fingerprint and Simhash Algorithm for Large-Scale Duplicate Detection

This article explains the concept of information fingerprints, compares traditional set‑equality methods, introduces the Simhash algorithm for high‑dimensional text similarity reduction, and demonstrates how partitioned 64‑bit fingerprints enable efficient duplicate detection on massive web data.

Big Data · SimHash · duplicate detection

0 likes · 6 min read
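
A minimal Simhash sketch, for orientation only. It assumes unweighted word tokens, SHA-256 truncated to 64 bits as the per-token hash, and a split into four 16-bit blocks; none of these choices is taken from the article. The block split relies on the pigeonhole principle: two fingerprints that differ in at most 3 bits must agree on at least one of the four blocks, so blocks can serve as lookup keys for candidate duplicates.

```python
import hashlib

BITS = 64


def simhash(tokens):
    """Fold per-token 64-bit hashes into a single 64-bit fingerprint."""
    weights = [0] * BITS
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # keep the first 64 bits
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is 1 where set bits outnumber cleared bits.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def blocks(fingerprint, parts=4):
    """Split a 64-bit fingerprint into `parts` equal-width blocks."""
    width = BITS // parts
    mask = (1 << width) - 1
    return [(fingerprint >> (p * width)) & mask for p in range(parts)]


fp = simhash("the quick brown fox jumps over the lazy dog".split())
near = fp ^ 1 ^ (1 << 20) ^ (1 << 40)  # same fingerprint with 3 bits flipped
shares_block = any(x == y for x, y in zip(blocks(fp), blocks(near)))
```

In a real index, each block value would key a table of fingerprints, and only fingerprints sharing a block would be compared by full Hamming distance.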

Java Backend Technology

Oct 18, 2018 · Fundamentals

How to Find the Top 1,000 Numbers in a Billion‑Element Array in Linear Time

In this interview case study, a candidate solves the challenge of extracting the largest 1,000 numbers from a billion‑element dataset using a quick‑select style partition algorithm, achieving O(n) time, and implements the solution in Java with concise code.

Big Data · Selection · algorithm

0 likes · 4 min read
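
The quick-select idea can be sketched as follows. This is a Python illustration, not the article's Java code; the function name `top_k` is hypothetical, and it builds new lists on each pass instead of partitioning in place, which is simpler to read but uses more memory than an in-place version would.

```python
import random


def top_k(values, k):
    """Return the k largest items (in arbitrary order) in expected O(n) time."""
    items = list(values)
    result = []
    while k > 0 and items:
        pivot = random.choice(items)
        larger = [x for x in items if x > pivot]
        equal = [x for x in items if x == pivot]
        if k <= len(larger):
            # Every answer lies among the elements above the pivot.
            items = larger
        elif k <= len(larger) + len(equal):
            # The pivot value completes the answer.
            return result + larger + equal[: k - len(larger)]
        else:
            # Keep everything >= pivot and continue in the smaller part.
            result += larger + equal
            k -= len(larger) + len(equal)
            items = [x for x in items if x < pivot]
    return result


sample = [5, 1, 9, 3, 7, 9, 2]
largest_three = sorted(top_k(sample, 3), reverse=True)
```

Each pass discards the side of the pivot that cannot contain the boundary, so the expected total work is linear rather than the O(n log n) of a full sort.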

Tencent Cloud Developer

Oct 17, 2018 · Industry Insights

Why Graph Databases Are Redefining Enterprise Data Strategy

The article provides a detailed market and application analysis of graph databases, highlighting rapid growth, key use cases in finance and social networks, Tencent's StarGraph solution, advantages over relational databases, current limitations, and future industry adoption trends.

Big Data · Cloud Computing · Graph Database

0 likes · 6 min read

Xianyu Technology

Oct 16, 2018 · Big Data

Millisecond-Level Counting for Billion-Scale Data via Offline Batch and Online Incremental Statistics

To achieve millisecond‑level counting on billion‑scale data, the Xianyu team replaced slow MySQL count queries with an offline batch job that snapshots relational tables and computes totals, then applies KV‑store incremental statistics for online updates, delivering sub‑10 ms responses with a near‑100% success rate.

Big Data · database · incremental counting

0 likes · 7 min read

Java Backend Technology

Oct 13, 2018 · Big Data

Check a New Integer Among 4 Billion Records in Seconds Using Bitmap & Distributed Methods

An interviewee faces the challenge of determining whether a newly given integer exists within a set of 4 billion numbers, and the article explores efficient solutions—from naive disk‑I/O approaches to distributed processing and the memory‑saving bitmap technique—highlighting their performance trade‑offs and implementation details.

Big Data · Bitmap · algorithm

0 likes · 6 min read

Alibaba Cloud Developer

Oct 10, 2018 · Artificial Intelligence

How Alibaba’s Uni‑Marketing Boosted Brand Conversions with AI‑Driven Audience Selection

This article details Alibaba's Uni‑Marketing case study where a brand‑targeted audience selection algorithm, built on big‑data and AI techniques, improved the O→IPL deepening rate by 47% during the New‑Year Festival, outlining the technical pipeline, models, evaluation metrics, challenges, and future directions.

Big Data · Digital Marketing · brand optimization

0 likes · 20 min read

Programmer DD

Oct 6, 2018 · Big Data

Elastic Search IPO: What It Means for Search and Big Data

Elastic announced its IPO on the NYSE under ticker ESTC, highlighting its origins, rapid growth to over 5000 customers worldwide, a $160 million FY2018 revenue, and its Elastic Stack suite that powers search and analytics across industries, while investors celebrated the stock surge.

Big Data · Elasticsearch · IPO

0 likes · 6 min read

JD Tech

Sep 29, 2018 · Artificial Intelligence

JD.com Prediction Technology: Architecture, Applications, and Future Directions

The article outlines JD.com's evolution of prediction technology from early book‑category sales forecasting to a comprehensive AI‑driven platform that supports sales, order, and GMV forecasts, describes its modular architecture and core algorithm choices, and discusses future enhancements for smarter supply‑chain collaboration.

Big Data · Prediction · forecasting

0 likes · 6 min read

Architects' Tech Alliance

Sep 26, 2018 · Operations

How Goldeneye Enables Adaptive, Intelligent Business Monitoring at Scale

Goldeneye, Alimama's monitoring platform, uses big‑data pipelines, dynamic threshold prediction, mean‑shift change‑point detection, and automated metric discovery to replace manual alarm settings, reduce false alerts, and provide intelligent, scalable business monitoring across hundreds of services.

Big Data · Operations · business monitoring

0 likes · 19 min read

HomeTech

Sep 25, 2018 · Operations

Design and Implementation of an Integrated Log Collection, Analysis, and Monitoring System

This article describes how a rapidly growing technical team built a unified log system that consolidates program, web access, and slow logs, introduces host‑agent and process‑agent collection, leverages Kafka, Elasticsearch, and Storm for high‑throughput processing, and provides monitoring, alerting, and reporting features to improve reliability and operational efficiency.

Big Data · Elasticsearch · Log Management

0 likes · 20 min read

Tencent Cloud Developer

Sep 20, 2018 · Industry Insights

How Big Data Drives Intelligent Outbound Calls and AI Customer Service

This article explains how a data‑driven platform combines big‑data preprocessing, behavior‑prediction models, and AI‑powered voice and text services to improve pre‑sale lead scoring, targeted SMS campaigns, and post‑sale customer support, using Tencent Cloud's TI One platform as a case study.

AI Customer Service · Big Data · Industry Insights

0 likes · 17 min read

Tencent Cloud Developer

Sep 20, 2018 · Artificial Intelligence

What Everyone Should Know About Machine Learning

Machine learning lets computers learn patterns from examples instead of explicit code, enabling tasks like image and fraud detection, predictive maintenance, and personalized services, now feasible thanks to big data, cloud compute, and open-source tools, and increasingly discussed by executives for strategic automation.

Big Data · Neural Networks · Predictive Maintenance

0 likes · 11 min read

Big Data and Microservices

Sep 17, 2018 · Big Data

5 Essential Data Mining Techniques Every Analyst Should Know

This article outlines five widely used data‑mining methods—association rules, classification/tagging, clustering, decision trees, and sequential pattern mining—explaining their principles, real‑world examples, and how they help organizations extract actionable insights from massive datasets.

Big Data · Decision Trees · Sequential Pattern Mining

0 likes · 6 min read

ITFLY8 Architecture Home

Sep 17, 2018 · Big Data

Efficient Sorting and De‑duplication of Massive Datasets: Key Algorithms

This article explores practical algorithms for sorting, de‑duplicating, and extracting top‑K records from massive data sets, covering bitmap techniques, external sorting, hash‑based counting, min‑heap selection, divide‑and‑conquer, Bloom filters, and distributed processing strategies.

Big Data · Bitmap · Hash

0 likes · 10 min read

Qunar Tech Salon

Sep 14, 2018 · Big Data

AIGOV Five‑Star Model for Data Asset Management: Framework, Capabilities, and Enterprise Practices

The article presents the AIGOV five‑star data asset management model, analyzes its five management domains and thirteen capability items, compares it with domestic and international frameworks, and illustrates its practical value through detailed enterprise case studies and references to maturity models.

Big Data · Data Asset Management · Maturity Model

0 likes · 19 min read

Programmer DD

Sep 13, 2018 · Big Data

How Deleting a Kafka Topic Removes Consumer Offsets and Why It Matters

This article examines a real‑world Kafka scenario where a topic is created, messages are produced and consumed, the topic is deleted, and then recreated, revealing that deleting the topic also removes its consumer offset metadata from the __consumer_offsets internal topic, causing new consumers to rely on their auto.offset.reset configuration.

Big Data · Consumer Offsets · GroupCoordinator

0 likes · 6 min read

JD Tech

Sep 7, 2018 · Information Security

Big Data and AI Security Insights from ISC 2018 Conference

The ISC 2018 conference highlighted the growing importance of big data and artificial intelligence security, presenting JD's research on anti‑scraping techniques, AI‑driven defenses against black‑market attacks, and a service‑oriented approach to protecting user data across enterprises.

AI security · Big Data · Information Security

0 likes · 5 min read

Tencent Cloud Developer

Sep 6, 2018 · Big Data

Real-Time Stream Computing: Concepts, Challenges, and Tencent Cloud Solutions

As mobile and IoT data surge, real-time stream computing—especially Flink’s low-latency, high-throughput, exactly-once engine—addresses challenges of latency, accuracy, and usability, and Tencent Cloud’s managed Flink service provides elastic, secure, integrated pipelines for applications ranging from online status monitoring to fraud detection and smart transportation.

Apache Storm · Big Data · Flink

0 likes · 30 min read

Big Data and Microservices

Sep 5, 2018 · Industry Insights

How Big Data Transforms Our Thinking: From Sample Data to Intelligent Insight

The article explains how the rapid growth of big‑data technologies reshapes human cognition by shifting from sample‑based, precise analysis to whole‑population, fault‑tolerant, correlational, and ultimately intelligent thinking that mirrors the human brain.

Big Data · correlation · fault tolerance

0 likes · 7 min read

Big Data and Microservices

Sep 4, 2018 · Big Data

Exploring Five Big Data Architectures—from Traditional to Unified AI Designs

The article examines the evolution of big‑data processing by comparing five prevalent architectures—traditional Hadoop‑based stacks, streaming‑only designs, Kappa, Lambda, and the unified Unifield model—highlighting their strengths, weaknesses, and suitable scenarios while discussing the limitations of classic BI systems and the role of distributed storage, computation, and machine‑learning integration.

Big Data · Data Architecture · Hadoop

0 likes · 14 min read

Senior Brother's Insights

Aug 31, 2018 · Big Data

How to Test Membership in 4 Billion Integers with Bitmap and Distributed Techniques

An interview question about checking whether a new integer belongs to a set of 4 billion numbers leads to a discussion of distributed loading across eight machines, bitmap representation using 500 MB of memory, and interval‑based external sorting, illustrating practical big‑data algorithm design.

Big Data · Bitmap · Data Structures

0 likes · 7 min read
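
The bitmap approach can be sketched briefly. The sketch assumes the integers are unsigned 32-bit values, which the summary does not state; under that assumption one bit per possible value needs 2^32 bits, or 512 MiB, in line with the roughly 500 MB figure. The `Bitmap` class is a hypothetical helper, and the demo uses a tiny universe so it runs instantly.

```python
class Bitmap:
    """Fixed-size bit set over the integers 0 .. size - 1."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # one bit per possible value

    def add(self, n: int) -> None:
        if not 0 <= n < self.size:
            raise ValueError(f"{n} is outside 0..{self.size - 1}")
        self.bits[n >> 3] |= 1 << (n & 7)

    def __contains__(self, n: int) -> bool:
        if not 0 <= n < self.size:
            return False
        return bool(self.bits[n >> 3] & (1 << (n & 7)))


# Memory needed to cover every unsigned 32-bit integer, in MiB.
full_range_mib = (2 ** 32 // 8) / 2 ** 20

# Small demo universe instead of the full 32-bit range.
seen = Bitmap(1_000)
for value in (3, 42, 999):
    seen.add(value)

print(42 in seen, 41 in seen, full_range_mib)
```

A membership test is then a single byte lookup and mask, independent of how many integers were loaded.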

Big Data and Microservices

Aug 28, 2018 · Big Data

Turning Idle Hadoop Clusters into Valuable Data-Driven Products and Processes

The article examines how enterprises can transform big data from idle Hadoop clusters into valuable assets by adopting data-driven processes and products, outlining the distinction between technology-driven and business-driven approaches, describing data and service product models, and highlighting process optimization across various business functions.

Big Data · Enterprise Analytics · data-driven processes

0 likes · 7 min read

Big Data and Microservices

Aug 26, 2018 · Big Data

Why Data, Not Process, Is the New Core of Business: 10 Big‑Data Principles Explained

The article outlines ten core big‑data principles—shifting from process‑centric to data‑centric thinking, emphasizing data value, efficiency, relevance, full‑sample analysis, prediction, information‑finding, machine understanding, e‑commerce intelligence, and mass customization—illustrated with real‑world examples and their impact on modern industry.

Big Data · Industry Insights · correlation

0 likes · 26 min read

DataFunTalk

Aug 21, 2018 · Artificial Intelligence

iQIYI Traffic Anti-Cheat: Techniques, System Architecture, and Future Directions

This article provides a comprehensive overview of iQIYI's traffic anti‑cheat mechanisms, covering definitions of fraudulent traffic, industry challenges, data cleaning relationships, system design, rule‑based and machine‑learning solutions, feature engineering, model evaluation, monitoring, service applications, and future prospects.

Big Data · System Architecture · Traffic analysis

0 likes · 11 min read

Big Data and Microservices

Aug 20, 2018 · Big Data

How Big Data Is Redefining Storage Architecture: Capacity, Latency, and Cost Challenges

The explosive growth of big‑data applications is forcing storage vendors to redesign architectures for petabyte‑scale capacity, real‑time latency, high IOPS, security compliance, and cost efficiency, while also addressing flexibility, data longevity, and the needs of both large and small users.

Big Data · Cost Optimization · IOPS

0 likes · 9 min read

Meitu Technology

Aug 17, 2018 · Big Data

Meitu Distributed Bitmap System (Naix): Architecture, Implementation, and Performance Evaluation

Meitu’s Naix distributed bitmap system accelerates massive user‑data analytics by using a three‑layer architecture, sharded RoaringBitmap storage, and PalDB, delivering over 600× faster queries than Hive, supporting fast generation plugins, fault‑tolerant replication, and millisecond‑level RPC query responses while reducing storage by 67%.

Big Data · Bitmap · Naix

0 likes · 16 min read

Big Data and Microservices

Aug 16, 2018 · Big Data

Mastering Big Data Analysis: 5 Core Aspects and 4 Key Methods

This article outlines the five fundamental aspects of big data analysis—visualization, data‑mining algorithms, predictive analytics, semantic engines, and data quality management—and explains four primary analytical approaches: descriptive, diagnostic, predictive, and prescriptive analysis.

Big Data · data analysis · data mining

0 likes · 6 min read

Meitu Technology

Aug 14, 2018 · Big Data

Meitu Data Platform Architecture and Practices

Meitu’s data platform, serving dozens of apps with 500 million monthly active users and billions of daily events, combines the Arachnia log‑collection system, Kafka ingestion, multi‑layer storage (HDFS, MongoDB, HBase, Elasticsearch), offline Hive/MapReduce processing and real‑time Storm/Flink/Naix pipelines, supported by data‑workshop tools, staged evolution for scalability, and robust security and query‑validation mechanisms.

Big Data · Data Platform · ETL

0 likes · 16 min read

Big Data and Microservices

Aug 13, 2018 · Big Data

8 Essential Principles for Effective Enterprise Big Data Implementation

The article outlines eight key principles that enterprises should follow to harness big data responsibly, covering goal definition, strategic partnership, source identification, continuous communication, agile iteration, technology evaluation, cloud alignment, and talent development with security considerations.

Big Data · Data Governance · Enterprise

0 likes · 10 min read

Alibaba Cloud Developer

Aug 13, 2018 · Big Data

How Ele.me Evolved Its Real‑Time Engine: From Storm to Flink

This article examines Ele.me’s big‑data platform evolution, comparing Storm, Spark Streaming, Structured Streaming, and Flink, detailing their architectures, consistency semantics, performance trade‑offs, and why Flink became the preferred real‑time computation engine for the company.

Big Data · Flink · Spark

0 likes · 15 min read

Meitu Technology

Aug 11, 2018 · Big Data

Meitu Technology Salon: Evolution of the Big Data Platform, Distributed Bitmap (Naix), and Apache Kylin

At Meitu’s Technology Salon, senior big‑data experts detailed the end‑to‑end architecture and stability measures of Meitu’s large‑scale data platform, introduced the high‑performance distributed bitmap solution Naix, showcased the evolution of Meizu’s user‑insight system, and highlighted Apache Kylin’s OLAP capabilities and Superset integration for scalable, real‑time analytics.

Apache Kylin · Big Data · Data Analytics

0 likes · 9 min read

Big Data and Microservices

Aug 10, 2018 · Big Data

5 Ways Big Data Empowers Modern Enterprises

Big data has become a critical asset for companies, enabling them to understand users, precisely locate resources, enhance marketing and operations, deliver refined services, and anticipate crises, thereby turning raw information into strategic advantage across multiple business functions.

Big Data · Enterprise Analytics · Resource Optimization

0 likes · 7 min read

iQIYI Technical Product Team

Aug 10, 2018 · Big Data

Data-Driven Entertainment: iQIYI’s Big Data Platform and AI Applications

iQIYI’s unified “Tongtian Tower” big‑data platform integrates analytics, AI and open APIs to turn viewer behavior and public sentiment into market insights, personalized recommendations, smart casting and churn‑prediction tools, embedding a data‑driven culture that fuels its rapid subscriber growth and revenue surge.

AI · Big Data · Data Platform

0 likes · 12 min read

Architects' Tech Alliance

Aug 8, 2018 · Big Data

High‑Performance Data Analytics (HPDA): Architecture, Market Trends, and Fujitsu Reference Model

The article provides a comprehensive overview of High‑Performance Data Analytics (HPDA), detailing its market drivers, technical classifications, integration of HPC with big‑data workloads, Fujitsu's reference architecture, hardware configurations, benchmark results, and the economic benefits of deploying HPDA on existing HPC infrastructures.

Big Data · Fujitsu · HPC

0 likes · 14 min read

Architecture Digest

Aug 7, 2018 · Big Data

Apache Kafka Overview, Architecture, and Sample Producer/Consumer Code

This article provides a comprehensive overview of Apache Kafka, comparing it with ActiveMQ, explaining its distributed architecture, topics, partitions, consumption models, high‑availability mechanisms, exactly‑once semantics, and includes detailed Java producer and consumer code examples for practical implementation.

Big Data · Consumer · Distributed Messaging

0 likes · 22 min read

dbaplus Community

Aug 6, 2018 · Big Data

Understanding RAID, HDFS, and MapReduce: From Storage to Distributed Computing

This article explains the storage challenges of big data, introduces RAID levels and their trade‑offs, describes the HDFS architecture with NameNode and DataNode replication, details the MapReduce programming model and execution flow, and shows how Hive translates SQL queries into MapReduce jobs.

Big Data · HDFS · MapReduce

0 likes · 23 min read

Youzan Coder

Aug 3, 2018 · Big Data

Youzan Data Warehouse Metadata System: From Manual Tables to Metadata‑Driven Architecture

Youzan’s data‑warehouse metadata system evolved from manually maintained tables to an automated data dictionary and finally to a metadata‑driven architecture that automatically captures technical, business, and process metadata, visualizes lineage, tracks resource usage, manages synchronization rules and permissions, and now aims to improve novice usability with visual models and impact‑analysis tools.

Big Data · Lineage · Resource Monitoring

0 likes · 11 min read

Meituan Technology Team

Aug 2, 2018 · Big Data

R for Fine‑Grained Data Operations: Engineering Practices and Performance at Meituan

Meituan’s in‑store dining team demonstrates how R’s open‑source packages, powerful data manipulation, rich visualization libraries, and reproducible reporting can be engineered into scalable, parallelized workflows that turn secondary data processing into fast, interactive dashboards and analytics, proving R’s enterprise‑grade performance and adoption.

Big Data · Data visualization · R

0 likes · 18 min read

MaGe Linux Operations

Aug 2, 2018 · Big Data

Unlocking PUBG Victory: Data‑Driven Insights on Drop Zones, Final Circles, Weapons, and Kill Strategies

This article analyzes 18 million PUBG match records using Python to reveal optimal drop locations, high‑probability final‑circle spots, preferred weapons, and the relationship between kill distance, kill count, and winning chances, providing data‑driven strategies for players seeking more chicken dinners.

Big Data · Game Analytics · PUBG

0 likes · 13 min read

Efficient Ops

Aug 1, 2018 · Operations

How Tencent Revolutionized Monitoring: From IDC Crises to AI‑Driven AIOps

This talk by Tencent’s monitoring R&D lead outlines a decade of evolution in large‑scale monitoring, covering real‑world incident cases, the three drivers behind architectural upgrades, the implementation of a three‑dimensional monitoring framework, and the application of AI‑powered AIOps for precise, rapid anomaly detection.

Big Data · Cloud Computing · Operations

0 likes · 18 min read

Efficient Ops

Jul 30, 2018 · Big Data

Building a Simple Yet Scalable Big Data Platform for Live Streaming with Consul

This article shares how a fast‑growing short‑video company designed a lean big‑data architecture, introduced the ALPS foundation service, and leveraged Consul to automate CMDB, job distribution, service discovery, and monitoring, enabling efficient growth with minimal operational overhead.

ALPS · Big Data · Consul

0 likes · 18 min read

Big Data and Microservices

Jul 29, 2018 · Industry Insights

Top 5 Big Data & AI Trends Shaping 2018 and Beyond

According to recent Forrester and Forbes reports, 2018 will see AI overtaking big-data hype, driving five key trends—from heightened cybersecurity in healthcare to expanded IoT, plug-and-play AI solutions, the rise of chief digital officers, and smarter community policing—each reshaping how organizations leverage data.

AI trends · Big Data · Industry Analysis

0 likes · 8 min read

Xianyu Technology

Jul 28, 2018 · Big Data

Real-Time Computation Architecture for Non-Timeline Feed Ranking

The paper presents a real‑time computation architecture on Alibaba Cloud Blink that scores and ranks non‑timeline feed items within a sliding 72‑hour window, updating rankings every few minutes, using Redis ZSET for fast retrieval, and discusses scaling optimizations such as interval tuning and external join‑and‑rank services.

Big Data · Real‑Time Computing · feed ranking

0 likes · 6 min read

Architects Research Society

Jul 27, 2018 · Big Data

Overview of Apache Hive Features, Usage, and Management

Apache Hive is an open‑source data‑warehouse system built on Hadoop that enables users to read, write, and manage large distributed datasets using SQL‑like queries, offering features such as ETL support, various file‑format connectors, extensible UDFs, and integration with tools like Tez, Spark, and MapReduce.

Apache Hive · Big Data · ETL

0 likes · 5 min read

58 Tech

Jul 27, 2018 · Big Data

Sun Dial: 58.com’s General‑Purpose AB Testing Platform – Architecture, Features, and Real‑Time Data Processing

The Sun Dial platform is a universal A/B testing system built for 58.com that supports single‑layer and multi‑layer experiments, provides uniform traffic splitting, real‑time OLAP analytics with Druid, and offers a web interface for easy configuration, enabling data‑driven product optimization across multiple business lines.

A/B testing · Big Data · Druid

0 likes · 14 min read

Big Data and Microservices

Jul 26, 2018 · Industry Insights

How Big Data is Transforming the Financial Industry: Applications and Challenges

This article examines how big data technologies are reshaping banking, insurance, and securities by enabling customer profiling, precision marketing, risk management, and operational optimization, while also outlining the key challenges such as data quality, integration complexity, standards, and governance that the sector must overcome.

Banking · Big Data · Data Analytics

0 likes · 19 min read

Meituan Technology Team

Jul 26, 2018 · Backend Development

Evolution of Meituan Delivery System Architecture and Practices

Meituan Delivery’s architecture has progressed from a rapid MVP with coarse services to a scalable, fine‑grained platform comprising fulfillment, operation, and master‑data subsystems, employing reliability engineering, capacity planning, AI‑driven simulation, and location services to ensure high availability, efficiency, and future‑ready scalability.

AI · Big Data · Microservices

0 likes · 16 min read

StarRing Big Data Open Lab

Jul 25, 2018 · Big Data

Boosting Data Sharing Architecture: JDBC Limits, DistCp Speed & Kerberos Trust

This article examines the evolution of a data‑sharing exchange platform—moving from slow JDBC‑based transfers to storage‑level copying, introducing a two‑stage DistCp workflow, and securing cross‑cluster access with Kerberos‑based trust managed by the Guardian component.

Architecture · Big Data · DistCp

0 likes · 10 min read

Big Data and Microservices

Jul 24, 2018 · Big Data

Why Hadoop Still Leads Big Data Processing: Core Advantages Explained

This article introduces Hadoop’s open‑source big‑data framework, explains its core components HDFS and MapReduce, and outlines four key advantages—ease of deployment, robustness, scalability, and simplicity—while also covering HBase as the Hadoop‑based column‑oriented database.

Big Data · HBase · HDFS

0 likes · 4 min read

JD Tech

Jul 24, 2018 · Databases

Understanding Graph Databases: Concepts, History, Use Cases, and Comparative Overview

This article explains what graph databases are, traces their evolution from early navigational models to modern distributed systems, highlights their core concepts and advantages over relational databases, showcases typical application scenarios, and provides a comparative overview of popular open‑source graph database engines to guide technology selection.

Big Data · Graph Database · NoSQL

0 likes · 8 min read

ITPUB

Jul 23, 2018 · Big Data

What China's Vaccine Procurement Data Reveals: A Province‑Level Analysis

This article documents the collection, cleaning, and statistical analysis of publicly released second‑category vaccine procurement data from 28 Chinese provinces, highlighting data sources, processing steps with pandas, top manufacturers, regional market shares, and the challenges encountered during the effort.

Big Data · China · data analysis

0 likes · 9 min read

Tencent Cloud Developer

Jul 23, 2018 · Big Data

Analysis of Chinese Second-Class Vaccine Procurement Data

The study aggregates and cleans 2017‑2020 Chinese second‑class vaccine procurement data from 28 provinces into a 1,529‑record CSV, revealing a right‑skewed distribution where a handful of manufacturers—led by Beijing Kexing and Changchun Changsheng—account for the majority of entries, while noting gaps in several regions and encouraging further collaborative refinement.

Big Data · Chinese healthcare · data analysis

0 likes · 10 min read

Alibaba Cloud Developer

Jul 23, 2018 · Big Data

How Alibaba’s MaxCompute Became the Backbone of 99% Data Processing

This article reviews Alibaba's MaxCompute evolution from ODPS to a unified, multi‑cluster big‑data platform, detailing its architecture, development tools, large‑scale deployments, performance optimizations, typical workload scenarios, and why it is the preferred choice for enterprise data processing.

Alibaba Cloud · Big Data · Data Platform

0 likes · 22 min read

Youzan Coder

Jul 20, 2018 · Big Data

How Youzan Built a Scalable Big Data Development Platform (DP)

This article details the design, architecture, and operational experience of Youzan's Data Platform (DP), covering its scheduling, data‑sync, service, and monitoring modules, the custom Airflow‑based task scheduler, current production metrics, supported task types, and future improvement plans.

Airflow · Big Data · Data Platform

0 likes · 12 min read

StarRing Big Data Open Lab

Jul 19, 2018 · Big Data

How Multi‑Tenant Big Data Cloud Solves Data Silos and Low‑Speed Transfers

This article examines how a cloud‑native big data platform with multi‑tenant architecture addresses data silos, manual data distribution, and slow transfer speeds, using a real‑world banking case to illustrate functional requirements, design patterns, and optimization strategies.

Architecture · Big Data · TDC

0 likes · 11 min read

Efficient Ops

Jul 18, 2018 · Operations

How Alibaba Scales Real‑Time Computing: Evolution of Its Operations Architecture

This article details Alibaba's real‑time computing platform, outlining its operational challenges, the unified automation platform Aquila, proactive fault‑elimination strategies, and ongoing moves toward intelligent, data‑driven management to support massive workloads during events like Double‑11.

Alibaba · Aquila · Big Data

0 likes · 15 min read

Didi Tech

Jul 17, 2018 · Artificial Intelligence

Didi Showcases AI‑Driven Intelligent Transportation Research at ACM SIGIR 2018

At ACM SIGIR 2018, Didi presented AI‑driven intelligent‑transportation research—including a ride‑sharing preference prediction paper, keynote insights on smart dispatch, maps and traffic, collaborations with over twenty cities and numerous universities, open data initiatives, and plans for new thematic research programs.

Artificial Intelligence · Big Data · Industry-Academia Collaboration

0 likes · 9 min read

360 Tech Engineering

Jul 13, 2018 · Big Data

Titan 2.0 Big Data Processing Platform: Architecture Evolution and Practice

The article describes the evolution of 360's Titan big‑data processing platform through three architectural stages, details its functional modules, explains the DITTO component framework, context and rule‑engine abstractions, and shares practical case studies and personal insights on building a flexible, self‑service data platform.

Big Data · DITTO · ETL

0 likes · 12 min read

Efficient Ops

Jul 12, 2018 · Big Data

How Sogou Built a Scalable Big Data Platform: Lessons from a User Perspective

This article shares Sogou's journey in constructing a large‑scale big data platform, covering business overview, the evolution of its operations infrastructure, productization practices, security measures, and practical tips for medium‑size teams seeking to derive value from data.

Big Data · Data Platform · Operations

0 likes · 22 min read

High Availability Architecture

Jul 12, 2018 · Information Security

Evolution of Zhihu’s Anti‑Cheat System “Wukong”: Architecture, Strategies, and Lessons Learned

This article chronicles the three‑generation evolution of Zhihu’s anti‑cheat platform Wukong, detailing its business context, spam taxonomy, multi‑layered control methods, architectural redesigns, strategy language improvements, graph‑based risk analysis, and the continuous integration of big‑data and machine‑learning techniques to combat content and behavior spam.

Big Data · Information Security · Risk management

0 likes · 23 min read

Ctrip Technology

Jul 3, 2018 · Big Data

Ctrip's Presto Engine: Challenges, Improvements, and Upgrade Roadmap

This article details Ctrip's experience with the Presto distributed SQL engine, outlining the initial performance and stability issues, the comprehensive enhancements made in security, resource control, compatibility, and monitoring, and the multi‑stage upgrade plan that guides its future evolution.

Big Data · Kerberos · Performance Optimization

0 likes · 11 min read

AntTech

Jul 3, 2018 · Backend Development

Evolution of Financial‑Grade Message Queues at Ant Financial

The article reviews the ten‑year evolution of Ant Financial's message queue, detailing its core reliability, consistency, availability and performance requirements, the architectural mechanisms built to meet them, the shift to pull‑mode and API‑mode designs, and the recent integration of compute capabilities to create a smart data transmission platform.

Big Data · Distributed Systems · Message Queue

0 likes · 13 min read

ITFLY8 Architecture Home

Jul 2, 2018 · Artificial Intelligence

How JD.com Built a Multi‑Screen Personalized Recommendation Engine

This article explains how JD.com evolved its recommendation system from simple product suggestions to a sophisticated, multi‑screen, multi‑type personalized engine using big‑data collection, real‑time behavior tracking, machine‑learning models, and a modular architecture that boosts conversion and user experience.

Big Data · e‑commerce · machine learning

0 likes · 14 min read

Baidu Intelligent Testing

Jun 29, 2018 · Product Management

Baidu Product Evaluation Framework and Common Assessment Methods

This article outlines Baidu's comprehensive product evaluation framework, describing its multi‑layer assessment system, the combination of subjective and objective metrics, and a suite of common evaluation methods such as indicator analysis, AB testing, user feedback, behavior analysis, big‑data profiling, and competitor comparison.

AB testing · Big Data · Metrics

0 likes · 16 min read

58 Tech

Jun 27, 2018 · Big Data

Overview of the 58 User Profile System Architecture and Data Processing

The article describes the design, data integration, ID mapping, tag generation, and application scenarios of the 58 user profiling platform, which aggregates billions of user IDs across multiple business lines to provide online and offline persona data for personalization, analytics, and AI modeling.

Big Data · Data Architecture · Data Integration

0 likes · 12 min read

DataFunTalk

Jun 24, 2018 · Big Data

OPPO Big Data Platform Operations and R&D Practices: Architecture, Scaling, and Monitoring

This article summarizes OPPO's rapid growth of its big‑data platform, detailing the three‑layer architecture, the evolution from Flume‑Kafka to NiFi for data ingestion, the upgrade of the OFlow task scheduler, comprehensive monitoring of data, resources and task SLA, and the development of a self‑service analytics tool called InnerEye to ensure stability, efficiency, and security.

Airflow · Big Data · NiFi

0 likes · 10 min read

JD Retail Technology

Jun 20, 2018 · Artificial Intelligence

JD.com's 618 Shopping Festival: AI and Big Data Drive Seamless Retail Experience

JD.com's 618 shopping festival achieved 159.2 billion yuan in sales, leveraging AI and big data to enhance consumer experience and empower retail partners through intelligent supply chain solutions.

AI · Big Data · Blockchain

0 likes · 9 min read

Qunar Tech Salon

Jun 20, 2018 · Big Data

How Spark Streaming Submits Tasks: Internal Mechanics and Code Walkthrough

This article explains the internal workflow of Spark Streaming task submission, detailing how StreamingContext, DStream, receivers, and output operators are transformed into RDD jobs, and includes annotated Scala code examples that illustrate each step of the process.

Big Data · DStream · Real-time Processing

0 likes · 13 min read

Architecture Digest

Jun 18, 2018 · Operations

Design and Optimization of Large‑Scale Log Systems

This article examines the challenges of handling massive log data in high‑traffic e‑commerce platforms and presents a comprehensive architecture, optimization strategies, and practical implementations—including Rsyslog, Kafka, Fluentd, and the ELK stack—to improve scalability, performance, and reliability of log management systems.

Big Data · ELK · Fluentd

0 likes · 17 min read

Didi Tech

Jun 16, 2018 · Artificial Intelligence

AI and Big Data in Didi’s Mapping Services – Insights from WGDC 2018

At WGDC 2018, Didi’s mapping division revealed how its AI‑driven platform leverages massive real‑time travel data, machine‑learning and deep‑learning models—including a new ETA estimator, demand‑supply forecasting, and reinforcement‑learning order allocation—to deliver ultra‑accurate pick‑up points, route planning, and destination predictions, while opening de‑identified data and research topics to academia.

AI · Big Data · ETA

0 likes · 6 min read

dbaplus Community

Jun 14, 2018 · Big Data

Designing Scalable Hadoop‑Based Data Analytics Platforms: Architecture & Best Practices

This article explains how enterprises can build a scalable data analytics platform on Hadoop by outlining the multi‑layer architecture, storage options, data synchronization methods, and ETL/offline computation techniques, while highlighting practical component choices such as Hive, HBase, Spark, and Oozie.

Big Data · Data Architecture · Data Lake

0 likes · 10 min read

Tencent Cloud Developer

Jun 11, 2018 · Cloud Computing

Tencent Cloud's Government Cloud Strategy and Digital Guangdong Practice

Tencent Cloud’s government‑cloud strategy, showcased by Guangdong’s “粤省事” platform, leverages WeChat as a single access point and a partner‑driven backend of AI, big‑data and IoT services to digitize certificates, streamline workflows for citizens, businesses and officials, and address low public‑service satisfaction by redesigning processes rather than merely automating them.

AI · Big Data · Digital Transformation

0 likes · 12 min read

Hujiang Technology

Jun 11, 2018 · Operations

Recap of the GIAC Shenzhen Conference: Architecture, Performance, and Scaling Practices from Leading Tech Companies

The article summarizes the GIAC Shenzhen conference, highlighting front‑end, mobile, and backend optimization techniques, large‑scale architecture designs, cloud‑native solutions, big‑data testing strategies, and quality assurance practices shared by top Chinese internet firms.

Architecture · Big Data · Mobile

0 likes · 10 min read

Efficient Ops

Jun 6, 2018 · Big Data

How Tencent’s Multi‑Dimensional Monitoring Turns Big Data Into Real‑Time Business Insights

This article explains how Tencent’s ZhiYun multi‑dimensional monitoring system evolves from the Mobile Monitor platform, outlines its design principles, data‑factory capabilities, storage choices, and intelligent features, and demonstrates how it enables real‑time, multi‑dimensional analysis and alerting for large‑scale business operations.

Big Data · Druid · Storm

0 likes · 11 min read

ITPUB

Jun 4, 2018 · Big Data

Is Hadoop Really Declining? Expert Insights Show Why the Ecosystem Stays Strong

Despite Gartner's 2017 claim that Hadoop is nearing the end of its production maturity, a series of interviews with Chinese big‑data experts reveals that Hadoop's ecosystem remains robust, with core components like HDFS, YARN, Spark, and HBase continuing to dominate the market.

Big Data · Ecosystem · Gartner

0 likes · 9 min read

ITPUB

Jun 3, 2018 · Big Data

Spark vs Hadoop: Which Distributed System Fits Your Data Needs?

An in‑depth comparison of Hadoop and Spark examines their architectures, performance, cost, security, and machine‑learning capabilities, helping readers decide which open‑source distributed processing platform best matches their batch, streaming, and analytical workloads.

Big Data · Cost · Hadoop

0 likes · 13 min read

ITPUB

Jun 2, 2018 · Big Data

Mastering Spark: Core Concepts, Architecture, Streaming & Performance Tuning

This comprehensive guide explains Spark's ecosystem, execution principles, key features, deployment architectures, core concepts like RDD, Transformations, Actions, Jobs, Stages, Shuffle and Cache, as well as Spark Streaming mechanics and practical resource‑tuning tips for optimal big‑data processing.

Big Data · Cluster · Performance Tuning

0 likes · 15 min read

Tencent Cloud Developer

Jun 1, 2018 · Backend Development

Building Tencent Xinge: Architecture and Practices for Massive Mobile Push Service

The talk details Tencent Xinge’s architecture and cloud‑native practices that enable hundred‑billion‑level mobile push, combining terminal integration, real‑time backend filtering, distributed bitmap selection, precise‑push AI models, and DevOps pipelines to deliver fast, scalable, data‑driven notifications with effect tracking.

Backend Architecture · Big Data · Distributed Systems

0 likes · 18 min read

ITPUB

May 31, 2018 · Big Data

Mastering Spark on DataMagic: Fast‑Track Your Big Data Skills

This article explains Spark's role in the DataMagic platform, outlines four practical steps to quickly master Spark, details key configuration and parallelism settings, shows how to modify Spark code, and provides operational tips for cluster management and job troubleshooting.

Big Data · Cluster Management · DataMagic

0 likes · 10 min read

dbaplus Community

May 30, 2018 · Big Data

Understanding Spark Executor Memory Management: On‑Heap, Off‑Heap, and Unified Strategies

This article explains Spark's executor memory architecture, covering on‑heap and off‑heap allocation, static versus unified memory managers, storage and execution memory handling, RDD persistence levels, eviction policies, and shuffle memory usage, providing practical formulas and configuration tips for optimal performance.

Big Data · Executor · Memory Management

0 likes · 23 min read

DataFunTalk

May 29, 2018 · Big Data

Design, Challenges, and Evolution of Lianjia's Big Data Platform Architecture (0 → 1.0 → 2.0)

This article details the evolution of Lianjia's massive‑data platform from its initial 0 version through 1.0 and 2.0, describing architectural challenges, a three‑layer redesign, data‑engine selection (ROLAP, MOLAP, Kylin), transparent compression techniques, and practical lessons for large‑scale data systems.

Architecture · Big Data · Hadoop

0 likes · 14 min read

Architecture Digest

May 28, 2018 · Big Data

Building a Real-Time Stream Processing Platform with Hadoop Ecosystem (Kafka, Spark Streaming, HBase)

This guide details how to construct a real-time data processing platform on CentOS 7 using the Hadoop ecosystem—installing and configuring Zookeeper, Maven, Hadoop, Kafka, HBase, Spark, and Flume—followed by a Spark Streaming job that consumes Kafka messages and writes them into HBase.

Big Data · Flume · HBase

0 likes · 14 min read

Architecture Digest

May 27, 2018 · Big Data

Installing Elasticsearch and Performing Data Aggregation Queries

This article walks through installing Elasticsearch 5.6.9, configuring system limits, creating indices, inserting and deleting documents, executing complex aggregation queries, and integrating Elasticsearch with Java using the TransportClient, providing a practical guide for building analytics on large‑scale data.

Analytics · Big Data · Elasticsearch

0 likes · 12 min read

dbaplus Community

May 23, 2018 · Big Data

Understanding MapReduce: A Simple Analogy to Master Big Data Distributed Computing

This article uses a human‑computer analogy and a playing‑card counting example to explain the fundamentals of distributed computing, why single machines cannot handle massive data, and how the MapReduce model’s four steps—split, transform, shuffle, and merge—solve big‑data problems.

Big Data · MapReduce · data-processing

0 likes · 15 min read
