Tagged articles

276 articles

Page 3 of 3

Nov 10, 2020 · Artificial Intelligence

Alink: An Open‑Source Machine Learning Platform on Flink – Features, Performance, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, detailing its core algorithms, performance advantages over Spark ML, version evolution, Maven and PyAlink installation steps, data‑source integrations, FM algorithm support, and unified file‑system operations for both batch and streaming workloads.

Alink · Flink · PyAlink

0 likes · 11 min read

Youku Technology

Oct 16, 2020 · Mobile Development

How Youku Achieved Seamless Multi‑Device UI with a Responsive Android SDK

This article explains Youku's Android responsive solution, covering the responsive SDK, loading flow, architecture, data reprocessing, page and container responsiveness, and control size adaptation, providing practical guidelines for building a single app that adapts to diverse device sizes and form factors.

Android · Foldable Screens · Responsive Design

0 likes · 13 min read

Amap Tech

Sep 24, 2020 · Artificial Intelligence

How High‑Precision Maps Power Autonomous Driving: Inside Amap’s AI and Cloud Strategies

The article details Amap’s (Gaode) technical approach to building and deploying high‑precision maps for autonomous driving, covering accuracy requirements, data collection, point‑cloud alignment, AI‑driven perception and map‑update pipelines, and the challenges of scale, cost, and freshness.

AI Algorithms · autonomous driving · data-processing

0 likes · 10 min read

Python Crawling & Data Mining

Sep 22, 2020 · Fundamentals

Master Excel Automation with Python: A Complete openpyxl Guide

This tutorial walks you through installing openpyxl, understanding Excel's workbook‑sheet‑cell hierarchy, and provides step‑by‑step code examples for reading, writing, styling, and manipulating Excel files using Python, serving as a handy reference for developers.

Excel · Python · automation

0 likes · 7 min read

DataFunTalk

Sep 20, 2020 · Artificial Intelligence

Building a Production‑Ready Recommendation System with Python, LLR, and ElasticSearch

This tutorial explains how to construct a recommendation system by loading transaction data, creating sparse user‑item and item‑item matrices, applying Log‑Likelihood Ratio for item similarity, and indexing the results into ElasticSearch for real‑time serving, using Python and open‑source big‑data tools.

LLR · Python · data-processing

0 likes · 16 min read

Java Captain

Aug 24, 2020 · Backend Development

Java 8 Stream API: Grouping, Mapping, Filtering, Summing and Other Collection Operations

This article demonstrates common collection operations with the Java 8 Stream API. After defining a data class and creating test data, it covers grouping by fields, converting lists to maps, filtering, summing numeric fields, finding max/min values, and removing duplicates, and it explains the Collectors utility methods.

Collection · data-processing · java8

0 likes · 9 min read

Big Data Technology & Architecture

Aug 21, 2020 · Big Data

Spark + Kudu Advertising Business Project: Data Statistics and Processing Guide

This article demonstrates how to implement an advertising business data statistics pipeline using Spark and Kudu, detailing metric requirements, Scala processing code, complex SQL aggregations, schema design, and data sinking for verification.

Big Data · Kudu · Scala

0 likes · 7 min read

Big Data Technology & Architecture

Aug 21, 2020 · Big Data

Spark + Kudu Advertising Business Project: Step-by-Step Implementation

This article walks through the complete implementation of an advertising statistics pipeline using Spark and Kudu, covering requirement analysis, Scala code development, SQL queries, schema definition, and data sinking, with full code snippets and execution results.

Big Data · Kudu · Scala

0 likes · 7 min read

Big Data Technology & Architecture

Aug 21, 2020 · Big Data

Spark + Kudu Advertising Project: Refactoring, Scala Traits, ETL Processor, and Project Entry

This article walks through a Spark and Kudu advertising project, explaining the refactoring approach, Scala trait usage, implementation of ETL and province‑city statistics processors, and shows the complete Spark application entry point with full code examples.

Big Data · ETL · Kudu

0 likes · 7 min read

FunTester

Aug 21, 2020 · Backend Development

Practical Guide to JsonPath: Filtering JSON Data with Operators and Functions

This article explains how to use JsonPath in Java to filter JSON data, covering operators, functions, and practical examples such as checking values in arrays, subsets, size constraints, and empty checks, with complete code snippets for each case.

Backend · JSON · data-processing

0 likes · 7 min read

Python Crawling & Data Mining

Aug 6, 2020 · Fundamentals

Boost Your Excel Workflow with Grid Studio: Python-Powered Web Spreadsheet

Grid Studio is a web‑based spreadsheet that integrates Python, allowing users to read, write, and visualize Excel data with just a few lines of code, offering seamless installation, custom functions, and powerful data‑science tools like Plotly and Matplotlib for efficient data processing.

Grid Studio · Python · Spreadsheet

0 likes · 5 min read

Alibaba Cloud Developer

Jul 22, 2020 · Big Data

Exploring the Apache Big Data Ecosystem: Hadoop, Spark, Flink, and More

This article surveys the rapidly evolving big data landscape by reviewing a wide range of Apache projects—including Hadoop, Spark, Flink, HBase, Kudu, Impala, Kafka, and others—detailing their core components, architectures, strengths, and typical use‑cases for building distributed data platforms.

Apache · Big Data · Distributed Systems

0 likes · 20 min read

iQIYI Technical Product Team

Jul 3, 2020 · Backend Development

Restructuring of Voting Service for 'You Are My Youth 2' to Enhance Scalability and Maintainability

The voting service for 'You Are My Youth 2' was re‑architected using Docker‑based QAE and the Skywalker microservices platform, adding containerized one‑click scaling, cross‑data‑center MySQL/Couchbase/HBase high availability, and Hive/Impala real‑time processing, which doubled performance, cut preparation from 30 days to 12 hours, and incorporated third‑party audit verification.

Microservices · Scalability · Voting Service

0 likes · 12 min read

Python Crawling & Data Mining

Jun 30, 2020 · Fundamentals

Master Excel‑Pandas Integration: From Data Import to Visualization in Python

This tutorial demonstrates how to combine Excel’s interactive features with Python’s Pandas library to perform comprehensive data operations—including reading, generating, filtering, sorting, handling missing values, deduplication, merging, grouping, calculation, statistics, visualization, sampling, pivot tables, and VLOOKUP—showing when each tool excels.

Excel · Python · data-processing

0 likes · 13 min read

Python Programming Learning Circle

Jun 28, 2020 · Backend Development

Scraping iQiyi Bullet Comments and Generating a Word Cloud with Python

This article demonstrates how to scrape bullet comments from iQiyi for the first episode of a popular mystery series, decode the binary files, extract the text, and use Python's jieba and wordcloud libraries to clean the data and generate a visual word cloud of audience sentiments.

Python · Web Scraping · data-processing

0 likes · 7 min read

MaGe Linux Operations

Jun 9, 2020 · Backend Development

How to Search Student Records in Excel Using Python xlrd

This tutorial demonstrates how to use Python's xlrd library to read an Excel file containing student records and retrieve a specific student's information by name or ID, covering installation, code walkthrough, and sample output.

Excel · data-processing · search

0 likes · 4 min read

Architect

May 29, 2020 · Artificial Intelligence

Integrating Flink with TensorFlow for End-to-End Machine Learning Pipelines

This article explains how to combine the Flink data‑processing engine with TensorFlow to create a unified, end‑to‑end machine‑learning workflow, covering background, challenges, the Flink‑AI‑extended architecture, ML framework and operator abstractions, and both batch and streaming training and prediction modes.

AI integration · Distributed Training · Flink

0 likes · 9 min read

Python Programming Learning Circle

May 25, 2020 · Fundamentals

Using Pandas to Automate Excel Data Processing and Visualization

This article demonstrates how Python's pandas library can replace manual Excel operations by reading, calculating, sorting, filtering, and visualizing data, offering efficient solutions for large datasets and showcasing code examples for automatic column filling, price adjustments, and student score analysis.

Python · Sorting · data-processing

0 likes · 6 min read

Big Data Technology & Architecture

May 10, 2020 · Big Data

Apache Beam Overview: Architecture, Programming Model, PCollection, Pipeline and Transform

This article provides a comprehensive introduction to Apache Beam, covering its unified batch‑and‑stream processing architecture, programming model, workflow patterns, Lambda and Kappa architectures, the characteristics of PCollection, pipeline construction, core transforms, and I/O handling, with practical code examples.

Apache Beam · Big Data · Lambda architecture

0 likes · 14 min read

Liangxu Linux

Apr 25, 2020 · Operations

Why Dumping Logs into a DB Fails and How Awk Solves the Problem

The article explains why loading all log data into a database is impractical, outlines three drawbacks—volatile requests, data bloat, and cost—and introduces the lightweight awk tool with concrete command examples to filter and analyze network logs efficiently without a database.

Sysadmin · awk · data-processing

0 likes · 6 min read

Python Crawling & Data Mining

Apr 3, 2020 · Fundamentals

Automate Excel Data Cleaning with Python pandas: Step-by-Step Guide

This article demonstrates how to use Python's pandas library to read CSV and XLS files, filter and merge data based on group assignments, compute derived columns, and export the results to a new Excel workbook, providing a complete automation workflow for Excel data processing.

CSV · Excel Automation · Python

0 likes · 6 min read

MaGe Linux Operations

Apr 1, 2020 · Backend Development

15 Must‑Know Python Open‑Source Frameworks for Modern Development

This article compiles the 15 most popular open‑source Python frameworks—from full‑stack web solutions like Django and Flask to specialized tools for event I/O, OLAP, distributed computing, and continuous integration—providing concise descriptions to help developers choose the right library for their projects.

data-processing · frameworks · web-development

0 likes · 6 min read

DataFunTalk

Mar 6, 2020 · Artificial Intelligence

Advances in Apache Flink AI Ecosystem: ML Pipeline, AI Flow, and Mini‑Batch Streaming Iteration

This article reviews recent progress in Apache Flink's AI ecosystem, explaining how Flink unifies batch and stream processing for machine‑learning pipelines, introduces the Flink ML Pipeline and Alink library, describes the AI Flow framework for end‑to‑end ML workflows, and presents a novel mini‑batch streaming iteration mechanism to support both offline and online learning scenarios.

AI Flow · Apache Flink · Mini-batch Iteration

0 likes · 13 min read

Qunar Tech Salon

Feb 20, 2020 · Operations

Design and Implementation of Business‑Driven Monitoring Systems at JD Cloud

This article explains why monitoring is essential for operations, outlines the four‑layer monitoring standard (infrastructure, liveness, performance, business), breaks down functional modules and data flows, and showcases JD Cloud's practical design, alarm‑convergence project, and future AI‑driven observability directions.

JD Cloud · Operations · alert convergence

0 likes · 12 min read

Mafengwo Technology

Dec 26, 2019 · Backend Development

How We Built a High‑Performance Ad Monitoring Service with OpenResty

This article details the design and implementation of ADMonitor, a high‑availability, low‑latency advertising monitoring platform built on OpenResty, covering its architecture, data collection, processing, archiving, and performance outcomes.

Ad Monitoring · Lua · OpenResty

0 likes · 11 min read

Architects Research Society

Dec 25, 2019 · Cloud Native

Common Use Cases of the OpenWhisk Serverless Platform

The article outlines how OpenWhisk’s serverless execution model supports diverse use cases—including microservices, web and mobile back‑ends, IoT pipelines, API services, data processing, and cognitive applications—highlighting its modularity, language flexibility, automatic scaling, and integration with cloud services.

API · Cloud Native · IoT

0 likes · 8 min read

Programmer DD

Dec 18, 2019 · Backend Development

Master Java 8 Streams: From Basics to Advanced Operations

This article introduces Java 8's Stream API, explains why functional streams improve code readability and performance, and provides detailed examples of common operations such as filter, map, flatMap, reduce, collect, Optional handling, parallel processing, and debugging techniques for efficient data processing.

Java 8 · Lambda · Stream API

0 likes · 16 min read

ITPUB

Dec 9, 2019 · Fundamentals

Master Date Operations in pandas and SQL: Retrieval, Conversion, and Calculation

This tutorial walks through loading order data into pandas and SQL, then demonstrates how to retrieve current dates, extract date components, convert between readable dates and Unix timestamps, transform between 10‑digit and 8‑digit date formats, and perform date arithmetic using pandas, MySQL, and Hive.

data-processing · date handling · datetime

0 likes · 16 min read

DataFunTalk

Nov 21, 2019 · Big Data

Evolution of 58.com Real-Time Computing Platform and the One-Stop Streaming Data Processing System Wstream

The article details the technical evolution of 58.com’s real-time computing platform—from Storm and Spark Streaming to a Flink‑based one‑stop solution called Wstream—covering use cases, architecture, stability measures, migration from Storm, operational diagnostics, and future development plans.

Big Data · Flink · Real-time Streaming

0 likes · 11 min read

Big Data Technology & Architecture

Sep 15, 2019 · Big Data

Flink Interview Guide: Concepts, Basics, Advanced Topics, and Source Code

This article presents a comprehensive collection of Flink interview questions covering fundamental concepts, advanced topics, and source‑code details to help candidates prepare effectively for Flink‑related technical interviews.

Apache Flink · Big Data · Flink

0 likes · 6 min read

DataFunTalk

Sep 5, 2019 · Big Data

Apache Beam Architecture Principles and Practical Application

This article introduces Apache Beam as a unified programming model for batch and streaming data processing, explains its architecture, core components, advantages, extensibility, and demonstrates practical usage with KafkaIO, BeamSQL, and AIoT scenarios across multiple runners.

Apache Beam · Kafka · Streaming

0 likes · 16 min read

Alibaba Cloud Developer

Jul 1, 2019 · Big Data

Why Lambda, Kappa, and Lambda+ Are Shaping Modern Big Data Architecture

This article examines the technical challenges of large‑scale data processing, compares the classic Lambda and Kappa architectures, introduces the unified stream‑batch Lambda+ design built on Tablestore and Blink, and outlines suitable scenarios and practical solutions for modern big‑data systems.

Big Data · Kappa architecture · Lambda architecture

0 likes · 16 min read

Xianyu Technology

Jun 20, 2019 · Big Data

Design of a High-Performance Real-Time Data Processing System for Service Diagnosis

The paper presents a high‑performance real‑time data processing pipeline that collects, transports, preprocesses, and computes service logs and metrics using Alibaba Logtail, LogHub, and an enhanced Flink (Blink) engine, persisting root‑cause graphs in Lindorm, achieving sub‑3‑second latency for tens of millions of events per second and cutting diagnosis time to about five seconds.

Flink · Real-time Streaming · architecture

0 likes · 10 min read

MaGe Linux Operations

Jun 12, 2019 · Frontend Development

Build a Desktop Weather App with Python and PyQt5 – Full Step‑by‑Step Guide

This tutorial walks through creating a desktop weather application with Python 3, PyQt5, and the requests library, covering environment setup, city code data preparation, UI design with Qt Designer, API querying, JSON parsing, widget handling, shortcut keys, and final packaging with PyInstaller.

GUI · PyQt5 · Python

0 likes · 6 min read

Big Data Technology Architecture

Apr 23, 2019 · Big Data

Understanding Spark Shuffle: Stages, Evolution, and Source Code Structure

This article explains the concept of Spark Shuffle, details its two-phase write and read processes, describes the evolution from Hash‑based to Sort‑based and Tungsten‑based shuffles across Spark versions, and outlines the relevant source‑code components in Spark 2.1.

Shuffle Evolution · Spark · Spark Internals

0 likes · 10 min read

Liangxu Linux

Apr 14, 2019 · Backend Development

Master JSON Formatting and Extraction on Linux with jq

This guide explains what jq is, how to install it on various Linux distributions, and provides step‑by‑step examples for pretty‑printing JSON, extracting specific fields, handling arrays, and using built‑in functions like keys and has, all with clear command‑line snippets.

command-line · data-processing · jq

0 likes · 6 min read

MaGe Linux Operations

Oct 19, 2018 · Artificial Intelligence

Why NumPy’s Array vs Matrix Can Trip Up Your Machine Learning Projects

The article examines common pitfalls when using NumPy arrays and matrices for data manipulation in machine learning, highlighting chaotic data structures, inefficient filtering, confusing arithmetic syntax, and unintuitive code patterns compared to MATLAB/Octave, and concludes with a critique of Python’s ergonomics.

NumPy · Python · data-processing

0 likes · 7 min read

360 Quality & Efficiency

Oct 15, 2018 · Big Data

An Introduction to Big Data Concepts, Hadoop Ecosystem, and Common Frameworks

This article provides a comprehensive overview of big data fundamentals, including the 4V characteristics, the Hadoop 2.0 layered architecture, a comparison between Hadoop and Spark, classification of common big‑data tools, and the typical offline and real‑time data processing workflows.

ETL · Hadoop · Spark

0 likes · 6 min read

Alibaba Cloud Developer

Sep 20, 2018 · Artificial Intelligence

Inside Alibaba DAMO Academy’s AI Vision: New Labs, Chips, and Quantum Ambitions

At the 2018 Hangzhou Yunqi Conference, Alibaba's CTO Zhang Jianfeng outlined DAMO Academy’s expansive AI strategy, unveiling new research labs, a semiconductor venture, quantum computing initiatives, and a youth science award aimed at accelerating data, algorithm, and computing breakthroughs.

AI research · Alibaba · DAMO Academy

0 likes · 8 min read

Alibaba Cloud Developer

Sep 19, 2018 · Artificial Intelligence

Inside Alibaba Damo Academy’s 2018 Vision: AI, Chips, Quantum & Global Labs

At the 2018 Hangzhou Cloud Expo, Alibaba’s CTO and Damo Academy director Zhang Jianfeng unveiled the institute’s global expansion, new semiconductor venture, AI research pillars, upcoming NPU chip, quantum computing initiatives, and the youth-focused Damo Academy Green Orange Award, highlighting a comprehensive strategy for data, algorithms, and computing power.

Quantum Computing · Research Labs · data-processing

0 likes · 8 min read

360 Tech Engineering

Aug 7, 2018 · Big Data

Evolution and Practice of 360 Big Data Center Platform

The article presents a comprehensive overview of 360's Big Data Center evolution, covering business background, platform‑as‑a‑service architecture, data asset management, user‑profile unification, platform milestones, technical architecture, performance optimizations, online query capabilities, future plans, and a Q&A session.

360 · Data Governance · Data Platform

0 likes · 22 min read

Meituan Technology Team

Aug 2, 2018 · Big Data

R for Fine‑Grained Data Operations: Engineering Practices and Performance at Meituan

Meituan’s in‑store dining team demonstrates how R’s open‑source packages, powerful data manipulation, rich visualization libraries, and reproducible reporting can be engineered into scalable, parallelized workflows that turn secondary data processing into fast, interactive dashboards and analytics, proving R’s enterprise‑grade performance and adoption.

Big Data · Data visualization · R

0 likes · 18 min read

ITPUB

Jun 10, 2018 · Big Data

13 Must‑Know Open‑Source Tools in the Hadoop Ecosystem

This article introduces Hadoop’s origins and core challenges, then presents thirteen essential open‑source tools spanning resource scheduling, real‑time query engines, and additional processing frameworks, detailing each project's purpose, key features, and repository locations to help practitioners choose the right component for big‑data workloads.

Hadoop · Impala · Spark

0 likes · 12 min read

dbaplus Community

May 23, 2018 · Big Data

Understanding MapReduce: A Simple Analogy to Master Big Data Distributed Computing

This article uses a human‑computer analogy and a playing‑card counting example to explain the fundamentals of distributed computing, why single machines cannot handle massive data, and how the MapReduce model’s four steps—split, transform, shuffle, and merge—solve big‑data problems.

Big Data · MapReduce · data-processing

0 likes · 15 min read

Tencent Advertising Technology

Apr 29, 2018 · Artificial Intelligence

Insights and Strategies from Winning the Tencent Advertising Algorithm Competition

The author, a Sun Yat‑sen University undergraduate and repeat weekly champion, shares practical tips on handling large datasets, effective feature engineering, and combining GBDT with a custom deepFFM model to achieve top scores in the Tencent advertising algorithm competition.

GBDT · advertising algorithms · data-processing

0 likes · 4 min read

MaGe Linux Operations

Apr 23, 2018 · Backend Development

Essential Python Libraries for Web Scraping and Data Processing

A comprehensive catalog of Python libraries covering network communication, web crawling frameworks, HTML/XML parsing, text manipulation, file format handling, natural language processing, browser automation, concurrency, cloud services, email processing, URL manipulation, multimedia extraction, WebSocket support, DNS resolution, computer vision, proxy servers, and other useful tools for developers.

Python · Web Scraping · automation

0 likes · 16 min read

Tencent Cloud Developer

Apr 12, 2018 · Big Data

Spark Usage in DataMagic Platform: A Practical Guide

This guide explains how DataMagic leverages Spark on YARN for fast, scalable offline analytics—covering Spark’s core role, four steps to master its terminology, configurations, parallelism, and code modification, plus practical deployment scripts, dynamic resource tuning, MongoDB export, job troubleshooting, and cluster upkeep for trillion‑record workloads.

DataMagic · Spark · Spark optimization

0 likes · 11 min read

Meituan Technology Team

Mar 15, 2018 · Backend Development

Hermes Performance System: Architecture and Implementation for O2O Business

The article presents Hermes, Meituan’s O2O performance management platform for travel, detailing its six‑module, four‑engine architecture—including data, incentive, rule, calculation, and scheduling engines—while highlighting technical innovations such as two‑level caching, work‑stealing producer‑consumer processing, Map‑Reduce‑style calculations, and future AI‑driven enhancements.

BI Tools · Calculation Engine · O2O business

0 likes · 16 min read

ITFLY8 Architecture Home

Feb 25, 2018 · Big Data

Building Scalable Data Platforms with SMACK: Spark, Mesos, Akka, Cassandra & Kafka

Learn how to construct a scalable data processing platform using the SMACK stack—Spark, Mesos, Akka, Cassandra, and Kafka—covering storage design, processing workflows, resource management, deployment options, and fault‑tolerant task execution for both batch and streaming workloads.

Akka · Kafka · Mesos

0 likes · 14 min read

dbaplus Community

Jan 1, 2018 · Big Data

How Vipshop Leverages Data Processing, Analytics, and Mining for Smarter Ops

This article summarizes Wu Xiaoguang's talk at Gdevops 2017, detailing how Vipshop integrates data processing, analysis, and mining technologies—such as Flume, Kafka, Spark, and custom scheduling—to improve operational decision‑making, performance monitoring, root‑cause analysis, and predictive modeling across its e‑commerce platform.

Big Data · Data Analytics · Operations

0 likes · 23 min read

MaGe Linux Operations

Dec 24, 2017 · Artificial Intelligence

Avoid These Common NumPy Pitfalls When Handling Matrices and Vectors

This article examines four typical traps when using NumPy for matrix and vector operations—confusing array and matrix shapes, inefficient data filtering, ambiguous multiplication syntax, and cumbersome syntax—offering examples, explanations, and comparisons with MATLAB/Octave to help Python users write clearer, more reliable code.

NumPy · Pitfalls · data-processing

0 likes · 7 min read

Meitu Technology

Dec 19, 2017 · Big Data

How Meitu Built a Scalable Distributed Bitmap System for Massive Data Processing

This article explains Meitu's development of a distributed bitmap system that leverages the speed and storage efficiency of bitmap structures to handle massive user data, detailing its evolution, architectural choices, implementation practices, and lessons learned to inspire similar big‑data solutions.

Big Data · Meitu · System Design

0 likes · 3 min read

Tencent Cloud Developer

Nov 15, 2017 · Cloud Computing

How Tencent Cloud Storage Evolved Through Three Eras: From Data Access to Activation

The article traces Tencent Cloud Storage's evolution from basic data access in the early 2010s, through a data‑processing phase driven by video and image workloads, to the current data‑activation era focused on big‑data analytics and cost‑effective cloud migration, highlighting technical features and real‑world use cases.

cloud computing · cloud storage · data-processing

0 likes · 8 min read

21CTO

Sep 5, 2017 · Big Data

Build a PHP Word Count with Hadoop MapReduce: Step-by-Step Guide

This article explains what MapReduce is, when to use it, and how to implement a PHP word‑count and a gold‑price average calculation on an Apache Hadoop cluster, covering installation hints, mapper and reducer scripts, testing commands, and visualizing results with gnuplot.

Big Data · Gnuplot · Hadoop

0 likes · 10 min read

Architecture Digest

Sep 3, 2017 · Big Data

An Overview of Big Data Processing Frameworks: Batch, Stream, and Hybrid Systems

This article introduces the evolution of big‑data processing from Google’s MapReduce concept to modern open‑source frameworks, defines big data and its 3V characteristics, outlines typical processing pipelines, and compares batch, stream, and hybrid systems such as Hadoop, Storm, Samza, Spark, and Flink.

Batch Processing · Big Data · Flink

0 likes · 20 min read

MaGe Linux Operations

Aug 10, 2017 · Backend Development

Explore the Ultimate Python Library Collection for Web Crawling and Data Processing

This comprehensive guide lists essential Python libraries for network operations, asynchronous programming, web crawling frameworks, HTML/XML parsing, text handling, data conversion, slug creation, office document manipulation, PDF processing, markdown rendering, YAML handling, CSS utilities, feed parsing, SQL tools, HTTP clients, microformats, executable analysis, PSD handling, natural language processing, browser automation, headless tools, multiprocessing, queues, cloud execution, email handling, URL manipulation, web content extraction, video downloading, wiki archiving, WebSocket communication, DNS queries, computer vision, proxy services, and miscellaneous utilities.

Python · Web Crawling · data-processing

0 likes · 17 min read

Tencent Advertising Technology

Jun 23, 2017 · Artificial Intelligence

Weekly Champion nju_newbiew Shares Competition Experience and Technical Insights

The nju_newbiew team, weekly champions of the Tencent Social Ads University Algorithm Competition, recount their data processing, offline validation, feature engineering, and model strategies, highlighting practical machine‑learning lessons; the post also carries competition announcements and contact information.

AI · Model Fusion · competition

0 likes · 5 min read

MaGe Linux Operations

May 15, 2017 · Databases

Top 10 Must‑Know Data Storage Tools for Java Developers

Facing ever‑growing complexity, Java developers can streamline their projects by mastering a curated list of essential data storage and processing tools—including MongoDB, Elasticsearch, Cassandra, Redis, Hazelcast, EHCache, Hadoop, Solr, Spark, and Memcached—each offering distinct strengths for modern big‑data applications.

Big Data · NoSQL · data-processing

0 likes · 8 min read

360 Quality & Efficiency

Apr 24, 2017 · Big Data

Introduction to Hadoop: Architecture, HDFS, MapReduce, and Common Commands

This article introduces Hadoop as a widely used big‑data framework, explains its core components HDFS and MapReduce, describes the cluster node roles, presents typical command‑line usage and a sample MapReduce workflow, and offers guidance for further learning.

HDFS · Hadoop · MapReduce

0 likes · 5 min read

Java High-Performance Architecture

Apr 4, 2017 · Big Data

Master MapReduce: Principles, Process, and 7 Hands‑On Examples

This tutorial quickly introduces the MapReduce model, explains its core principles and execution flow, and guides you through seven practical examples—from basic WordCount to custom serialization, partitioning, joins, and friend‑recommendation—while providing test data and an optional ready‑made Hadoop environment for hands‑on practice.

Hadoop · MapReduce · Tutorial

0 likes · 3 min read

Huawei Cloud Developer Alliance

Jan 24, 2017 · Big Data

Why Hadoop Remains the Backbone of Big Data: Core Modules, Tools, and Trends

This article provides a comprehensive overview of Hadoop as the leading open‑source platform for big‑data processing, detailing its core components HDFS and MapReduce, the evolution to Hadoop 2.0/YARN, and the extensive ecosystem of tools and commercial solutions that enable scalable storage, analysis, and machine‑learning on massive data sets.

Big Data · HDFS · Hadoop

0 likes · 18 min read

Java High-Performance Architecture

Dec 13, 2016 · Big Data

What Is Apache Beam and How Does It Simplify Distributed Data Processing?

Apache Beam is an open‑source, unified programming model for distributed data processing that lets developers write pipelines once and run them on multiple execution engines such as Spark, Flink, or Dataflow, simplifying code reuse and easing migration between frameworks.

Apache Beam · Spark · WordCount

0 likes · 5 min read

Architects Research Society

Nov 27, 2016 · Big Data

An Introduction to Apache Beam and Its Beam Model for Unified Batch and Stream Processing

This article introduces Apache Beam, its Beam Model, and how the Beam SDK enables developers to write unified, flexible pipelines for both bounded batch jobs and unbounded streaming workloads, illustrating concepts with mobile‑gaming examples and detailed code snippets.

Apache Beam · Batch Processing · Beam Model

0 likes · 19 min read

Architecture Digest

Sep 17, 2016 · Big Data

Spark Introduction and Integration with MongoDB: Architecture, Use Cases, and Code Samples

This article introduces Apache Spark as a fast, general‑purpose big‑data engine, explains its ecosystem, compares HDFS with MongoDB, and demonstrates how Spark can be combined with MongoDB through the Mongo‑Spark connector, including real‑world case studies and sample code.

Big Data · Connector · MongoDB

0 likes · 18 min read

dbaplus Community

Aug 9, 2016 · Cloud Native

Scaling Qiniu Cloud's Custom Data Processing with Docker Containerization

Qiniu Cloud transformed its high‑traffic data processing platform by containerizing services with Docker, addressing challenges such as massive request volume, CPU‑intensive workloads, IO bottlenecks, and burst traffic through architecture evolution, queueing, rate limiting, auto‑scaling, and secure, isolated custom processing pipelines.

Auto Scaling · Microservices · data-processing

0 likes · 20 min read

360 Quality & Efficiency

Jun 6, 2016 · Big Data

Spark and MongoDB Tutorial: Daily Active User Statistics with Scala

This tutorial guides readers through using Apache Spark and MongoDB to compute daily active user statistics, covering Spark fundamentals, a Spark‑vs‑Hadoop comparison, MongoDB use cases, environment setup, Scala code workflow, Maven compilation, and job submission on a YARN cluster.

Big Data · MongoDB · Scala

0 likes · 11 min read

Architecture Digest

Mar 28, 2016 · Big Data

Overview of the Hadoop Ecosystem and Modern Big Data Technologies

This article provides a comprehensive overview of Hadoop and its surrounding ecosystem, detailing core components, storage principles, key algorithms, and a wide range of modern big‑data technologies such as Spark, Flink, Kafka, NoSQL databases, and cloud‑based processing platforms.

Big Data · Hadoop · Kafka

0 likes · 11 min read

21CTO

Jan 16, 2016 · Fundamentals

Why Mastering Fundamentals Beats Chasing the Latest Tech

The author reflects on a programmer's focus on web, distributed systems, and data processing, arguing that deep, continuous investment in fundamentals—such as algorithms, networking, and OS concepts—drives lasting skill growth, better project outcomes, and a healthier professional mindset.

data-processing · knowledge acquisition · programming fundamentals

0 likes · 8 min read

Architect

Dec 4, 2015 · Operations

Evolution of Qiniu Cloud Data Processing Architecture

The article explains how Qiniu's data processing platform has evolved from a simple real‑time URL‑based model to a more complex architecture featuring separate caching, agent services, discover monitoring, and container‑based elastic scaling to handle massive unstructured data workloads.

Real-time Processing · cloud architecture · containerization

0 likes · 9 min read

21CTO

Nov 13, 2015 · Backend Development

Essential Python Libraries for Web Scraping and Data Processing

Discover a comprehensive collection of Python libraries covering network requests, web crawling frameworks, HTML/XML parsing, text manipulation, file format handling, natural language processing, browser automation, asynchronous programming, and more, providing developers with essential tools for efficient web scraping and data processing tasks.

Python · Web Scraping · data-processing

0 likes · 18 min read

Qunar Tech Salon

Aug 17, 2015 · Big Data

Comprehensive Overview of Open‑Source Big Data Tools and Platforms

This article presents a detailed, categorized catalogue of more than fifty open‑source big‑data projects—including Hadoop‑related utilities, analytics platforms, databases, BI solutions, data‑mining packages, query engines, programming languages, search tools, and in‑memory technologies—highlighting their primary functions, supported operating systems, and official links.

Analytics · Hadoop · In-Memory

0 likes · 31 min read

21CTO

Aug 11, 2015 · Big Data

Understanding MapReduce Through a Pizza Sauce Analogy

The author recounts delivering a MapReduce talk, then uses a vivid pizza sauce preparation story to illustrate how mapping chops ingredients and reducing blends them, effectively explaining distributed data processing concepts to a non‑technical audience.

Analogy · MapReduce · data-processing

0 likes · 7 min read

MaGe Linux Operations

Jan 9, 2015 · Big Data

Explore the Complete Hadoop Ecosystem: 20+ Projects and Learning Roadmap

This article provides a comprehensive overview of the Hadoop family—detailing more than twenty open‑source projects, their core functions, and a structured learning roadmap to help developers master Hadoop, Hive, Pig, HBase, Zookeeper, Mahout, and related tools.

Apache · Big Data · Ecosystem

0 likes · 10 min read

Qunar Tech Salon

Dec 4, 2014 · Big Data

Understanding Apache Spark: Architecture, Comparison with Hadoop, Features, and Use Cases

The article explains Apache Spark’s memory‑based distributed computing model, its advantages over Hadoop’s MapReduce, key features, fault tolerance, deployment modes, ecosystem components, and the scenarios where Spark is most effective for large‑scale data analytics.

Hadoop · Spark · data-processing

0 likes · 7 min read

Baidu Tech Salon

Oct 29, 2014 · Big Data

Inside Baidu’s Real-Time Big Data Platforms: Dstream and TM Explained

This article examines Baidu’s home‑grown real‑time big‑data platforms Dstream and TM, detailing their architectures, performance metrics, key features, and practical use cases such as log ETL and real‑time bidding, while highlighting how they meet millisecond‑level processing demands.

Baidu · Big Data · Dstream

0 likes · 9 min read

MaGe Linux Operations

Aug 29, 2014 · Backend Development

15 Must‑Know Python Open‑Source Frameworks for Modern Development

This article presents a curated list of fifteen popular Python open‑source frameworks—including web, API, OLAP, concurrency, and web‑crawling tools—explaining their core features and typical use cases for developers seeking robust, community‑backed solutions.

Backend · Python · data-processing

0 likes · 6 min read