Tagged articles
276 articles
DataFunSummit
Nov 10, 2020 · Artificial Intelligence

Alink: An Open‑Source Machine Learning Platform on Flink – Features, Performance, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, detailing its core algorithms, performance advantages over Spark ML, version evolution, Maven and PyAlink installation steps, data‑source integrations, FM algorithm support, and unified file‑system operations for both batch and streaming workloads.

Alink, Flink, PyAlink
Youku Technology
Oct 16, 2020 · Mobile Development

How Youku Achieved Seamless Multi‑Device UI with a Responsive Android SDK

This article explains Youku's Android responsive solution, covering the responsive SDK, loading flow, architecture, data reprocessing, page and container responsiveness, and control size adaptation, providing practical guidelines for building a single app that adapts to diverse device sizes and form factors.

Android, Foldable Screens, Responsive Design
Amap Tech
Sep 24, 2020 · Artificial Intelligence

How High‑Precision Maps Power Autonomous Driving: Inside Amap’s AI and Cloud Strategies

The article details Amap’s (Gaode) technical approach to building and deploying high‑precision maps for autonomous driving, covering accuracy requirements, data collection, point‑cloud alignment, AI‑driven perception and map‑update pipelines, and the challenges of scale, cost, and freshness.

AI Algorithms, autonomous driving, data-processing
DataFunTalk
Sep 20, 2020 · Artificial Intelligence

Building a Production‑Ready Recommendation System with Python, LLR, and ElasticSearch

This tutorial explains how to construct a recommendation system by loading transaction data, creating sparse user‑item and item‑item matrices, applying Log‑Likelihood Ratio for item similarity, and indexing the results into ElasticSearch for real‑time serving, using Python and open‑source big‑data tools.

LLR, Python, data-processing
Java Captain
Aug 24, 2020 · Backend Development

Java 8 Stream API: Grouping, Mapping, Filtering, Summing and Other Collection Operations

This article demonstrates how to use the Java 8 Stream API for common collection operations: after defining a data class and creating test data, it covers grouping by fields, converting lists to maps, filtering, summing numeric fields, finding max/min values, and removing duplicates, and it explains the Collectors utility methods.

Collection, data-processing, java8
Alibaba Cloud Developer
Jul 22, 2020 · Big Data

Exploring the Apache Big Data Ecosystem: Hadoop, Spark, Flink, and More

This article surveys the rapidly evolving big data landscape by reviewing a wide range of Apache projects—including Hadoop, Spark, Flink, HBase, Kudu, Impala, Kafka, and others—detailing their core components, architectures, strengths, and typical use‑cases for building distributed data platforms.

Apache, Big Data, Distributed Systems
iQIYI Technical Product Team
Jul 3, 2020 · Backend Development

Restructuring of Voting Service for 'You Are My Youth 2' to Enhance Scalability and Maintainability

The voting service for 'You Are My Youth 2' was re‑architected using Docker‑based QAE and the Skywalker microservices platform, adding containerized one‑click scaling, cross‑data‑center MySQL/Couchbase/HBase high availability, and Hive/Impala real‑time processing, which doubled performance, cut preparation from 30 days to 12 hours, and incorporated third‑party audit verification.

Microservices, Scalability, Voting Service
Python Crawling & Data Mining
Jun 30, 2020 · Fundamentals

Master Excel‑Pandas Integration: From Data Import to Visualization in Python

This tutorial demonstrates how to combine Excel’s interactive features with Python’s Pandas library to perform comprehensive data operations—including reading, generating, filtering, sorting, handling missing values, deduplication, merging, grouping, calculation, statistics, visualization, sampling, pivot tables, and VLOOKUP—showing when each tool excels.

Excel, Python, data-processing
MaGe Linux Operations
Jun 9, 2020 · Backend Development

How to Search Student Records in Excel Using Python xlrd

This tutorial demonstrates how to use Python's xlrd library to read an Excel file containing student records and retrieve a specific student's information by name or ID, covering installation, code walkthrough, and sample output.

Excel, data-processing, search
Architect
May 29, 2020 · Artificial Intelligence

Integrating Flink with TensorFlow for End-to-End Machine Learning Pipelines

This article explains how to combine the Flink data‑processing engine with TensorFlow to create a unified, end‑to‑end machine‑learning workflow, covering background, challenges, the Flink‑AI‑extended architecture, ML framework and operator abstractions, and both batch and streaming training and prediction modes.

AI integration, Distributed Training, Flink
Big Data Technology & Architecture
May 10, 2020 · Big Data

Apache Beam Overview: Architecture, Programming Model, PCollection, Pipeline and Transform

This article provides a comprehensive introduction to Apache Beam, covering its unified batch‑and‑stream processing architecture, programming model, workflow patterns, Lambda and Kappa architectures, the characteristics of PCollection, pipeline construction, core transforms, and I/O handling, with practical code examples.

Apache Beam, Big Data, Lambda architecture
Liangxu Linux
Apr 25, 2020 · Operations

Why Dumping Logs into a DB Fails and How Awk Solves the Problem

The article explains why loading all log data into a database is impractical, outlines three drawbacks—volatile requests, data bloat, and cost—and introduces the lightweight awk tool with concrete command examples to filter and analyze network logs efficiently without a database.

Sysadmin, awk, data-processing
MaGe Linux Operations
Apr 1, 2020 · Backend Development

15 Must‑Know Python Open‑Source Frameworks for Modern Development

This article compiles the 15 most popular open‑source Python frameworks—from full‑stack web solutions like Django and Flask to specialized tools for event I/O, OLAP, distributed computing, and continuous integration—providing concise descriptions to help developers choose the right library for their projects.

data-processing, frameworks, web-development
DataFunTalk
Mar 6, 2020 · Artificial Intelligence

Advances in Apache Flink AI Ecosystem: ML Pipeline, AI Flow, and Mini‑Batch Streaming Iteration

This article reviews recent progress in Apache Flink's AI ecosystem, explaining how Flink unifies batch and stream processing for machine‑learning pipelines, introduces the Flink ML Pipeline and Alink library, describes the AI Flow framework for end‑to‑end ML workflows, and presents a novel mini‑batch streaming iteration mechanism to support both offline and online learning scenarios.

AI Flow, Apache Flink, Mini-batch Iteration
Qunar Tech Salon
Feb 20, 2020 · Operations

Design and Implementation of Business‑Driven Monitoring Systems at JD Cloud

This article explains why monitoring is essential for operations, outlines the four‑layer monitoring standard (infrastructure, liveness, performance, business), breaks down functional modules and data flows, and showcases JD Cloud's practical design, alarm‑convergence project, and future AI‑driven observability directions.

JD Cloud, Operations, alert convergence
Architects Research Society
Dec 25, 2019 · Cloud Native

Common Use Cases of the OpenWhisk Serverless Platform

The article outlines how OpenWhisk’s serverless execution model supports diverse use cases—including microservices, web and mobile back‑ends, IoT pipelines, API services, data processing, and cognitive applications—highlighting its modularity, language flexibility, automatic scaling, and integration with cloud services.

API, Cloud Native, IoT
Programmer DD
Dec 18, 2019 · Backend Development

Master Java 8 Streams: From Basics to Advanced Operations

This article introduces Java 8's Stream API, explains why functional streams improve code readability and performance, and provides detailed examples of common operations such as filter, map, flatMap, reduce, collect, Optional handling, parallel processing, and debugging techniques for efficient data processing.

Java 8, Lambda, Stream API
ITPUB
Dec 9, 2019 · Fundamentals

Master Date Operations in pandas and SQL: Retrieval, Conversion, and Calculation

This tutorial walks through loading order data into pandas and SQL, then demonstrates how to retrieve current dates, extract date components, convert between readable dates and Unix timestamps, transform between 10‑digit and 8‑digit date formats, and perform date arithmetic using pandas, MySQL, and Hive.

data-processing, date handling, datetime
DataFunTalk
Nov 21, 2019 · Big Data

Evolution of 58.com Real-Time Computing Platform and the One-Stop Streaming Data Processing System Wstream

The article details the technical evolution of 58.com’s real-time computing platform—from Storm and Spark Streaming to a Flink‑based one‑stop solution called Wstream—covering use cases, architecture, stability measures, migration from Storm, operational diagnostics, and future development plans.

Big Data, Flink, Real-time Streaming
DataFunTalk
Sep 5, 2019 · Big Data

Apache Beam Architecture Principles and Practical Application

This article introduces Apache Beam as a unified programming model for batch and streaming data processing, explains its architecture, core components, advantages, extensibility, and demonstrates practical usage with KafkaIO, BeamSQL, and AIoT scenarios across multiple runners.

Apache Beam, Kafka, Streaming
Alibaba Cloud Developer
Jul 1, 2019 · Big Data

Why Lambda, Kappa, and Lambda+ Are Shaping Modern Big Data Architecture

This article examines the technical challenges of large‑scale data processing, compares the classic Lambda and Kappa architectures, introduces the unified stream‑batch Lambda+ design built on Tablestore and Blink, and outlines suitable scenarios and practical solutions for modern big‑data systems.

Big Data, Kappa architecture, Lambda architecture
Xianyu Technology
Jun 20, 2019 · Big Data

Design of a High-Performance Real-Time Data Processing System for Service Diagnosis

The paper presents a high‑performance real‑time data processing pipeline that collects, transports, preprocesses, and computes service logs and metrics using Alibaba Logtail, LogHub, and an enhanced Flink (Blink) engine, persisting root‑cause graphs in Lindorm, achieving sub‑3‑second latency for tens of millions of events per second and cutting diagnosis time to about five seconds.

Flink, Real-time Streaming, architecture
Liangxu Linux
Apr 14, 2019 · Backend Development

Master JSON Formatting and Extraction on Linux with jq

This guide explains what jq is, how to install it on various Linux distributions, and provides step‑by‑step examples for pretty‑printing JSON, extracting specific fields, handling arrays, and using built‑in functions like keys and has, all with clear command‑line snippets.

command-line, data-processing, jq
MaGe Linux Operations
Oct 19, 2018 · Artificial Intelligence

Why NumPy’s Array vs Matrix Can Trip Up Your Machine Learning Projects

The article examines common pitfalls when using NumPy arrays and matrices for data manipulation in machine learning, highlighting chaotic data structures, inefficient filtering, confusing arithmetic syntax, and unintuitive code patterns compared to MATLAB/Octave, and concludes with a critique of Python’s ergonomics.

NumPy, Python, data-processing
Alibaba Cloud Developer
Sep 19, 2018 · Artificial Intelligence

Inside Alibaba Damo Academy’s 2018 Vision: AI, Chips, Quantum & Global Labs

At the 2018 Hangzhou Cloud Expo, Alibaba’s CTO and Damo Academy director Zhang Jianfeng unveiled the institute’s global expansion, new semiconductor venture, AI research pillars, upcoming NPU chip, quantum computing initiatives, and the youth-focused Damo Academy Green Orange Award, highlighting a comprehensive strategy for data, algorithms, and computing power.

Quantum Computing, Research Labs, data-processing
360 Tech Engineering
Aug 7, 2018 · Big Data

Evolution and Practice of 360 Big Data Center Platform

The article presents a comprehensive overview of 360's Big Data Center evolution, covering business background, platform‑as‑a‑service architecture, data asset management, user‑profile unification, platform milestones, technical architecture, performance optimizations, online query capabilities, future plans, and a Q&A session.

360, Data Governance, Data Platform
Meituan Technology Team
Aug 2, 2018 · Big Data

R for Fine‑Grained Data Operations: Engineering Practices and Performance at Meituan

Meituan’s in‑store dining team demonstrates how R’s open‑source packages, powerful data manipulation, rich visualization libraries, and reproducible reporting can be engineered into scalable, parallelized workflows that turn secondary data processing into fast, interactive dashboards and analytics, proving R’s enterprise‑grade performance and adoption.

Big Data, Data visualization, R
ITPUB
Jun 10, 2018 · Big Data

13 Must‑Know Open‑Source Tools in the Hadoop Ecosystem

This article introduces Hadoop’s origins and core challenges, then presents thirteen essential open‑source tools spanning resource scheduling, real‑time query engines, and additional processing frameworks, detailing each project's purpose, key features, and repository locations to help practitioners choose the right component for big‑data workloads.

Hadoop, Impala, Spark
MaGe Linux Operations
Apr 23, 2018 · Backend Development

Essential Python Libraries for Web Scraping and Data Processing

A comprehensive catalog of Python libraries covering network communication, web crawling frameworks, HTML/XML parsing, text manipulation, file format handling, natural language processing, browser automation, concurrency, cloud services, email processing, URL manipulation, multimedia extraction, WebSocket support, DNS resolution, computer vision, proxy servers, and other useful tools for developers.

Python, Web Scraping, automation
Tencent Cloud Developer
Apr 12, 2018 · Big Data

Spark Usage in DataMagic Platform: A Practical Guide

This guide explains how DataMagic leverages Spark on YARN for fast, scalable offline analytics. It covers Spark’s core role and four steps to mastering it (terminology, configuration, parallelism, and code modification), plus practical deployment scripts, dynamic resource tuning, MongoDB export, job troubleshooting, and cluster upkeep for trillion‑record workloads.

DataMagic, Spark, Spark optimization
Meituan Technology Team
Mar 15, 2018 · Backend Development

Hermes Performance System: Architecture and Implementation for O2O Business

The article presents Hermes, Meituan’s O2O performance management platform for travel, detailing its six‑module, four‑engine architecture—including data, incentive, rule, calculation, and scheduling engines—while highlighting technical innovations such as two‑level caching, work‑stealing producer‑consumer processing, Map‑Reduce‑style calculations, and future AI‑driven enhancements.

BI Tools, Calculation Engine, O2O business
dbaplus Community
Jan 1, 2018 · Big Data

How Vipshop Leverages Data Processing, Analytics, and Mining for Smarter Ops

This article summarizes Wu Xiaoguang's talk at Gdevops 2017, detailing how Vipshop integrates data processing, analysis, and mining technologies—such as Flume, Kafka, Spark, and custom scheduling—to improve operational decision‑making, performance monitoring, root‑cause analysis, and predictive modeling across its e‑commerce platform.

Big Data, Data Analytics, Operations
MaGe Linux Operations
Dec 24, 2017 · Artificial Intelligence

Avoid These Common NumPy Pitfalls When Handling Matrices and Vectors

This article examines four typical traps when using NumPy for matrix and vector operations—confusing array and matrix shapes, inefficient data filtering, ambiguous multiplication syntax, and cumbersome syntax—offering examples, explanations, and comparisons with MATLAB/Octave to help Python users write clearer, more reliable code.

NumPy, Pitfalls, data-processing
Tencent Cloud Developer
Nov 15, 2017 · Cloud Computing

How Tencent Cloud Storage Evolved Through Three Eras: From Data Access to Activation

The article traces Tencent Cloud Storage's evolution from basic data access in the early 2010s, through a data‑processing phase driven by video and image workloads, to the current data‑activation era focused on big‑data analytics and cost‑effective cloud migration, highlighting technical features and real‑world use cases.

cloud computing, cloud storage, data-processing
21CTO
Sep 5, 2017 · Big Data

Build a PHP Word Count with Hadoop MapReduce: Step-by-Step Guide

This article explains what MapReduce is, when to use it, and how to implement a PHP word‑count and a gold‑price average calculation on an Apache Hadoop cluster, covering installation hints, mapper and reducer scripts, testing commands, and visualizing results with gnuplot.

Big Data, Gnuplot, Hadoop
MaGe Linux Operations
Aug 10, 2017 · Backend Development

Explore the Ultimate Python Library Collection for Web Crawling and Data Processing

This comprehensive guide lists essential Python libraries for network operations, asynchronous programming, web crawling frameworks, HTML/XML parsing, text handling, data conversion, slug creation, office document manipulation, PDF processing, markdown rendering, YAML handling, CSS utilities, feed parsing, SQL tools, HTTP clients, microformats, executable analysis, PSD handling, natural language processing, browser automation, headless tools, multiprocessing, queues, cloud execution, email handling, URL manipulation, web content extraction, video downloading, wiki archiving, WebSocket communication, DNS queries, computer vision, proxy services, and miscellaneous utilities.

Python, Web Crawling, data-processing
Tencent Advertising Technology
Jun 23, 2017 · Artificial Intelligence

Weekly Champion nju_newbiew Shares Competition Experience and Technical Insights

The nju_newbiew team, weekly champions of the Tencent Social Ads University Algorithm Competition, recount their data processing, offline validation, feature engineering, and model strategies, highlighting practical machine‑learning lessons; the post also includes competition announcements and contact information.

AI, Model Fusion, competition
MaGe Linux Operations
May 15, 2017 · Databases

Top 10 Must‑Know Data Storage Tools for Java Developers

Facing ever‑growing complexity, Java developers can streamline their projects by mastering a curated list of essential data storage and processing tools—including MongoDB, Elasticsearch, Cassandra, Redis, Hazelcast, EHCache, Hadoop, Solr, Spark, and Memcached—each offering distinct strengths for modern big‑data applications.

Big Data, NoSQL, data-processing
Java High-Performance Architecture
Apr 4, 2017 · Big Data

Master MapReduce: Principles, Process, and 7 Hands‑On Examples

This tutorial quickly introduces the MapReduce model, explains its core principles and execution flow, and guides you through seven practical examples—from basic WordCount to custom serialization, partitioning, joins, and friend‑recommendation—while providing test data and an optional ready‑made Hadoop environment for hands‑on practice.

Hadoop, MapReduce, Tutorial
Huawei Cloud Developer Alliance
Jan 24, 2017 · Big Data

Why Hadoop Remains the Backbone of Big Data: Core Modules, Tools, and Trends

This article provides a comprehensive overview of Hadoop as the leading open‑source platform for big‑data processing, detailing its core components HDFS and MapReduce, the evolution to Hadoop 2.0/YARN, and the extensive ecosystem of tools and commercial solutions that enable scalable storage, analysis, and machine‑learning on massive data sets.

Big Data, HDFS, Hadoop
dbaplus Community
Aug 9, 2016 · Cloud Native

Scaling Qiniu Cloud's Custom Data Processing with Docker Containerization

Qiniu Cloud transformed its high‑traffic data processing platform by containerizing services with Docker, addressing challenges such as massive request volume, CPU‑intensive workloads, IO bottlenecks, and burst traffic through architecture evolution, queueing, rate limiting, auto‑scaling, and secure, isolated custom processing pipelines.

Auto Scaling, Microservices, data-processing
Architecture Digest
Mar 28, 2016 · Big Data

Overview of the Hadoop Ecosystem and Modern Big Data Technologies

This article provides a comprehensive overview of Hadoop and its surrounding ecosystem, detailing core components, storage principles, key algorithms, and a wide range of modern big‑data technologies such as Spark, Flink, Kafka, NoSQL databases, and cloud‑based processing platforms.

Big Data, Hadoop, Kafka
21CTO
Jan 16, 2016 · Fundamentals

Why Mastering Fundamentals Beats Chasing the Latest Tech

The author reflects on a programmer's focus on web, distributed systems, and data processing, arguing that deep, continuous investment in fundamentals—such as algorithms, networking, and OS concepts—drives lasting skill growth, better project outcomes, and a healthier professional mindset.

data-processing, knowledge acquisition, programming fundamentals
Architect
Dec 4, 2015 · Operations

Evolution of Qiniu Cloud Data Processing Architecture

The article explains how Qiniu's data processing platform has evolved from a simple real‑time URL‑based model to a more complex architecture featuring separate caching, agent services, discover monitoring, and container‑based elastic scaling to handle massive unstructured data workloads.

Real-time Processing, cloud architecture, containerization
21CTO
Nov 13, 2015 · Backend Development

Essential Python Libraries for Web Scraping and Data Processing

Discover a comprehensive collection of Python libraries covering network requests, web crawling frameworks, HTML/XML parsing, text manipulation, file format handling, natural language processing, browser automation, asynchronous programming, and more, providing developers with essential tools for efficient web scraping and data processing tasks.

Python, Web Scraping, data-processing
Qunar Tech Salon
Aug 17, 2015 · Big Data

Comprehensive Overview of Open‑Source Big Data Tools and Platforms

This article presents a detailed, categorized catalogue of more than fifty open‑source big‑data projects—including Hadoop‑related utilities, analytics platforms, databases, BI solutions, data‑mining packages, query engines, programming languages, search tools, and in‑memory technologies—highlighting their primary functions, supported operating systems, and official links.

Analytics, Hadoop, In-Memory
21CTO
Aug 11, 2015 · Big Data

Understanding MapReduce Through a Pizza Sauce Analogy

The author recounts delivering a MapReduce talk, then uses a vivid pizza sauce preparation story to illustrate how mapping chops ingredients and reducing blends them, effectively explaining distributed data processing concepts to a non‑technical audience.

Analogy, MapReduce, data-processing
Baidu Tech Salon
Oct 29, 2014 · Big Data

Inside Baidu’s Real-Time Big Data Platforms: Dstream and TM Explained

This article examines Baidu’s home‑grown real‑time big‑data platforms Dstream and TM, detailing their architectures, performance metrics, key features, and practical use cases such as log ETL and real‑time bidding, while highlighting how they meet millisecond‑level processing demands.

Baidu, Big Data, Dstream