Tagged articles

3672 articles

Aug 4, 2016 · Big Data

Heron vs. Storm: Architecture, Performance Evaluation, and Design Lessons

The article provides a comprehensive overview of Twitter's Heron stream processing system, comparing its architecture, design principles, back‑pressure mechanisms, resource utilization, and performance test results with Storm/JStorm, and concludes with practical insights for large‑scale deployments.

Big Data · Heron · Storm

0 likes · 23 min read

Qunar Tech Salon

Jul 27, 2016 · Big Data

Building a Unified Real-Time Data Platform at Ctrip: Architecture, Practices, and Lessons Learned

This article describes Ctrip's development of a unified real-time data platform, detailing its motivations, architectural choices such as Kafka and Storm, implementation of shared schemas, resource control, monitoring, and operational lessons, as well as experiences with Storm, JStorm, and Streaming CQL.

Big Data · Ctrip · Data Platform

0 likes · 15 min read

ITPUB

Jul 19, 2016 · Big Data

From Traditional Data Warehouses to Big Data: Practical Techniques and Migration Insights

The talk shares hands‑on experiences and best‑practice methods for traditional data‑warehouse processing, public and behavioral data handling in big‑data environments, and practical guidance for migrating legacy warehouses to modern Hadoop‑based platforms, emphasizing data governance, security, and performance optimization.

Big Data · Data Governance · Data Warehouse

0 likes · 13 min read

Architect

Jul 14, 2016 · Big Data

Understanding Custom Stream IDs and Topology Building in Apache Storm

This article explains how to construct Apache Storm topologies with custom stream IDs, demonstrates the classic WordCountTopology example, and provides detailed Java code snippets illustrating spout and bolt configurations, stream declarations, and grouping strategies for real‑time stream processing.

Apache Storm · Big Data · Custom Stream ID

0 likes · 8 min read

Baidu Intelligent Testing

Jul 13, 2016 · Artificial Intelligence

Detecting Offline Merchant Service Issues Using Machine Learning and Big Data at Nuomi

The article describes how Nuomi analyzes refund and complaint data with machine‑learning and big‑data techniques, extracts features for single‑ and multi‑store scenarios, builds decision‑tree models with regional adjustments, and creates an online workflow to promptly intervene on merchants that fail to serve customers.

Big Data · customer experience · decision tree

0 likes · 5 min read

Efficient Ops

Jul 11, 2016 · Operations

How Tencent's Intelligent Monitoring Transforms Ops Automation

Leveraging Tencent's extensive experience in social platform operations, this talk explores intelligent monitoring practices—covering active, passive, and side‑channel techniques, full‑link observability, data processing pipelines, and alert convergence—to enhance reliability, availability, and user experience while reducing noise for ops teams.

Alert Management · Automation · Big Data

0 likes · 22 min read

ITPUB

Jul 10, 2016 · Big Data

Can Spark Really Process Hundreds of Terabytes Interactively?

This article examines Apache Spark's interactive mode performance, revealing that while small datasets respond within seconds, processing beyond about 1 TB dramatically increases latency, and it discusses practical limits, hardware considerations, and the need to preload large datasets from disk.

Apache Spark · Big Data · Response Time

0 likes · 5 min read

Architecture Digest

Jul 5, 2016 · Big Data

Why Map‑Reduce Is Not the Solution to Your Big Data Problem – A Critical Look at Hadoop

The article reviews Hadoop’s origins from Google’s pioneering papers, explains its architecture and ecosystem, evaluates its strengths such as scalability and benchmarks, discusses current limitations like single‑point failures and complex programming, and outlines upcoming improvements including HDFS Federation and next‑generation MapReduce.

Big Data · Future · HDFS

0 likes · 14 min read

ITPUB

Jun 29, 2016 · Big Data

Why OLTP Falls Short for Big Data: OLAP, Hadoop & MPP Explained

The article explains how traditional OLTP systems cannot satisfy modern big‑data analytics needs and compares OLAP, Hadoop, and MPP architectures, highlighting their data processing models, scalability, cloud‑based managed services, and practical recommendations for building effective data warehouses.

Big Data · Cloud Services · Data Warehouse

0 likes · 21 min read

Qunar Tech Salon

Jun 24, 2016 · Backend Development

Overview of Alibaba's Open Source Projects

This article provides a comprehensive overview of Alibaba's numerous open‑source projects, ranging from high‑performance service frameworks and databases to messaging middleware, frontend tools, testing platforms, and infrastructure utilities, highlighting their key features and typical use cases.

Alibaba · Backend · Big Data

0 likes · 22 min read

Efficient Ops

Jun 19, 2016 · Operations

How Real‑Time Log Analysis Is Revolutionizing IT Operations

This article summarizes a 2016 Global Operations conference talk that explains the concept of IT Operations Analytics (ITOA), its four data sources, the evolution of log management from databases to real‑time search engines, and real‑world case studies demonstrating how fast, large‑scale log analysis improves monitoring, security, and business insight.

Big Data · IT Operations · log analysis

0 likes · 25 min read

21CTO

Jun 18, 2016 · Databases

Unlock Ultra‑High Compression with HiStore’s Knowledge‑Grid Columnar Database

HiStore, Alibaba’s columnar database built on a patented Knowledge‑Grid, delivers ultra‑high compression (over 10:1, up to 40:1), low‑cost storage, rapid query performance, linear scalability, and seamless MySQL compatibility, making it ideal for massive OLAP workloads and real‑time analytics across diverse industries.

Big Data · Columnar Database · OLAP

0 likes · 8 min read

21CTO

Jun 17, 2016 · Fundamentals

2016 Programmer Salary Survey: Who Earns the Most and Emerging Tech Trends

The 2016 programmer salary report reveals that front‑end, back‑end and mobile developers dominate the workforce, big‑data engineers command the highest pay, senior engineers see sharp salary jumps, and emerging technologies like Swift, WeChat, and Python shape future career choices.

Backend · Big Data · Mobile Development

0 likes · 8 min read

21CTO

Jun 15, 2016 · Big Data

Choosing the Right Data Ingestion Tool: Flume, Fluentd, Logstash, and More

This article reviews major data collection platforms—including Apache Flume, Fluentd, Logstash, Chukwa, Scribe, and Splunk Forwarder—explaining their architectures, strengths, and limitations to help engineers select the most reliable and scalable solution for big‑data pipelines.

Apache Flume · Big Data · Fluentd

0 likes · 10 min read

iFlytek Mobile Internet Technology Team

Jun 14, 2016 · Big Data

How BitMap Accelerates Active-Day Distribution Calculations in Big Data

BitMap, a space‑saving bit‑array structure, can replace costly I/O‑heavy Spark jobs for computing user active‑day distributions by converting joins and distinct operations into fast bitwise logic, enabling efficient 30‑day rolling metrics with minimal memory and superior performance, as demonstrated by real‑world benchmarks.

Active Days · Big Data · Spark

0 likes · 8 min read

ITPUB

Jun 11, 2016 · Big Data

How 58 Daojia Leverages User Portraits to Boost Operations and Fight Fraud

This article details 58 Daojia's data‑driven approach to building user‑portrait tags, covering tag construction, evaluation, and practical applications such as personalized recommendations, anti‑fraud measures, coupon distribution, and dynamic pricing, while outlining the underlying big‑data architecture and technical challenges.

Big Data · anti-fraud · data mining

0 likes · 18 min read

Art of Distributed System Architecture Design

Jun 11, 2016 · Big Data

Overview of Open-Source Real-Time Stream Processing Systems

This article provides a concise overview of several open‑source real‑time stream processing platforms—including S4, Storm, StreamBase, HStreaming, Esper/NEsper, Kafka, Scribe, and Flume—highlighting their primary features, programming languages, and project links for future technical research.

Apache · Big Data · Real-Time

0 likes · 5 min read

Architecture Digest

Jun 9, 2016 · Databases

Understanding HBase Architecture and Core Principles

This article provides a comprehensive overview of HBase, covering its distributed architecture, component roles, data organization, read/write mechanisms, and best practices for schema and region design to ensure efficient big‑data storage and retrieval.

Big Data · HBase · RegionServer

0 likes · 17 min read

dbaplus Community

Jun 7, 2016 · Big Data

What Is Big Data? Value, Platforms, and How to Harness Its Power

This article explains what big data is, where its value lies, how to design and build a big data platform, and the essential steps to turn massive data into actionable business insights while addressing technical and operational challenges.

BI · Big Data · Data Value

0 likes · 16 min read

360 Quality & Efficiency

Jun 6, 2016 · Big Data

Spark and MongoDB Tutorial: Daily Active User Statistics with Scala

This tutorial guides readers through using Apache Spark and MongoDB to compute daily active user statistics, covering Spark fundamentals, a Spark‑vs‑Hadoop comparison, MongoDB use cases, environment setup, Scala code workflow, Maven compilation, and job submission on a YARN cluster.

Big Data · MongoDB · Scala

0 likes · 11 min read

Hulu Beijing

May 31, 2016 · Big Data

What’s New in Hadoop 3.0? Key Features and Improvements Explained

Hadoop 3.0, built on JDK 1.8, adds erasure‑coded HDFS, multi‑NameNode support, native MapReduce task optimizations, cgroup‑based YARN memory and disk isolation, and container resizing, with an alpha slated for summer and a GA release expected in November or December.

Big Data · HDFS · Hadoop

0 likes · 5 min read

Ctrip Technology

May 30, 2016 · Big Data

How Big Data Drives Tourism Decision‑Making and Marketing – Insights from Ctrip CTO at the 2016 China Big Data Industry Summit

At the 2016 China Big Data Industry Summit, Ctrip's CTO explained how the company's massive travel data and predictive analytics improve tourism planning, marketing, and privacy protection, illustrating the practical value of big data for users, platforms, and suppliers.

Big Data · Ctrip · Data Analytics

0 likes · 5 min read

dbaplus Community

May 26, 2016 · Big Data

Mastering Apache Parquet: Columnar Storage, Nested Data, and Performance Gains

This article explains Apache Parquet’s columnar storage format, its support for nested data models, the underlying striping/assembly algorithm, file structure, push‑down optimizations, and performance advantages within the Hadoop ecosystem, providing a comprehensive guide for big‑data practitioners.

Apache Parquet · Big Data · Hadoop

0 likes · 22 min read

Architect

May 25, 2016 · Big Data

How Flink Manages Memory to Overcome JVM Limitations

The article explains how Flink tackles JVM memory challenges by using proactive memory management, a custom serialization framework, cache‑friendly binary operations, and off‑heap memory techniques to reduce GC pressure, avoid OOM, and improve performance in big‑data workloads.

Big Data · Flink · JVM

0 likes · 17 min read

Architecture Digest

May 25, 2016 · Big Data

Advanced Spark Performance Optimization: Data Skew and Shuffle Tuning

This article provides a comprehensive guide on tackling Spark performance bottlenecks by diagnosing data skew, locating the offending stages and operators, and applying a range of practical solutions—including Hive pre‑processing, key filtering, shuffle parallelism, two‑stage aggregation, map‑join, and combined strategies—followed by an in‑depth discussion of shuffle manager evolution and key configuration parameters for fine‑tuning.

Big Data · Data Skew · Shuffle Optimization

0 likes · 35 min read

Architect

May 22, 2016 · Big Data

Understanding Flink Execution Resources: Operator Chains, Task Slots, Slot Sharing and CoLocation

This article explains Flink's core execution‑resource concepts—including operator chaining, task slots, slot‑sharing groups and co‑location groups—detailing their conditions, API controls, internal implementation, and how they together maximize throughput and resource utilization in stream processing.

Big Data · Flink · Resource Management

0 likes · 11 min read

MaGe Linux Operations

May 20, 2016 · Frontend Development

How Uber’s Data Visualization Team Turns Billions of GPS Points into Interactive Maps

This article explains how Uber built a full‑stack data‑visualization team that uses open‑source React and WebGL libraries to turn massive GPS streams into real‑time map analytics, public data stories, and reusable visual components for internal and external users.

Big Data · Data visualization · React

0 likes · 8 min read

58UXD

May 18, 2016 · Product Management

Why Companies Doubt User Research—and How to Make It Truly Valuable

This article examines why many enterprises view user research as ineffective, outlines the four biggest challenges—defining clear goals, cultivating insight, building capable teams, and adopting the right mindset—and offers practical strategies for making research results actionable, integrating them into product development, and evolving the role of user researchers.

Big Data · UX · agile

0 likes · 14 min read

Ctrip Technology

May 16, 2016 · Artificial Intelligence

Machine Learning Applications in OTA Hotel Industry: From Data Challenges to Value Creation

This presentation details how Ctrip's hotel R&D team leverages machine learning and big data to address OTA-specific challenges, improve key service KPIs, evaluate project benefits, and deploy models through a robust pipeline and architecture, offering practical case studies and operational insights.

AI · Big Data · Model Evaluation

0 likes · 15 min read

Meituan Technology Team

May 13, 2016 · Big Data

Spark Performance Optimization Guide: Data Skew and Shuffle Tuning

This advanced Spark performance guide explains how data skew arises during shuffles and presents eight practical solutions—including Hive preprocessing, key filtering, increased shuffle parallelism, two‑stage aggregation, map joins, sampling, random prefixes, and combined strategies—while also detailing key shuffle‑tuning parameters such as spark.shuffle.file.buffer, spark.reducer.maxSizeInFlight, and spark.shuffle.manager to improve memory usage and execution speed.

Big Data · Data Skew · Performance Optimization

0 likes · 33 min read

Qunar Tech Salon

May 13, 2016 · Big Data

Overview and Architecture of Hadoop Distributed File System (HDFS)

This article provides a comprehensive overview of Hadoop Distributed File System (HDFS), detailing its design goals, architecture components such as NameNode, DataNode and SecondaryNameNode, data block handling, replication strategies, communication protocols, and the read, write, and delete processes.

Big Data · Distributed File System · HDFS

0 likes · 18 min read

Efficient Ops

May 12, 2016 · Operations

How Big Data Powers Precise IT Operations for Modern Enterprises

This article explains what big data is, outlines its four V characteristics, and describes how precise IT operations—aligning services with business needs—leverage big data analytics to improve service quality, predict user behavior, and enhance competitiveness for both traditional and internet enterprises.

Big Data · Digital Transformation · IT Operations

0 likes · 15 min read

Architect

May 11, 2016 · Big Data

Comprehensive Guide to Hadoop MapReduce Job Execution, Scheduling, and Optimization

This article provides an in‑depth explanation of Hadoop MapReduce architecture, covering the roles of JobClient, JobTracker, TaskTracker and HDFS, the complete job lifecycle from submission to completion, scheduling strategies, shuffle and sort mechanisms, fault tolerance, and performance tuning techniques.

Big Data · Hadoop · JobTracker

0 likes · 20 min read

Architecture Digest

May 7, 2016 · Fundamentals

Overview of Alibaba Open‑Source Projects and Tools

This article provides a comprehensive overview of numerous Alibaba open‑source projects, ranging from service frameworks like Dubbo and database tools such as Druid and OceanBase to front‑end libraries, distributed systems, testing platforms, and cloud utilities, each briefly described with links for further reference.

Alibaba · Big Data · Java

0 likes · 27 min read

Architect

May 6, 2016 · Big Data

Integrating Kylin, Mondrian, and Saiku to Build an OLAP Analysis Tool

This article describes how the Youzan data team combined Apache Kylin, Mondrian, and Saiku into a three‑layer OLAP system, covering background, component overviews, technical architecture, schema integration challenges, count‑distinct handling, Kylin‑specific SQL quirks, and practical solutions.

Big Data · HBase · Hive

0 likes · 12 min read

Architect

May 4, 2016 · Big Data

Kafka Main Configuration Parameters – Broker, Producer, Consumer, and Topic Settings

This article provides a comprehensive overview of Kafka's core configuration options, detailing default values and descriptions for broker, producer, consumer, and topic‑level settings to help administrators fine‑tune performance, reliability, and resource usage.

Big Data · Broker · Configuration

0 likes · 23 min read

Baidu Intelligent Testing

May 4, 2016 · Big Data

Understanding Big Data: The Importance of Data Breadth and User Profiling for Precise Marketing and Product Optimization

The article explains the core concepts of big data, emphasizing data breadth across product lines, illustrates how comprehensive user profiling can drive personalized marketing and product improvements, and provides practical examples of cross‑product data analysis in e‑commerce, finance, travel, and gaming contexts.

Big Data · cross‑product analysis · data breadth

0 likes · 5 min read

Architecture Digest

May 4, 2016 · Big Data

Upgrading Spark from 1.4.1 to 1.6.1: Memory, Storage, and Operational Challenges

The article details the author’s experience upgrading a production Spark cluster from version 1.4.1 to 1.6.1, exposing memory‑spill, unified memory, BlockManager deadlock, Yarn‑kill, UI quirks, and Spark‑SQL compatibility issues, and proposes concrete code‑level fixes for each problem.

Big Data · Memory Management · Shuffle

0 likes · 14 min read

Meituan Technology Team

Apr 29, 2016 · Big Data

Introduction to Spark in Big Data

Apache Spark is a general‑purpose big‑data platform that supports batch processing, SQL queries, real‑time streaming, and machine‑learning workloads. This article introduces Spark and describes how Meituan‑Dianping uses it to reduce job execution times and improve scalability across its analytical and operational pipelines.

Batch Processing · Big Data · Spark

0 likes · 1 min read

Architecture Digest

Apr 25, 2016 · Big Data

Curated Learning Resources for Spark and Scala Beginners

This article compiles a comprehensive list of tutorials, books, online courses, and tools to help beginners get started with Apache Spark and the Scala programming language, including setup instructions, code snippets, and links to free and paid learning materials.

Big Data · Learning Resources · Scala

0 likes · 7 min read

ITPUB

Apr 24, 2016 · Big Data

12 Essential Hive Performance Tips for Faster Hadoop Queries

This guide presents twelve practical Hive tuning techniques—including avoiding MapReduce, limiting string concatenation, steering clear of subqueries, choosing the right file formats, managing vectorization, sizing containers, enabling statistics, and optimizing joins—to dramatically improve query speed on Hadoop.

Big Data · Hadoop · Hive

0 likes · 7 min read

Big Data and Microservices

Apr 21, 2016 · Information Security

How Can Banks Secure Big Data? Key Strategies for Protecting Customer Information

In the era of big data, banks face unprecedented information security challenges due to massive, valuable, and highly damaging data breaches, and must adopt encryption, flexible access control, rigorous auditing, DLP solutions, strict data management, and robust outsourcing controls to safeguard customer information.

Banking · Big Data · DLP

0 likes · 10 min read

Big Data and Microservices

Apr 19, 2016 · Industry Insights

Designing a Scalable Real‑Time Stock Prediction Architecture with Open‑Source Tools

This article outlines a reference architecture for a low‑latency, horizontally scalable real‑time stock prediction system built with open‑source components such as Spring Cloud Data Flow, Apache Geode, Spark MLlib, and Hadoop, and discusses data flow steps, simplified deployment, and algorithm choices for market forecasting.

Big Data · Real-Time · Stock Prediction

0 likes · 7 min read

Java High-Performance Architecture

Apr 18, 2016 · Big Data

Why Spark Is Outpacing Hadoop: Speed, Real‑Time Processing, and ML Advantages

The article explains how Spark has become the leading open‑source big‑data platform, highlighting its superior speed, in‑memory processing, real‑time streaming, and built‑in machine‑learning library compared with Hadoop’s slower, disk‑based MapReduce approach and reliance on external storage and ML tools.

Big Data · Hadoop · Real-time Processing

0 likes · 5 min read

Architecture Digest

Apr 18, 2016 · Big Data

Introduction to Apache Spark: Architecture, RDD, Spark on YARN, and SparkSQL

This article introduces Apache Spark’s core architecture, explains how Spark runs on YARN, details driver and executor roles, describes RDD concepts and dependencies, and outlines SparkSQL’s schema‑based query processing, providing code examples for HiveContext and JDBC integration.

Big Data · RDD · Spark

0 likes · 14 min read

Efficient Ops

Apr 17, 2016 · Operations

How CIOs Can Navigate Massive Technological and Industry Shifts

In this speech, former Chinese Ministry of Industry and Information Technology deputy minister Yang Xueshan outlines six strategic principles for CIOs—understanding major technological and industry trends, focusing on internal data, embracing fusion, connectivity, platforms, CPS, and intelligence, and taking practical, grounded actions to stay relevant.

Big Data · CIO · Digital Transformation

0 likes · 18 min read

21CTO

Apr 16, 2016 · Databases

Optimizing HBase Log Queries: Index Design and RowKey Strategies

This article examines the challenges of storing and querying log data in HBase, outlines the drawbacks of custom indexing, and presents practical rowKey design, filter usage, and integration with external search engines to improve query performance.

Big Data · HBase · NoSQL

0 likes · 15 min read

21CTO

Apr 14, 2016 · Big Data

How Meituan’s Data Architecture Powers Precise Mobile Marketing

This article details Meituan Dianping's data‑driven approach to precise marketing, describing the O2O marketing framework, a layered pyramid data system, profiling techniques, budget monitoring, and two real‑world case studies that together illustrate how big‑data technologies boost marketing efficiency on mobile platforms.

Big Data · Data Architecture · machine learning

0 likes · 12 min read

Efficient Ops

Apr 14, 2016 · Big Data

Why Big Data May Not Be the Gold Mine You Expect: Insights and Pitfalls

The article examines what big data really means, its core 4 V characteristics, current limitations in China, the overhyped value of data, the importance of business‑driven applications, and why starting from small, relevant data is essential for true predictive power.

Big Data · Business Intelligence · Data Value

0 likes · 13 min read

Architect

Apr 10, 2016 · Big Data

Introduction to Flume NG: Architecture, Components, Configuration, and Best Practices

This article provides a comprehensive overview of Flume NG, covering its architecture, core components (source, channel, sink), reliability mechanisms, common deployment scenarios, installation steps, configuration examples, compilation instructions, and practical best‑practice recommendations for building robust log‑collection pipelines.

Apache · Big Data · Configuration

0 likes · 16 min read

Architecture Digest

Apr 9, 2016 · Big Data

Practical Experience of Using Spark at Meituan: Platformization, ETL Templates, Feature Platform, Data Mining, and Real‑World Applications

This article describes how Meituan migrated from Hive‑SQL and MapReduce to Spark on YARN, built an interactive Zeppelin‑based development platform, created reusable ETL templates, constructed a Spark‑driven feature and data‑mining platform, and applied Spark to interactive user‑behavior analysis and large‑scale SEM services, highlighting performance gains and operational benefits.

Big Data · Data Platform · ETL

0 likes · 19 min read

Big Data and Microservices

Apr 7, 2016 · Big Data

Turning Big Data into Actionable Security Visualizations: Process & Real‑World Cases

This article explains how to transform massive security‑related big data into clear visual insights, covering storytelling, data processing, visual encoding, design workflow, and two real‑world case studies that illustrate vulnerability mapping and internal traffic analysis for improved threat awareness.

Big Data · Data visualization · design process

0 likes · 10 min read

dbaplus Community

Apr 6, 2016 · Fundamentals

Essential Open‑Source Technologies Every Engineer Should Know

This article provides a comprehensive, curated overview of the most influential open‑source software across the full technology stack—including operating systems, web servers, programming languages, frameworks, databases, big‑data tools, and development utilities—offering practical insights for engineers seeking to understand and adopt proven solutions.

Big Data · databases · open source

0 likes · 24 min read

21CTO

Apr 4, 2016 · Big Data

How Asana Scaled Its Data Infrastructure: From MySQL to Redshift & Hadoop

This article details Asana's evolution from a simple Python‑MySQL setup to a robust, scalable data platform using Redshift, Hadoop, Luigi, and modern BI tools, highlighting challenges, solutions, and lessons learned for building reliable data pipelines in fast‑growing startups.

Big Data · ETL · Hadoop

0 likes · 15 min read

dbaplus Community

Apr 3, 2016 · Big Data

How Asana Scaled Its Data Infrastructure: From MySQL to Redshift & Beyond

Facing rapid growth, Asana overhauled its data infrastructure—from a single‑machine MySQL setup to a Redshift‑backed warehouse, Hadoop‑based log processing, Luigi orchestration, and self‑service BI tools—highlighting the challenges, solutions, and future plans for scalable, reliable analytics.

Big Data · Business Intelligence · ETL

0 likes · 16 min read

Architect

Apr 3, 2016 · Big Data

Apache Flume NG Architecture, Core Concepts, and Practical Configuration Guide

This article introduces Apache Flume NG, a distributed and reliable log collection system, explains its core architecture components such as Event, Flow, Agent, Source, Channel, and Sink, and provides detailed configuration examples for various pipelines, including load‑balancing, failover, and integration with HDFS.

Apache Flume · Big Data · Configuration

0 likes · 12 min read

21CTO

Mar 31, 2016 · Big Data

Inside Airbnb’s Massive Big Data Platform: Architecture, Lessons & Scaling Secrets

Airbnb’s engineering team outlines the evolution of its big‑data platform, detailing the philosophy behind its architecture, the dual “gold” and “silver” Hive clusters, migration to Mesos, use of Presto, Airpal, Airflow, and the performance and cost gains achieved through these design choices.

Airbnb · Airflow · Big Data

0 likes · 11 min read

Art of Distributed System Architecture Design

Mar 31, 2016 · Big Data

Airbnb’s Big Data Platform Architecture: Design, Evolution, and Lessons Learned

Airbnb’s engineering team outlines the evolution and design of its massive big‑data platform—detailing the dual “gold” and “silver” Hive clusters, use of Kafka, Presto, Airflow, Mesos, and Spark, along with performance gains, cost reductions, and open‑source contributions.

Airbnb · Airflow · Big Data

0 likes · 13 min read

Big Data and Microservices

Mar 30, 2016 · Industry Insights

How Text Mining is Transforming the Securities Industry: Trends and Challenges

This article examines the rapid growth of structured and unstructured data in the securities sector, outlines text mining fundamentals, explores key algorithms and tools, and analyzes current industry services, investment communities, and professional solutions while highlighting existing challenges and future opportunities.

Big Data · Sentiment Analysis · industry insight

0 likes · 32 min read

Architect

Mar 29, 2016 · Big Data

Understanding Apache Storm Architecture, Stream Groupings, and the Acker Mechanism

This article provides a comprehensive overview of Apache Storm’s architecture, including the roles of Nimbus, Supervisor, and ZooKeeper, explains various stream groupings, details the Acker mechanism, and describes task execution, parallelism calculation, and internal data flow within the Storm cluster.

Apache Storm · Big Data · Real-time analytics

0 likes · 19 min read

Architecture Digest

Mar 28, 2016 · Big Data

Overview of the Hadoop Ecosystem and Modern Big Data Technologies

This article provides a comprehensive overview of Hadoop and its surrounding ecosystem, detailing core components, storage principles, key algorithms, and a wide range of modern big‑data technologies such as Spark, Flink, Kafka, NoSQL databases, and cloud‑based processing platforms.

Big Data · Hadoop · Kafka

0 likes · 11 min read

Big Data and Microservices

Mar 23, 2016 · Industry Insights

Inside the Securities Tech Revolution: Cloud, Microservices, and Big Data

The article examines the paradox of the Chinese securities industry—high demand for cutting‑edge trading, quantitative and high‑frequency systems versus outdated IT—while detailing the team’s FinTech startup approach, their Node.js/Docker/MongoDB stack, a cloud‑native trading platform, microservice architecture, big‑data pipelines, performance tuning, and DevOps practices.

Big Data · DevOps · FinTech

0 likes · 21 min read

ITPUB

Mar 19, 2016 · Big Data

Inside HDFS: How NameNode and DataNode Manage Big Data Writes and Reads

This article explains the fundamentals of distributed file systems, focusing on Hadoop’s HDFS architecture, the separation of metadata and data via NameNode and DataNode, and detailed step‑by‑step write and read processes, including replication, fault recovery, and block splitting across nodes.

Big Data · DataNode · Distributed File System

0 likes · 8 min read

21CTO

Mar 16, 2016 · Big Data

Inside Uber’s Tech: How Data, AI, and Cloud Power Ride‑Sharing in China

Uber’s CTO Thuan Pham revealed at a Chinese tech salon how the company’s global architecture, data‑center strategy, cloud partnership with Baidu, anti‑fraud machine‑learning models, map localization and big‑data analytics together enable a unified yet locally adapted ride‑sharing platform across China and the world.

Big Data · Technology Architecture · Uber

0 likes · 17 min read

Architect

Mar 10, 2016 · Big Data

Analysis and Practice of a Real-Time Hadoop Data Security Solution

The article presents a detailed technical overview of Apache Eagle's real-time Hadoop data security architecture, covering distributed data collection, stream processing, metadata‑driven policy enforcement, machine‑learning‑based anomaly detection, and integration with Hadoop ecosystem components such as HBase, Kafka, and Storm.

Apache Eagle · Big Data · Hadoop

0 likes · 25 min read

Architect

Mar 8, 2016 · Big Data

Kafka Benchmark: Producer and Consumer Throughput, Replication, Message Size, and Latency Analysis

This article presents a comprehensive Kafka benchmark using six machines to evaluate producer and consumer throughput, replication effects, message size impact, and end‑to‑end latency, providing detailed results, analysis, and reproducible test commands.

Benchmark · Big Data · Kafka

0 likes · 12 min read

Architect

Mar 8, 2016 · Big Data

In‑Depth Analysis of Apache Kafka: Architecture, Core Concepts, and Benchmark

This article provides a comprehensive technical overview of Apache Kafka, covering its architecture, core concepts, design goals, comparison with other message queues, replication, consumer groups, delivery guarantees, and performance benchmarking, making it a valuable resource for big‑data engineers.

Big Data · Kafka · Replication

0 likes · 30 min read

Architect

Mar 6, 2016 · Big Data

Clustering Geolocated User Events with DBSCAN and Spark

This article explains how to apply the DBSCAN clustering algorithm to geolocated user event data and leverage Apache Spark’s distributed processing with PairRDDs to efficiently identify frequent user regions, detect outliers, and build location‑based services such as personalized recommendations and security alerts.

Big Data · DBSCAN · Spark

0 likes · 8 min read

ITPUB

Feb 24, 2016 · Big Data

How Pepperdata Optimizes Hadoop Cluster Resources and Improves Performance

The article explains how Hadoop clusters suffer from resource contention among multiple users, why YARN alone often fails to prioritize workloads, and how Pepperdata provides deeper visibility and automatic adjustments that reduce low‑priority usage, cut node count, and lower cloud costs.

Big Data · Cluster Management · Hadoop

0 likes · 7 min read

Architecture Digest

Feb 23, 2016 · Databases

Highlights from SDCC 2015 Database Practice Forum: Distributed Database Technologies and Real-World Implementations

The article reviews eight expert presentations from the 2015 SDCC Database Practice Forum, covering distributed database architectures, performance tuning, high‑availability solutions, and practical case studies from leading Chinese internet companies.

Big Data · NoSQL · Performance Optimization

0 likes · 9 min read

21CTO

Feb 23, 2016 · Big Data

Why Kafka Dominates Modern Data Pipelines: Architecture, Benefits, and Guarantees

Kafka, the open‑source distributed messaging system from LinkedIn, offers O(1) persistence, high throughput, partitioned topics, and flexible delivery guarantees, making it a cornerstone for modern big‑data pipelines and real‑time processing alongside Hadoop, Spark, and Storm.

Big Data · Consumer · Delivery Guarantees

0 likes · 21 min read

Architecture Digest

Feb 22, 2016 · Big Data

Building High‑Performance Big Data Analytics Systems: Techniques and Best Practices

An in‑depth guide outlines technology‑agnostic best‑practice techniques for building high‑performance big data analytics systems, covering data acquisition, storage, processing, visualization, and security, and explains how to address the five V’s of big data to meet demanding operational and performance requirements.

Analytics · Big Data · data engineering

0 likes · 20 min read

ITPUB

Feb 20, 2016 · Big Data

Doug Cutting’s Journey: How Hadoop Shaped the Big Data Era

The article chronicles Doug Cutting’s path from his Stanford studies and early Xerox work through the creation of Lucene, Nutch, and Hadoop, highlighting how open‑source innovations and Google’s technologies propelled Hadoop to become a cornerstone of modern big‑data processing and its future outlook.

Big Data · Doug Cutting · Hadoop

0 likes · 15 min read

21CTO

Feb 14, 2016 · Big Data

How PageRank Works: From Random Surfer Theory to MapReduce Implementation

This article explains the fundamental principles of Google's PageRank algorithm, modeling web pages as a directed graph and a random surfer, discusses matrix formulation, convergence issues like dangling nodes and traps, and demonstrates a practical MapReduce implementation with Python code for large‑scale rank computation.

Big Data · MapReduce · PageRank

0 likes · 15 min read

Qunar Tech Salon

Feb 14, 2016 · Big Data

Accelerating Real‑Time Data Queries with Solr in Alibaba's Jushita Platform

This article explains how Alibaba's Jushita platform leverages Apache Solr with a wide‑table data model and a custom QParser plugin to achieve real‑time, multi‑dimensional buyer filtering that traditional relational databases cannot handle efficiently in big‑data scenarios.

Big Data · Real-time Query · Solr

0 likes · 10 min read

Alibaba Cloud Infrastructure

Feb 14, 2016 · Big Data

Small Data vs. Big Data: How Minor Signals Guide Robust Data Management

The article explains why small data are essential for avoiding common big‑data mining traps, illustrates pitfalls through real‑world examples, and offers practical methods—incremental improvement, analogical reasoning, and simple modeling—to harness weak signals for more reliable decision‑making.

Bayes theorem · Big Data · causality

0 likes · 11 min read

21CTO

Feb 1, 2016 · Big Data

How Solr Supercharges Real‑Time Queries in Big Data Environments

This article examines a real‑world case from Alibaba’s Taobao Jushita platform, showing how traditional SQL queries struggle with multi‑dimensional, high‑volume data and how integrating Solr’s inverted‑index search engine—combined with Hive‑generated wide tables and custom QParser plugins—delivers millisecond‑level, scalable query performance for buyer analytics.

Big Data · Hive · Real-time Query

0 likes · 11 min read

21CTO

Jan 25, 2016 · Big Data

How Alibaba’s Pora Powers Real‑Time Personalization at Massive Scale

Pora (Personal Offline Realtime Analyze) is a high‑throughput, low‑latency platform that captures user behavior in real time, enabling Alibaba’s search engine to deliver personalized results, support online learning, and run 24/7 with massive data volumes.

Alibaba · Big Data · Pora

0 likes · 6 min read

Java High-Performance Architecture

Jan 24, 2016 · Big Data

MapReduce Explained: From Library Book Counting to Word Count in Big Data

This article introduces the MapReduce parallel processing model, illustrates its core map and reduce operations with a library‑shelf analogy and a classic word‑count example, and walks through each processing stage using clear diagrams to show how massive data is aggregated efficiently.

Big Data · Hadoop · MapReduce

0 likes · 5 min read

21CTO

Jan 23, 2016 · Big Data

How Massive Is the Data Behind the World’s Biggest Porn Sites?

The article analyzes the staggering traffic, storage needs, and infrastructure of major adult video platforms, revealing that sites like Xvideos and YouPorn handle tens of petabytes of data monthly, requiring bandwidth and hardware comparable to leading streaming services.

Big Data · cloud storage · pornography

0 likes · 8 min read

Qunar Tech Salon

Jan 20, 2016 · Cloud Computing

Technical Architecture of Alipay and Ant Huabei for Large-Scale Promotional Events

The article explains how Alipay's multi-layered cloud architecture, logical data center design, distributed data strategies, and flexible transaction framework enable high availability, horizontal scalability, and rapid deployment for massive promotional traffic such as Double‑11, illustrated with the Ant Huabei case study.

Alipay · Big Data · Distributed Systems

0 likes · 21 min read

ITPUB

Jan 20, 2016 · Big Data

How Meizu Built an Agile Big Data Platform for Millions of Users

The Meizu Tech Open Day showcased the company's rapid evolution to a data‑driven mobile internet firm, detailing its DW1.0 and DW2.0 data‑warehouse architectures, recommendation pipelines, Spark adoption, and ELK‑based log analytics, while sharing practical lessons and future challenges.

Big Data · Data Architecture · Data Warehouse

0 likes · 11 min read

Java High-Performance Architecture

Jan 11, 2016 · Big Data

How HDFS Powers Scalable, Reliable Storage in Big Data Environments

This article explains how HDFS abstracts multiple servers into a single file system, splits files into replicated blocks, manages metadata via NameNode and DataNode, and provides linear capacity scaling and high reliability for big data workloads.

Big Data · Distributed File System · HDFS

0 likes · 5 min read

Qunar Tech Salon

Jan 11, 2016 · Big Data

Architecture of Taobao's Massive Data Products: From Data Sources to the Glider Middleware

The article details Taobao's massive data product architecture, describing a five‑layer system that processes billions of daily records using Hadoop, real‑time streams, distributed MySQL and HBase clusters, and a middleware layer called Glider that unifies queries, caching, and front‑end integration.

Big Data · Data Architecture · Distributed Systems

0 likes · 16 min read

Baidu Maps Tech Team

Jan 6, 2016 · Big Data

How Baidu Maps Scales Billion‑Row OLAP Queries with Apache Kylin

Baidu Maps’ Data Intelligence team built a large‑scale OLAP platform using Apache Kylin, detailing the challenges of multi‑dimensional analysis on billions of rows, the architecture, custom extensions for task, resource, and monitoring management, and performance optimizations that achieve millisecond‑level SQL responses.

Apache Kylin · Big Data · Data Warehouse

0 likes · 21 min read

21CTO

Jan 6, 2016 · Big Data

How Taobao Scales Massive Data Products: Architecture Behind 1.5PB Daily Processing

This article explains how Taobao processes over 1.5 PB of daily data through a five‑layer architecture, combining batch Hadoop jobs, a streaming platform, distributed MySQL and HBase storage, and a unified caching middle layer to deliver fast, scalable data services.

Big Data · caching

0 likes · 15 min read

Efficient Ops

Jan 5, 2016 · Information Security

How Apache Eagle Secures Hadoop: Real‑Time Big Data Threat Detection

Apache Eagle is an open‑source, distributed, real‑time security monitoring platform for Hadoop that combines stream‑processing, scalable policy enforcement, and machine‑learning user profiling to protect massive data assets across eBay’s production clusters.

Apache Eagle · Big Data · Hadoop

0 likes · 19 min read

Architect

Jan 5, 2016 · Big Data

Apache Eagle: eBay’s Open‑Source Real‑Time Hadoop Data Security Platform

The article provides a comprehensive technical overview of Apache Eagle, an open‑source, distributed, real‑time security monitoring and alerting platform for Hadoop developed by eBay, covering its motivation, architecture, core components, machine‑learning based detection, typical use cases, and future development directions.

Apache Eagle · Big Data · Hadoop

0 likes · 15 min read

21CTO

Jan 3, 2016 · Artificial Intelligence

How to Build a Real-Time Stock Prediction System with Open-Source AI and Big Data Tools

An open-source reference architecture for real-time stock prediction is presented, detailing a scalable, low-latency pipeline that captures live market data, stores it in memory, trains and applies machine learning models using Spring Cloud Data Flow, Apache Geode, Spark MLlib, and related big‑data components.

Big Data · Spark MLlib · Spring Cloud Data Flow

0 likes · 8 min read

Architect

Dec 31, 2015 · Big Data

Using Spark for Machine Learning, New Word Discovery, and Intelligent Q&A

The article explains how to leverage Apache Spark for machine‑learning tasks, large‑scale new‑word discovery, and simple intelligent question‑answering by using Spark‑Shell, Scala code, and word2vec‑based similarity, while sharing practical tips and performance considerations.

Big Data · Intelligent QA · New Word Discovery

0 likes · 15 min read

Architect

Dec 30, 2015 · Big Data

Real-Time Big Data Processing with Storm and Kafka on Alibaba Cloud

This article explains how to build a large‑scale, real‑time vehicle monitoring system using Apache Storm and Kafka on Alibaba Cloud, covering the challenges of big‑data ingestion, system architecture, deployment steps, performance testing, and practical lessons learned.

Alibaba Cloud · Big Data · Kafka

0 likes · 12 min read

Architects Research Society

Dec 30, 2015 · Artificial Intelligence

IBM Watson Personality Insights: How AI Analyzes Social Media Language to Infer Traits

The article explains how IBM's Watson uses AI and big‑data techniques to examine the words people write on platforms like Twitter and Facebook, extracting personality traits such as openness and neuroticism, and discusses the potential business uses and privacy concerns of this technology.

AI · Big Data · Personality Analysis

0 likes · 9 min read

ITPUB

Dec 29, 2015 · Big Data

How SparkSQL Executes Queries Faster Than Hive: A Deep Dive

This article explains SparkSQL's query processing pipeline—from parsing and logical planning through optimization and physical execution—highlighting why it often outperforms Hive on MapReduce by reducing I/O, minimizing shuffle stages, and reusing JVMs.

Big Data · Hive · SparkSQL

0 likes · 13 min read

21CTO

Dec 22, 2015 · Big Data

How to Build a Scalable Distributed Web Crawler for Massive Data Harvesting

This article explains how to design and implement a distributed web‑crawling framework in Java that can collect, structure, and store massive amounts of data while handling anti‑scraping measures, duplicate detection, and real‑time monitoring.

Big Data · Data Extraction · Java

0 likes · 11 min read

21CTO

Dec 21, 2015 · Information Security

Why Open Source Is Becoming the Top Choice for Enterprise Security and Innovation

Over the past decade, open‑source software has surged in the enterprise sector, driven by startups and venture capital, with surveys showing widespread adoption, increased contributions, and strong security advantages that are reshaping IT architecture, cloud, and big‑data strategies.

Big Data · Enterprise Software · cloud computing

0 likes · 4 min read

Architect

Dec 18, 2015 · Big Data

Understanding Apache Kafka’s High‑Throughput Architecture and Performance Optimizations

This article explains Apache Kafka’s core concepts, high‑throughput design choices such as sequential I/O, PageCache, Sendfile, and partitioning, and provides practical performance tips and configuration recommendations for brokers, producers, and consumers in large‑scale data pipelines.

Big Data · Consumer · Distributed Messaging

0 likes · 16 min read

21CTO

Dec 7, 2015 · Information Security

How Tencent Combats Fraudsters with Big Data and AI‑Powered Risk Engines

This article explains how Tencent uses big‑data collection, user profiling, and AI‑driven risk learning engines to detect and block malicious accounts, proxy IPs, and fraudulent activities across e‑commerce and other platforms, detailing the architecture, algorithms, and practical defenses employed.

Big Data · anti-fraud · fraud detection

0 likes · 14 min read

21CTO

Dec 7, 2015 · Operations

Inside JD.com’s ‘Qinglong’ Logistics Engine: Architecture, AI, and O2O Innovations

This article dissects JD.com’s Qinglong logistics system, detailing its O2O strategy, big‑data‑driven pre‑sorting, AI algorithms, GIS integration, and the evolution from version 1.0 to 3.0, highlighting how these technologies enable ultra‑fast, agile supply‑chain operations.

AI · Big Data · JD.com

0 likes · 12 min read

Architects Research Society

Dec 3, 2015 · Artificial Intelligence

IBM Donates SystemML to Apache Incubator, Joining the Open‑Source Machine Learning Wave

IBM announced that its SystemML machine‑learning platform will become an Apache Incubator project, highlighting a broader industry trend where tech giants like Google and Facebook open‑source their AI tools to accelerate data‑driven innovation and expand enterprise‑focused machine‑learning ecosystems.

Apache SystemML · Big Data · IBM

0 likes · 5 min read

ITPUB

Dec 3, 2015 · Databases

Choosing the Right Time‑Series Database: Types, Queries, and Performance Trade‑offs

Time‑series data, defined by a timestamp field, appears everywhere, and the article explains how to choose an appropriate time‑series database by comparing two schema models, their query patterns, performance trade‑offs, and why modern solutions like Elasticsearch, columnar stores, and Druid excel at real‑time massive aggregation.

Big Data · Elasticsearch · SQL

0 likes · 9 min read
