Tagged articles
3672 articles
Architect
Dec 2, 2015 · Big Data

Designing an Agile Data Warehouse Architecture for Internet Companies

The article outlines a practical, end‑to‑end data platform architecture for internet businesses, covering data collection, storage and analysis, sharing, real‑time processing, task scheduling, and the importance of simplicity and agility in building an agile data warehouse.

Big Data · Data Architecture · Data Warehouse
0 likes · 10 min read
21CTO
Dec 1, 2015 · Big Data

How to Build a Real‑Time Price Update System for Billion‑Item E‑Commerce

This article explains the design of a distributed, real‑time price‑update service that handles massive product data, combines query‑driven crawling, observer‑pattern notifications, and multiple data sources to keep e‑commerce price and inventory information fresh within minutes.

Big Data · Observer Pattern · distributed architecture
0 likes · 14 min read

LinkedIn’s Kafka at Scale: Architecture, Optimizations, and Operational Practices

The article details how LinkedIn has scaled Kafka from handling billions to trillions of messages daily, describing quota enforcement, a ZooKeeper‑free consumer, reliability enhancements, security plans, monitoring frameworks, fault‑injection testing, cluster balancing, and integration with other internal data systems.

Big Data · Kafka · LinkedIn
0 likes · 12 min read
Efficient Ops
Nov 29, 2015 · Big Data

Memory Computing vs Big Data: Trends, Platforms, and Architecture Choices

This article summarizes a WeChat group Q&A on the current momentum of in‑memory computing, compares TimesTen and SAP HANA, and offers practical advice on building enterprise big‑data platforms, covering cloud vs self‑build, talent, investment, and real‑world case studies.

Big Data · architecture · in-memory databases
0 likes · 11 min read
21CTO
Nov 27, 2015 · Fundamentals

What Tech Stack Powers the Most Successful Startups? Insights from AngelList Data

A recent study analyzes startup technology choices, revealing the most popular programming languages, frontend frameworks, databases, mobile platforms, infrastructure services, DevOps tools, search technologies, API integrations, and advanced big‑data solutions across different performance tiers.

Big Data · frontend · programming languages
0 likes · 5 min read
Efficient Ops
Nov 26, 2015 · Big Data

Expert Insights on User Profiling and Stream Processing in Big Data

This article presents expert Q&A on effective user behavior analysis techniques for building detailed user profiles and compares mainstream stream‑processing solutions, outlining key factors such as latency, throughput, parallelism, and fault tolerance for selecting the right real‑time data platform.

Big Data · stream processing · user profiling
0 likes · 11 min read
21CTO
Nov 26, 2015 · Big Data

How Taobao Scales Massive Data Products: Architecture Insights from Data Cube

This article explores Taobao's massive data product architecture, detailing its five-layer design, the use of Hadoop and real‑time systems, hybrid relational and NoSQL storage, a middleware layer for data integration, and systematic caching strategies that enable petabyte‑scale analytics and fast query responses.

Big Data · caching · storage
0 likes · 16 min read
21CTO
Nov 23, 2015 · Big Data

How Dianping Scales Real‑Time Analytics with Apache Storm

This article explains how Dianping built a millisecond‑level real‑time computation platform using Apache Storm, covering use cases, system architecture, core Storm concepts, performance tuning, best practices, and a detailed Q&A on their production deployment.

Apache Storm · Big Data · Real-time analytics
0 likes · 23 min read
21CTO
Nov 19, 2015 · Big Data

Beyond Hadoop: Modern Big Data Platforms and Technologies Explained

This article surveys the evolution of Hadoop and its ecosystem, explains core storage and processing concepts, and introduces contemporary big‑data technologies such as Spark, Flink, Kafka, Lambda architecture, NoSQL databases, and cloud‑native solutions, highlighting their roles and trade‑offs.

Big Data · Flink · Hadoop
0 likes · 17 min read
Architect
Nov 19, 2015 · Cloud Computing

Alibaba Cloud Enterprise Architecture Behind Double 11: A Deep Dive into Scalable Cloud Computing

The article details how Alibaba Cloud's multi‑layered enterprise architecture, built on service‑oriented frameworks, distributed databases, and message queues, enabled record‑breaking Double 11 transactions while offering linear performance scaling, high reliability, and cost‑effective operations for large‑scale internet applications.

Alibaba Cloud · Big Data · enterprise architecture
0 likes · 8 min read
21CTO
Nov 13, 2015 · Artificial Intelligence

7 Essential Python Tools Every Data Scientist Should Master

Aspiring data specialists should cultivate curiosity and hands‑on experience with production‑grade tools, and this guide highlights seven indispensable Python libraries—IPython, GraphLab Create, pandas, PuLP, matplotlib, scikit‑learn, and Spark—each explained with key features to boost your data‑science career.

Big Data · Data Science · Python
0 likes · 9 min read
Architect
Nov 9, 2015 · Big Data

Modeling User Relationships and Information Propagation on Weibo

The article presents a comprehensive analysis of Weibo's social graph, introducing metrics such as propagation power, intimacy, fan and follow similarity, two‑degree relationships, and relationship circles to model and quantify user interactions and information diffusion within the platform.

Big Data · Social network · User Relationship
0 likes · 13 min read
21CTO
Nov 4, 2015 · Big Data

How We Built a Real‑Time Log Analytics Platform with Storm and Cardinality Counting

To monitor hundreds of web apps on UAE’s PaaS platform in near‑real time, we combined Storm with lightweight log transport, a memcached‑based fqueue, and adaptive cardinality counting to efficiently compute PV, UV, response times, and custom metrics while handling cross‑cluster log aggregation.

Big Data · Cardinality counting · Log Processing
0 likes · 9 min read
21CTO
Oct 26, 2015 · Big Data

Why the Internet May Fade: The Rise of the Internet of Things

The article explores Eric Schmidt's bold claim that the traditional Internet will disappear, outlines how the Internet of Things is poised to dominate with massive market potential, highlights major tech companies' IoT strategies, compares IoT with the Internet, and details the key technologies driving this new ecosystem.

Big Data · IoT · internet of things
0 likes · 11 min read
Architect
Oct 17, 2015 · Big Data

Designing an Agile Data Warehouse and Data Platform for Internet Companies

The article outlines the purposes, architecture, data ingestion, storage, analysis, sharing, application, real‑time processing, scheduling, monitoring, and best‑practice recommendations for building a fast, flexible, and reliable big‑data platform in the fast‑changing internet industry.

Big Data · Data Warehouse · Hadoop
0 likes · 12 min read
Qunar Tech Salon
Oct 16, 2015 · Databases

Choosing the Right NoSQL Database: MongoDB, Cassandra, and HBase Compared

The article examines why enterprises should consider NoSQL over Hadoop for big data storage, compares the three leading NoSQL databases—MongoDB, Cassandra, and HBase—based on market popularity, technical strengths, scalability, and use‑case suitability, and concludes with guidance on selecting the most appropriate solution.

Big Data · MongoDB · NoSQL
0 likes · 11 min read

Understanding Storm: A Distributed Real-Time Computation System

The article explains the need for low‑latency, high‑performance, distributed real‑time processing, outlines the challenges such systems must address, and introduces Storm as a Hadoop‑like framework for stream processing, detailing its architecture, fault‑tolerance mechanisms, transactional topology, and large‑scale deployment at Taobao.

Big Data · Distributed Systems · Real-time Processing
0 likes · 14 min read
21CTO
Sep 24, 2015 · Big Data

Comparing Apache Storm, Spark, and Samza: Which Real‑Time Stream Processor Fits Your Needs?

Apache Storm, Spark Streaming, and Samza are three open‑source, low‑latency, scalable distributed systems for real‑time data processing; this article outlines their architectures, key concepts, differences in data handling, state management, delivery guarantees, and typical use‑cases to help you choose the right framework.

Apache Samza · Apache Storm · Big Data
0 likes · 7 min read

Comparative Overview of Apache Storm, Spark Streaming, and Samza for Real-Time Data Processing

This article introduces Apache Storm, Spark Streaming, and Samza, explains their architectures, common features, key differences such as delivery guarantees and state management, and provides guidance on selecting the most suitable framework for various real‑time big‑data use cases.

Apache Storm · Big Data · Comparison
0 likes · 8 min read
21CTO
Sep 19, 2015 · Artificial Intelligence

Why Distributed Machine Learning Needs More Data Than Speed

The article explains how distributed machine learning evolved from parallel computing to handle massive, long‑tail data sets, discusses the importance of scalability, fault recovery, and data‑parallel algorithms, and reviews frameworks such as MPI, MapReduce, and Pregel for building large‑scale AI systems.

Big Data · Data Parallelism · LDA
0 likes · 24 min read
21CTO
Sep 14, 2015 · Backend Development

Why Simple‑Looking Sites Like Taobao Need Hundreds of Top Engineers

Although sites like Taobao appear simple to users, they rely on massive distributed search, caching, storage, load‑balancing, CDN, logging, and data‑analysis systems that demand sophisticated backend engineering, massive infrastructure, and specialized algorithms, explaining why countless top engineers are required to keep them running.

Big Data · Distributed Systems · caching
0 likes · 12 min read
Art of Distributed System Architecture Design
Aug 20, 2015 · Industry Insights

Which Ten Keywords Will Define Enterprise Software Architecture Over the Next Decade?

The article distills ten pivotal keywords (Industry 4.0, Internet+, BFV, microservices, distributed systems, big data, multi‑screen fusion, Docker, OpenStack, and large‑platform micro‑apps) and explains how each shapes the evolution of enterprise software architecture, along with the challenges and opportunities it brings.

Big Data · Microservices · cloud computing
0 likes · 11 min read
Qunar Tech Salon
Aug 18, 2015 · Big Data

Overview of Spark Big Data Analytics Framework Components

Spark’s big‑data analytics ecosystem comprises core components such as the in‑memory RDD data structure, Streaming for real‑time processing, GraphX for graph analytics, MLlib for machine‑learning, Spark SQL for querying, the Tachyon file system, and SparkR, each enabling scalable, distributed computation.

Big Data · GraphX · MLlib
0 likes · 5 min read
21CTO
Aug 14, 2015 · Frontend Development

From AJAX to Node: A Journey Through Modern Web Development

Tracing the evolution of web technologies—from early AJAX challenges and jQuery’s rise, through Chrome’s dominance, GitHub’s impact, OAuth, JSON, and modern frameworks like Node.js and Bootstrap—the article reflects on how these tools reshaped frontend development and the broader software landscape.

Big Data · Node.js · Web Development
0 likes · 14 min read
Hulu Beijing
Aug 14, 2015 · Big Data

How Voidbox Bridges Docker and YARN for Scalable Big Data Workloads

Voidbox integrates Docker containers with YARN to simplify distributed application development, improve deployment, boost cluster efficiency, and provide fault‑tolerant, DAG‑based execution modes, enabling seamless resource management for Hadoop‑based big data jobs.

Big Data · Cluster Computing · DAG
0 likes · 17 min read
Efficient Ops
Jul 28, 2015 · Operations

How Tencent’s BlueKing Automates Fault Recovery and Zero‑Touch Game Server Launch

This article explains how Tencent Game's BlueKing platform redesigns operations by building open‑source PaaS capabilities, automating fault self‑healing, enabling fully automated game server region launches, supporting self‑service change releases, leveraging big‑data for real‑time decisions, and moving toward open‑source and hybrid‑cloud solutions.

Automation · Big Data · fault-recovery
0 likes · 19 min read

Selection and Comparison of Big Data Benchmark Standards with a Focus on TPC‑DS

This article reviews the evolution of big‑data management technologies, discusses the criteria for choosing appropriate big‑data benchmarks, compares existing benchmarks such as MapReduce tests, YCSB, BigBench and BigFrame, and provides an in‑depth analysis of the TPC‑DS benchmark and its certification status.

Benchmark · Big Data · Data Management
0 likes · 15 min read
Architect
Jul 18, 2015 · Databases

Qihoo 360’s Use of MongoDB: Architecture, Practices, and Lessons Learned

The article details how Qihoo 360 has used MongoDB since 2011, scaling to over 100 applications, 1,500 instances, and 20 billion daily queries, and shares its architectural choices, backup strategies, best‑practice recommendations, and advice for teams considering MongoDB in large‑scale, cloud‑native environments.

Backup Strategies · Big Data · Database Architecture
0 likes · 12 min read
Qunar Tech Salon
Jul 12, 2015 · Big Data

Airbnb OpenAir Conference: Open‑Source Tools Airpal, Aerosolve, and Airflow

At Airbnb’s inaugural OpenAir conference, the company unveiled three open‑source big‑data tools—Airpal, a Presto‑based visual SQL query engine; Aerosolve, an interpretable machine‑learning engine for pricing recommendations; and Airflow, an internal platform for orchestrating and monitoring data pipelines.

Airbnb · Big Data · OpenAir
0 likes · 4 min read
Qunar Tech Salon
Jul 8, 2015 · Big Data

Understanding Logs: The Foundation of Distributed Systems, Data Integration, and Stream Processing

This article explains how logs—simple, append‑only, time‑ordered records—serve as the core abstraction behind databases, distributed systems, data integration pipelines, and modern stream‑processing platforms such as Kafka and Hadoop, illustrating their design, scalability, and practical challenges.

Big Data · Data Integration · Distributed Systems
0 likes · 45 min read
Model Perspective
Jul 6, 2015 · Big Data

What Will Future Schools Look Like? Insights from Global Education Leaders

Amid heated debate over China’s Hengshui model, educators worldwide are envisioning future schools that leverage big-data analytics, immersive technology, and flexible, student-centered learning to cultivate critical thinking, creativity, and empathy, moving beyond traditional exam-driven curricula toward personalized, interdisciplinary education.

21st century skills · Big Data · future education
0 likes · 8 min read

Storm vs Spark: Which Real‑Time Analytics Platform Wins for Your Business?

The article compares Apache Storm and Apache Spark, examining their origins, architecture, language support, integration capabilities, and performance characteristics, and offers guidance on selecting the right platform for real‑time business intelligence based on specific workload and infrastructure needs.

Apache Spark · Apache Storm · Big Data
0 likes · 11 min read

Social Network Analysis on Weibo: Label Propagation, User Similarity, Community Detection, Influence Ranking, and Spam User Identification

This article introduces a series of algorithms for analyzing the Weibo social network, including label propagation, LDA‑based user similarity, time‑aware and interaction‑aware similarity measures, community detection, influence ranking via PageRank variants, and methods for identifying spam users, illustrating how these techniques can be applied to large‑scale social media data.

Big Data · Social Network Analysis · influence ranking
0 likes · 19 min read

Designing a Scalable Real‑Time Mobile Analytics Platform with Kafka, Storm, and Amazon EMR

The article describes how a mobile analytics service processes billions of events daily using a Lambda‑style architecture that combines Kafka, Storm, Amazon EMR, and S3 to achieve scalable, fault‑tolerant batch and real‑time computation, while ensuring reliable event ingestion and graceful degradation.

AWS · Big Data · Kafka
0 likes · 8 min read

Mastering HBase: Table Structure, API Usage, and Performance Tuning

This article explains HBase's column‑oriented architecture, key concepts such as Rowkey, ColumnFamily, and Region, provides Java API examples for table operations, and offers practical optimization techniques—including pre‑splitting, Rowkey design, caching, and compaction settings—to improve read/write performance.

Big Data · Database Optimization · HBase
0 likes · 20 min read
High Availability Architecture
May 15, 2015 · Big Data

Real-Time Computing at Dianping: Architecture, Use Cases, and Best Practices

During a detailed live session, senior Dianping engineer Wang Xinchun explains the company's real‑time computing platform built on Apache Storm, covering use cases such as dashboards, search and recommendation, system architecture, data ingestion tools like Blackhole and Puma, performance tuning, monitoring, and practical best‑practice recommendations.

Apache Storm · Big Data · Real‑Time Computing
0 likes · 21 min read
Ctrip Technology
May 14, 2015 · Artificial Intelligence

Data‑Driven User Experience: Machine Learning Applications in Hotel Booking and Marketing at Ctrip

In his 2015 China Hotel Marketing Summit keynote, Ctrip CTO Ye Yamin explained how machine‑learning models built on purchase behavior and order data improve hotel room availability predictions, shorten confirmation times, personalize recommendations, and evaluate advertising effectiveness, illustrating a data‑driven approach to user experience and operations.

Big Data · Data Analytics · Marketing
0 likes · 14 min read
MaGe Linux Operations
Apr 28, 2015 · Big Data

How LinkedIn Scales Kafka to Billions of Messages Every Day

This article explains how LinkedIn uses Apache Kafka as a high‑throughput, fault‑tolerant messaging backbone, detailing its architecture, message categories, layered replication, audit mechanisms, and the engineering practices that keep billions of daily messages reliable and fast.

Big Data · Distributed Systems · Kafka
0 likes · 11 min read

Understanding Stream Processing, Event Sourcing, and Complex Event Processing

The article explains the fundamentals of stream processing, event sourcing, and complex event processing, comparing raw event storage with aggregated results, illustrating architectures with Kafka, Samza, and other frameworks, and highlighting benefits such as scalability, flexibility, and decoupling for modern data‑driven systems.

Apache Kafka · Apache Samza · Big Data
0 likes · 11 min read
MaGe Linux Operations
Apr 7, 2015 · Big Data

How Hadoop’s Tiered Storage Optimizes Data Based on Temperature

This article explains Hadoop’s tiered storage concept, describing how data is classified by temperature—hot, warm, cold, frozen—and automatically moved across disk and archive layers to optimize cost and performance, with examples from Hadoop versions and eBay’s large‑scale deployment.

Big Data · Data Temperature · HDFS
0 likes · 9 min read
Art of Distributed System Architecture Design
Mar 16, 2015 · Industry Insights

Inside Facebook’s Massive Architecture: How the Social Giant Scales to Billions

The article details Facebook’s LAMP‑based architecture, describing how HipHop compiles PHP to C++, Thrift‑based services in PHP, C++, and Java run on custom servers, and how MySQL, Memcached, Cassandra, HBase, Hadoop, Hive, Scribe, BigPipe, Varnish, Haystack and other components together enable handling over 60,000 servers, 300 TB of cached data, 1 trillion daily clicks and petabytes of storage.

Backend · Big Data · Facebook
0 likes · 7 min read
Suning Technology
Feb 15, 2015 · Cloud Computing

How Suning IT Scaled Cloud and Big Data Platforms for Massive Sales Events

In 2014 Suning’s IT division built a robust, cloud‑native architecture and big‑data platform that enabled flexible scaling, real‑time analytics, secure CDN delivery, and comprehensive monitoring, supporting high‑traffic promotional events and laying the groundwork for future O2O and financial services expansion.

Big Data · IT Operations · cloud computing
0 likes · 9 min read
Ctrip Technology
Jan 23, 2015 · Big Data

Personalized Marketing in the Big Data Era: Ctrip’s Experience

At Ctrip’s TDAY event, senior VP Eric Ye explained how big‑data techniques such as cross‑screen processing, real‑time APIs, and predictive models enable personalized travel recommendations and dramatically improve call‑center efficiency, illustrating the commercial impact of data‑driven marketing.

Big Data · Ctrip · Customer Behavior
0 likes · 3 min read
Baidu Tech Salon
Jan 14, 2015 · Artificial Intelligence

Why Baidu’s 2014 AI Push Could Redefine the Future of Tech

The article examines Baidu’s massive 2014 investment in artificial intelligence—covering Baidu Brain, breakthroughs in vision, speech and NLP, big‑data capabilities, open platforms, IoT hardware, and talent strategy—to explain how these moves may reshape both Baidu and the broader technology landscape.

AI strategy · Baidu · Big Data
0 likes · 10 min read
Baidu Tech Salon
Jan 13, 2015 · Big Data

Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle

This article reviews Spark 1.2’s major enhancements—including the External Data Source API, column pruning, predicate pushdown, and in‑memory columnar storage—while also detailing Baidu’s large‑scale Spark deployments, its custom high‑performance Shuffle service, and the integration of Spark with the Tachyon memory file system.

Baidu · Big Data · External Data Source API
0 likes · 16 min read
Baidu Tech Salon
Dec 3, 2014 · Artificial Intelligence

Highlights from Baidu’s Technical Salon: AI, Big Data, and Innovation

At Baidu’s November 25 technical salon, senior leaders Robin Li, Wang Jin, and Andrew Ng highlighted the company’s AI‑driven strategy: sub‑second search, massive deep‑learning and data infrastructure, R&D spending at 14% of revenue, breakthroughs in vision, speech, NLP, and autonomous‑driving platforms, and a new AI talent program, positioning Baidu as a leading innovator in big data and intelligent connectivity.

Baidu · Big Data · artificial intelligence
0 likes · 11 min read
ITPUB
Oct 30, 2014 · Big Data

Inside Fourinone: A Lightweight Distributed Framework Challenging Hadoop

The interview with Fourinone founder Peng Yuan explores the framework's evolution from a parallel computing project to a 220 KB distributed system with its own NoSQL database engine CoolHash, compares it to Hadoop, and discusses its open‑source release, technical design choices, and real‑world deployments in finance and enterprise environments.

Big Data · CoolHash · Fourinone
0 likes · 31 min read
Baidu Tech Salon
Oct 29, 2014 · Big Data

Inside Baidu’s Real-Time Big Data Platforms: Dstream and TM Explained

This article examines Baidu’s home‑grown real‑time big‑data platforms Dstream and TM, detailing their architectures, performance metrics, key features, and practical use cases such as log ETL and real‑time bidding, while highlighting how they meet millisecond‑level processing demands.

Baidu · Big Data · Dstream
0 likes · 9 min read
Baidu Tech Salon
Aug 19, 2014 · Big Data

Technology Stack Trends in Startup Companies Based on AngelList Data

Analyzing AngelList data for transportation‑sector startups, the report reveals JavaScript/Node.js and Ruby on Rails dominate programming and front‑end, MySQL/MongoDB lead storage, iOS tops mobile, AWS leads infrastructure, Chef leads DevOps, while Python is favored by higher‑tier firms and PHP by lower‑tier ones, though the scoring methodology remains opaque.

Big Data · Technology Stack · databases
0 likes · 5 min read
MaGe Linux Operations
Jul 30, 2014 · Databases

SQL vs NoSQL: Which Database Wins the Big Data Battle?

This article examines the ongoing debate between SQL and NoSQL databases for big‑data projects, presenting expert arguments on performance, scalability, standardization, and flexibility to help enterprises decide the optimal solution.

Big Data · Comparison · NoSQL
0 likes · 14 min read
Baidu Tech Salon
May 8, 2014 · Big Data

How Baidu’s Big Data Engine Predicts Tourist Crowds for the May Day Holiday

Using search query volumes, Baidu’s big‑data engine forecasts tourist numbers for Chinese attractions, achieving over 90% accuracy and partnering with CCTV to broadcast real‑time crowd predictions during the May Day holiday, while also outlining future forecasting plans for flu, sports, finance and real‑estate trends.

Baidu · Big Data · CCTV
0 likes · 5 min read
Baidu Tech Salon
Apr 8, 2014 · Big Data

Top 10 Open-Source Big Data Technologies and Industry Giants to Watch

The article surveys the rapid growth of big data across sectors, highlights key open‑source technologies such as Hadoop, Spark, HBase and others, and profiles ten influential companies—including AWS, Cloudera, Hortonworks, IBM and Microsoft—offering insight into current trends, capabilities and competitive dynamics in the big‑data ecosystem.

Big Data · Data Analytics · industry giants
0 likes · 15 min read