Past Memory Big Data
Author

A popular big-data architecture channel followed by more than 100,000 developers, publishing articles on Spark, Hadoop, Flink, Kafka, and more. Visit the Past Memory Big Data blog at https://www.iteblog.com, or search "Past Memory" on Google or Baidu.

58 Articles · 0 Likes · 22 Views · 0 Comments
Recent Articles

Past Memory Big Data
Jan 4, 2026 · Industry Insights

Upgrade Your Stack: 2025 Apache Top-Level Projects You Should Know

The article reviews the eleven Apache projects that graduated to top-level status in 2025. It explains how each one, from big-data shuffle services and unified data processing to DevOps analytics, web frameworks, and messaging platforms, addresses a specific infrastructure challenge, and why each merits a place in a modern technology stack.

Apache · Data Infrastructure · DevOps
0 likes · 11 min read
Past Memory Big Data
Dec 31, 2025 · Industry Insights

NVIDIA Data‑Center GPU Evolution: V100 to B300 – A Programmer’s Selection Guide

The article maps the evolution of NVIDIA's data-center GPUs, from the Volta-based V100 through the Ampere A100, the Hopper H100, and the specialized A800/H800/H20, up to the Blackwell B200/B300. It details their architectures, memory, interconnects, and performance trade-offs, and offers a decision framework to help programmers match each model to specific AI workloads, budgets, and regulatory constraints.

AI · Data Center · GPU
0 likes · 11 min read
Past Memory Big Data
Dec 29, 2025 · Industry Insights

How Chinese Open‑Source Projects Dominated Half of 2025 Apache Top‑Level Projects

Five of the Apache Top-Level Projects that graduated in 2025 have Chinese origins: Uniffle, StreamPark, Gravitino, DevLake, and HertzBeat. The article argues that their emergence reflects a shift toward central, platform-oriented solutions, driven by growing system scale, engineering complexity, and collaboration costs rather than by a deliberate national agenda.

Apache · Big Data · Open Source
0 likes · 7 min read
Past Memory Big Data
Dec 12, 2025 · Big Data

How Uber Reduced Data Freshness from Hours to Minutes Using Flink Streaming

Uber rebuilt its data‑lake ingestion pipeline with Apache Flink, replacing batch jobs with a streaming architecture that cuts data freshness from hours to minutes, lowers compute usage by 25%, and solves challenges like small‑file proliferation, partition skew, and checkpoint‑commit synchronization at petabyte scale.

Apache Flink · Apache Hudi · Data Freshness
0 likes · 10 min read
Past Memory Big Data
Dec 9, 2025 · Artificial Intelligence

A Decade of Evolution: Inside Pinterest’s AI Platform Journey

Over ten years, Pinterest transformed a fragmented machine-learning stack into a unified AI platform, moving from early ad-hoc pipelines to scalable GPU-accelerated services, and learned that timing, organizational alignment, and efficiency are crucial for lasting impact.

AI platform · Feature Engineering · GPU inference
0 likes · 25 min read
Past Memory Big Data
Dec 4, 2025 · Artificial Intelligence

Text2SQL Showdown: Which Technical Path Delivers Higher Accuracy and Lower Cost?

The article analyzes two contrasting Text2SQL architectures (LLM + RAG + DSL versus rule-driven NLQ), comparing their accuracy under controlled conditions, implementation costs, support for complex queries, and real-world suitability for enterprise BI, and concludes which approach is more reliable and cost-effective.

AI+Rules · Business Intelligence · DSL
0 likes · 16 min read
Past Memory Big Data
Dec 1, 2025 · Big Data

Apache XTable: A Universal Translator for Data Lake Format Interoperability

Apache XTable introduces a lightweight metadata translation layer that decouples data storage from format metadata, enabling zero‑copy, omni‑directional conversion among Hudi, Iceberg, and Delta Lake, allowing organizations to write with one format and read with any engine without duplicating Parquet files.

Apache XTable · Data Lake · Delta Lake
0 likes · 7 min read
Past Memory Big Data
Nov 12, 2025 · Big Data

How Uber Upgraded Over 2 Million Spark Jobs from 2.4 to 3.3

Uber migrated more than two million daily Spark applications from version 2.4 to 3.3. The article details the motivations, the architecture, the four-step migration process, and custom tools such as Polyglot Piranha and Iron Dome, along with the resulting performance, cost, and productivity gains.

Apache Spark · Data Engineering · Iron Dome
0 likes · 11 min read
Past Memory Big Data
Jul 30, 2025 · Big Data

Why Iceberg Is Dropping Positional Deletes in Merge‑on‑Read Tables

The article explains how Apache Iceberg v3 replaces the poorly scaling positional-delete mechanism in Merge-on-Read tables with compact Deletion Vectors. It details the performance, I/O, and metadata drawbacks of positional deletes and shows how the new bitmap-based approach resolves them.

Apache Iceberg · Data Lake · Deletion Vector
0 likes · 20 min read
Past Memory Big Data
Apr 19, 2025 · Artificial Intelligence

Databricks Acquires Fennel: Is Real-Time Computing + AI the Ultimate Data Platform?

The article examines Databricks' acquisition of Fennel, an incremental computation engine, and details how its unified batch and stream processing, incremental updates, Python-native development, and built-in data governance can eliminate data silos, cut costs by up to 90%, and accelerate real-time feature engineering for AI models. It also discusses the industry impact and future roadmap.

AI Infrastructure · Databricks · Feature Engineering
0 likes · 6 min read