Big Data Technology & Architecture
Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads. It covers the stack's high‑performance Rust engine, adaptive back‑pressure, Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution. It also includes benchmark comparisons and practical deployment guidance.

Daft · Lance · Python
21 min read
Big Data Technology Tribe
Mar 15, 2026 · Databases

How to Build Distributed Scalar Indexes with Lance and Ray

This guide explains the end‑to‑end workflow for constructing a distributed scalar index in Lance: validating inputs, sharding fragments, building the index on Ray workers, and merging the final metadata. It includes code snippets and step‑by‑step instructions.

Datasets · Lance · Python
12 min read
Big Data Technology Tribe
Feb 26, 2026 · Databases

How optimize_indices Improves Query Performance in Lance

The article explains the purpose and inner workings of Lance's optimize_indices function, detailing how it incorporates newly appended data into existing indexes, merges delta indexes, and manages partition adjustments to maintain fast vector and scalar query performance without full re‑training.

IVF · Lance · optimize_indices
8 min read
Big Data Technology Tribe
Feb 25, 2026 · Databases

How Lance Implements MVCC Transactions with Optimistic Concurrency and Automatic Conflict Resolution

Lance uses Multi-Version Concurrency Control (MVCC) to provide ACID guarantees: each commit creates an immutable table version. To handle concurrent writes across multiple transaction types, it relies on atomic storage primitives for commits, conflict detection, rebase logic, and retry mechanisms.

Concurrency Control · Database Internals · Lance
16 min read
Volcano Engine Developer Services
Mar 5, 2025 · Artificial Intelligence

How DeepSeek Smallpond Powers AI Data Processing with Ray and DuckDB

This article introduces DeepSeek Smallpond, a lightweight yet high‑performance AI data‑processing engine built on Ray and DuckDB. It explains Smallpond's dual DataFrame and LogicalPlan APIs, shows its integration with Volcano Engine's AI Data Lake LAS, and provides practical code examples for distributed processing, multimodal storage, and RAG pipelines.

AI data processing · Distributed Computing · DuckDB
18 min read