DataFunSummit
Author

The official account of the DataFun community, sharing news and speaker talks from big data and AI industry summits and regularly offering downloadable resource packs.

Recent Articles

May 6, 2026 · Artificial Intelligence

Inside 1688’s Inference‑Based Recommendation System: Architecture, Challenges, and Future Directions

This article details how Alibaba 1688 tackles the “information cocoon” problem by deploying large‑model inference‑based recommendation, describing its three‑layer architecture, multi‑stage user demand analysis, long‑cycle behavior compression, prompt engineering, trend mining, near‑line serving, and future enhancements.

Prompt Engineering · behavior compression · e-commerce
23 min read

May 5, 2026 · Artificial Intelligence

How Huawei Noah’s KAR Project Leverages LLMs to Advance Recommendation Systems

The article reviews the evolution of recommendation systems from deep learning to large language models, analyzes core challenges such as noisy implicit feedback and limited semantic understanding, and details Huawei Noah’s KAR solution, which uses factorized prompting, multi‑expert adapters, and AI‑Agent architectures, achieving a 1.5% AUC lift and gains validated in online A/B tests.

AI Agent · AUC · Huawei
5 min read

May 5, 2026 · Big Data

A New Data Lake Paradigm: Volcano Engine’s Multi‑Modal Data Lake Built on Lance

The article presents Volcano Engine’s AI‑focused data lake built on the Lance format, explaining why traditional lakes fall short for multimodal data, describing engineering enhancements such as Binary Copy Compaction, Lance Insight, distributed vector indexing, JSON‑based tagging, and Row‑ID shuffle optimization, and presenting real‑world case studies that demonstrate significant performance and cost gains.

AI · Binary Copy Compaction · Data Lake
18 min read

May 4, 2026 · Artificial Intelligence

DeepSeek’s MCTS Failure: The ‘Roast Chicken and Baijiu’ Dilemma in LLM Training

The article examines why DeepSeek’s large‑model training cannot yet leverage Monte‑Carlo Tree Search, detailing its reliance on SFT, GRPO‑driven CoT activation, and rejection sampling; it contrasts this with Google’s PRM‑based approaches and proposes an MCTS‑powered data‑generation pipeline to overcome the “roast chicken and baijiu” training dilemma.

Chain-of-Thought · GRPO · Monte Carlo Tree Search
14 min read

May 4, 2026 · Artificial Intelligence

Inside Alibaba Cloud AI Search: Agentic RAG Architecture and Multi‑Agent Techniques

Alibaba Cloud AI Search tackles high‑concurrency, multimodal, and multi‑hop queries by evolving its Agentic RAG architecture from a single agent into a coordinated multi‑agent system that integrates planning, retrieval, and generation. The article also covers hybrid recall across vector, text, database, and graph sources, GPU‑accelerated indexing, quantization, NL2SQL, and multimodal search, backed by performance data and real‑world case studies.

AI search · Agentic RAG · Alibaba Cloud
6 min read

May 4, 2026 · Artificial Intelligence

Best Practices for Persistent, Reliable AI Agent Memory: Insights from the ‘Memory in the Age of AI Agents’ Paper

The article analyzes the 2025 paper “Memory in the Age of AI Agents”, presenting its three‑dimensional classification of AI memory (Forms, Functions, Dynamics), comparing token‑level, parameter‑level, and latent‑space approaches, evaluating major frameworks such as Mem0, Letta, Zep, and ReMem, and offering concrete guidance on design, forgetting mechanisms, retrieval strategies, and future research directions.

AI memory · agentic AI · latent space memory
17 min read

May 3, 2026 · Databases

ScopeDB: Real-Time Data Analytics Solution for the Cloud‑Native Era

The article introduces ScopeDB, a cloud‑native, real‑time analytics database that pairs structured core columns with a flexible JSON column and adds adaptive indexing, a custom query language (ScopeQL), and true compute‑storage separation, delivering sub‑second query latency, high throughput, and up to 70% cost reduction compared with traditional big‑data stacks.

Cloud Native · Database · ScopeDB
14 min read

May 3, 2026 · Artificial Intelligence

From Flawed to Production-Ready: Deep Dive into Building Enterprise-Grade RAG Systems

The article analyzes why early RAG deployments often fall short, dissects the most common technical pain points (from document parsing to vector overload), and presents a systematic roadmap covering hybrid search, reranking, GraphRAG, Agentic RAG, model selection, scalability techniques, and security controls for robust production in enterprise‑facing (B‑side) scenarios.

Agentic RAG · Enterprise AI · GraphRAG
20 min read

May 2, 2026 · Cloud Native

GooseFS + Lance: Accelerating Vector Storage for the AI Era

The article explains how GooseFS integrates with the Lance vector format to overcome the IO bottlenecks of object storage, detailing native acceleration mechanisms such as namespace catalog services, event‑driven warm caching, automatic compaction, native transactions, and page‑level caching that together deliver up to three‑fold performance gains for AI workloads.

AI · Cache Acceleration · Cloud Native
12 min read

May 2, 2026 · Artificial Intelligence

How Palantir’s 4‑Layer Ontology Architecture Enables Buildings, Tenants, and Data to ‘Talk’

Healthpeak transformed its commercial‑real‑estate operations by replacing fragmented spreadsheets with Palantir’s AI Platform (AIP), using a four‑layer architecture and ontology‑driven modeling to automate billing, detect anomalies, and orchestrate workflows, dramatically cutting manual effort, errors, and scaling costs.

AI Workflow Automation · Commercial Real Estate · Data Integration
18 min read