Tagged articles

Large-Scale Data

28 articles · Page 1 of 1
Machine Heart
Jun 29, 2026 · Artificial Intelligence

Open‑Source AI‑Infra Ops Agent Benchmark Powered by Hundreds of Billions of Real Data

The article introduces AISHPerf, the first open‑source benchmark for AI‑infra operations agents, built on nearly one hundred billion real‑world ops records. It details the data pipeline, multi‑layer coverage, and evaluation metrics, reports experimental results showing that current models lag behind human experts, and outlines plans to expand and refine the benchmark.

AI Ops · Evaluation Metrics · Fault Injection
0 likes · 16 min read
Machine Heart
Jun 18, 2026 · Artificial Intelligence

Automating 3D Spatial Data: Holi‑Spatial’s 4M‑Scale Multimodal Dataset (ICML 2026 Oral)

Holi‑Spatial introduces a fully automatic pipeline that transforms raw video streams into high‑quality 3D geometry, depth, masks, 3D boxes, instance descriptions, grounding and spatial QA, producing the 4‑million‑item Holi‑Spatial‑4M dataset and substantially improving VLM spatial reasoning performance.

3D reconstruction · ICML 2026 · Large-Scale Data
0 likes · 14 min read
Machine Heart
May 27, 2026 · Artificial Intelligence

How NeoteAI’s Tactile Embodied AI Lets Robots ‘Feel’ the World – Near‑100 M CNY Angel Round

NeoteAI, a Fudan‑affiliated startup, raised nearly 100 million yuan to advance its visual‑tactile sensor, large‑scale data platform, and VTLA model that together give robots precise touch perception, boosting fine‑grained manipulation success rates above 90% in industrial settings.

AI model · Embodied AI · Large-Scale Data
0 likes · 10 min read
Machine Heart
Apr 18, 2026 · Artificial Intelligence

Why Embodied Data Is the Biggest Gold Mine: Inside the World’s First Hundred‑Billion‑Scale Multimodal Data Cloud Mall

Paxini, together with JD Cloud, Tencent Cloud, and Baidu Intelligent Cloud, launches the world’s first hundred‑billion‑scale, full‑modal, high‑degree‑of‑freedom embodied AI data cloud mall, offering instant online data procurement, end‑to‑end model training pipelines, and validated performance gains in both lab and real‑world robot tasks.

Embodied AI · Large-Scale Data · Model Training
0 likes · 13 min read
Tencent Advertising Technology
Sep 3, 2025 · Artificial Intelligence

Boosting Ads Revenue: LFM4Ads’ Full‑Representation Multi‑Granular Transfer Raises GMV 2.45%

Tencent's LFM4Ads introduces a full‑representation, multi‑granular knowledge transfer framework that moves user, item, and cross representations from a large foundation model to downstream tasks, achieving up to 2.45% platform GMV uplift across more than ten advertising scenarios.

Knowledge Transfer · Large-Scale Data · ads recommendation
0 likes · 12 min read
Zhuanzhuan Tech
Apr 3, 2024 · Backend Development

Design and Implementation of an Elasticsearch Data Synchronization Service (ECP) for Large‑Scale Order Data

This article describes the challenges and technical solutions for synchronizing billions of order records from a relational database to Elasticsearch, including multi‑source data reading, dynamic rate limiting, retry strategies, SPI‑based service integration, environment isolation, health‑checking, smooth migration, and structured logging, all implemented in a backend service called ECP.

Data synchronization · Java · Large-Scale Data
0 likes · 21 min read
dbaplus Community
Nov 15, 2023 · Databases

Scaling Bloom Filter for 800 Million OpenIDs in Redis

This article explains how to use a Bloom filter backed by Redis bitmaps and Roaring Bitmap sharding to efficiently filter queries against 800 million OpenIDs, covering memory planning, hash function selection, code implementation, and performance‑tuned batch write strategies.

Large-Scale Data · Roaring Bitmap · backend optimization
0 likes · 13 min read
ITPUB
Oct 1, 2023 · Backend Development

Scaling Schema‑Free Classified Ads Platforms: Storage & Search for Billions

This article explains how to design a scalable architecture for classified‑information platforms that handle billions of rows, on the order of ten thousand attributes, and on the order of a hundred thousand QPS. The design uses vertical partitioning and unified post, category, and search services, along with compressed JSON extension fields and external indexing.

Large-Scale Data · Scalable Architecture · Vertical Partitioning
0 likes · 12 min read
Zhuanzhuan Tech
May 30, 2023 · Backend Development

Design and Architecture of a Checkout System: Scenarios, Features, Third‑Party Integration, and Large‑Scale Data Solutions

This article explains the background, key scenarios, functional components, third‑party payment capabilities, implementation logic, rule‑engine usage, and large‑scale data handling strategies of a checkout system, providing a comprehensive view of its backend architecture and operational considerations.

Large-Scale Data · backend · checkout
0 likes · 14 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Deep Learning · Large-Scale Data · Multimodal
0 likes · 16 min read
AntTech
Nov 28, 2022 · Information Security

Ant Group Anti‑Intrusion Platform: Architecture, Trillion‑Scale Detection, Risk Assessment, and Automated Response

This article details the evolution, architecture, and key technologies of Ant Group's anti‑intrusion platform, explaining how it handles trillion‑level data streams for intrusion detection, performs multi‑dimensional risk assessment and attribution, and enables rapid, automated security incident response across massive enterprise environments.

Intrusion Detection · Large-Scale Data · anti-intrusion
0 likes · 15 min read
DataFunTalk
Oct 28, 2022 · Big Data

Angel Graph: A High‑Performance Distributed Graph Computing Framework for Intelligent Risk Control

Angel Graph is a high‑performance, fault‑tolerant distributed graph computing framework developed by Tencent, featuring scalable node‑metric, community‑detection, and graph‑neural‑network algorithms optimized for billion‑node, trillion‑edge datasets, and demonstrated through practical applications in intelligent financial risk control.

Large-Scale Data · community-detection · distributed systems
0 likes · 20 min read
Xingsheng Youxuan Technology Community
Oct 28, 2022 · Backend Development

How We Processed 1 Million Images in Sub-Second: Backend Optimization Secrets

Facing a challenge of managing roughly one million server-side images and 180 client images, the TOOSIMPLE team built a high-performance backend using fingerprinting, parallel processing, mmap-SSE2 acceleration, and sparsemap indexing, achieving sub-second response times while ensuring correct ordered display.

Hashing · Large-Scale Data · golang
0 likes · 12 min read
ITPUB
Jun 9, 2022 · Artificial Intelligence

How 58’s Multi‑Label Image Recognition Boosts Semantic Search and Recommendations

This article details the design, data pipeline, model architecture, loss functions, and evaluation metrics of a large‑scale multi‑label image classification system built for 58.com, showing how it improves semantic similarity detection, recommendation, and content moderation across diverse business domains.

Deep Learning · Large-Scale Data · asymmetric loss
0 likes · 18 min read
Architecture Digest
Jun 7, 2022 · Big Data

Design and Optimization Strategies for Querying 100K Records from Tens of Millions Using ClickHouse, Elasticsearch, HBase, and RediSearch

This article examines a business requirement to filter up to 100,000 items from a pool of tens of millions, presenting and evaluating four technical solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan, an ES‑HBase hybrid, and RediSearch + RedisJSON—along with performance data and implementation details.

HBase · Large-Scale Data · Query Optimization
0 likes · 10 min read
DataFunTalk
Feb 1, 2022 · Big Data

Kafka at Meituan: Practices, Challenges, and Optimizations for Large‑Scale Data Platforms

This article presents Meituan's large‑scale Kafka deployment, describing the current state and challenges of massive data ingestion, detailing latency‑reduction techniques, cluster‑level optimizations, SSD‑based caching, isolation strategies, full‑link monitoring, lifecycle management, and future directions for high availability.

Kafka · Large-Scale Data · Meituan
0 likes · 22 min read
Java Interview Crash Guide
Dec 2, 2021 · Databases

How Zhihu Scaled to Trillions of Rows with TiDB – Real‑Time Query Performance Insights

Zhihu’s Moneta service stores over a trillion rows and faces massive write and read loads; this article explains why TiDB was chosen, how its architecture and features such as HTAP, Raft, Titan and table partitioning enable millisecond‑level query latency, high availability, and seamless scaling.

HTAP · Large-Scale Data · Performance Optimization
0 likes · 15 min read
21CTO
May 18, 2021 · Big Data

How Baidu Scales Multimodal Image Search with the Imazon Platform

This article explains Baidu's multimodal retrieval system, detailing the offline and online pipelines, the image processing and indexing platform (Imazon), its architecture, key technologies such as ANN and GPU models, and the optimization practices that enable massive daily image ingestion and real‑time search at billion‑scale.

Baidu · Image processing · Large-Scale Data
0 likes · 13 min read
High Availability Architecture
May 18, 2021 · Big Data

Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)

This article details Baidu's large‑scale image processing and multimodal retrieval system, describing its offline‑online architecture, massive data ingestion pipeline, ANN search techniques, performance metrics, infrastructure components, and a series of optimizations for throughput, cost, and reliability in a high‑volume streaming environment.

Baidu · Image processing · Imazon
0 likes · 12 min read
Java Backend Technology
Mar 21, 2020 · Databases

How Zhihu Scaled to Trillions of Rows with TiDB: Lessons from Moneta

Zhihu’s Moneta service, handling over 1.3 trillion rows and billions of daily writes, migrated from MySQL sharding to TiDB, achieving millisecond query latency, high availability, and horizontal scalability, while sharing architectural choices, performance metrics, migration challenges, and future expectations for TiDB 3.0.

HTAP · Large-Scale Data · MySQL Migration
0 likes · 16 min read
DataFunTalk
Feb 26, 2020 · Databases

ByteGraph: ByteDance’s Distributed Graph Database and Graph Computing System – Architecture, Data Model, and Practices

This article presents an in‑depth technical overview of ByteGraph, ByteDance’s self‑built distributed graph database and its accompanying graph‑computing engine, covering graph data characteristics, the directed‑property graph model, API design, three‑tier system architecture, storage strategies using KV stores and B‑Trees, hotspot handling, indexing, and future research directions.

B+Tree · ByteGraph · Distributed storage
0 likes · 33 min read
Alibaba Cloud Developer
Aug 23, 2018 · Artificial Intelligence

How Alibaba’s “Cangjingge” Knowledge Engine Powers AI with Massive Graphs

Alibaba, together with top Chinese universities and research institutes, unveiled the Cangjingge Knowledge Engine project, detailing its massive data assets, five‑module architecture, large‑scale knowledge construction techniques, and initial deployments in safety and tourism knowledge graphs to boost AI applications.

AI · Alibaba · Knowledge Graph
0 likes · 9 min read
21CTO
Mar 22, 2017 · Artificial Intelligence

How Youku Tudou Revamped Its Video Recommendation Engine for Real‑Time Ranking

The Youku Tudou data team overhauled its video recommendation system by moving ranking from offline to online, detailing architectural changes, advantages, challenges, feature handling, offline evaluation, and model weight fusion to improve scalability and user experience.

AB testing · AI · Large-Scale Data
0 likes · 7 min read