Tag

large-scale data

Zhuanzhuan Tech
Apr 3, 2024 · Backend Development

Design and Implementation of an Elasticsearch Data Synchronization Service (ECP) for Large‑Scale Order Data

This article describes the challenges and technical solutions for synchronizing billions of order records from a relational database to Elasticsearch, including multi‑source data reading, dynamic rate limiting, retry strategies, SPI‑based service integration, environment isolation, health‑checking, smooth migration, and structured logging, all implemented in a backend service called ECP.

Backend Service · Elasticsearch · Java
0 likes · 21 min read
Zhuanzhuan Tech
May 30, 2023 · Backend Development

Design and Architecture of a Checkout System: Scenarios, Features, Third‑Party Integration, and Large‑Scale Data Solutions

This article explains the background, key scenarios, functional components, third‑party payment capabilities, implementation logic, rule‑engine usage, and large‑scale data handling strategies of a checkout system, providing a comprehensive view of its backend architecture and operational considerations.

Backend · checkout · large-scale data
0 likes · 14 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Deep Learning · Multimodal · Natural Language Processing
0 likes · 16 min read
AntTech
Nov 28, 2022 · Information Security

Ant Group Anti‑Intrusion Platform: Architecture, Trillion‑Scale Detection, Risk Assessment, and Automated Response

This article details the evolution, architecture, and key technologies of Ant Group's anti‑intrusion platform, explaining how it handles trillion‑level data streams for intrusion detection, performs multi‑dimensional risk assessment and attribution, and enables rapid, automated security incident response across massive enterprise environments.

Intrusion Detection · Security Automation · anti-intrusion
0 likes · 15 min read
DataFunTalk
Oct 28, 2022 · Big Data

Angel Graph: A High‑Performance Distributed Graph Computing Framework for Intelligent Risk Control

Angel Graph is a high‑performance, fault‑tolerant distributed graph computing framework developed by Tencent, featuring scalable node‑metric, community‑detection, and graph‑neural‑network algorithms optimized for billion‑node, trillion‑edge datasets, and demonstrated through practical applications in intelligent financial risk control.

Graph Computing · community-detection · distributed systems
0 likes · 20 min read
Xingsheng Youxuan Technology Community
Oct 28, 2022 · Backend Development

How We Processed 1 Million Images in Sub-Second: Backend Optimization Secrets

Faced with managing roughly one million server-side images and 180 client images, the TOOSIMPLE team built a high-performance backend using fingerprinting, parallel processing, mmap-SSE2 acceleration, and sparsemap indexing, achieving sub-second response times while keeping the images displayed in the correct order.

Backend Development · Hashing · MMAP
0 likes · 12 min read
IT Services Circle
Jun 18, 2022 · Databases

Efficiently Importing Massive CSV Data into MySQL with Python: pymysql vs pandas‑SQLAlchemy

This article demonstrates two approaches for efficiently importing massive CSV data into MySQL using Python: a direct pymysql method with chunked inserts and a concise pandas‑SQLAlchemy method. It compares their performance and code complexity and offers tips for further speed improvements.

Data ingestion · MySQL · PyMySQL
0 likes · 5 min read
Architecture Digest
Jun 7, 2022 · Big Data

Design and Optimization Strategies for Querying 100K Records from Tens of Millions Using ClickHouse, Elasticsearch, HBase, and RediSearch

This article examines a business requirement to filter up to 100,000 items from a pool of tens of millions, presenting and evaluating four technical solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan, an ES‑HBase hybrid, and RediSearch + RedisJSON—along with performance data and implementation details.

ClickHouse · Elasticsearch · HBase
0 likes · 10 min read
DataFunTalk
Feb 1, 2022 · Big Data

Kafka at Meituan: Practices, Challenges, and Optimizations for Large‑Scale Data Platforms

This article presents Meituan's large‑scale Kafka deployment, describing the current state and challenges of massive data ingestion, detailing latency‑reduction techniques, cluster‑level optimizations, SSD‑based caching, isolation strategies, full‑link monitoring, lifecycle management, and future directions for high availability.

Kafka · Meituan · Read/Write Latency
0 likes · 22 min read
High Availability Architecture
May 18, 2021 · Big Data

Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)

This article details Baidu's large‑scale image processing and multimodal retrieval system, describing its offline‑online architecture, massive data ingestion pipeline, ANN search techniques, performance metrics, infrastructure components, and a series of optimizations for throughput, cost, and reliability in a high‑volume streaming environment.

Baidu · Imazon · image processing
0 likes · 12 min read
Architecture Digest
Jan 8, 2021 · Databases

Scaling Zhihu's Moneta Application with TiDB: Architecture, Performance, and Lessons Learned

This article details how Zhihu tackled the massive data and latency challenges of its Moneta service by migrating from MySQL sharding and MHA to the distributed NewSQL database TiDB, describing the new three‑tier architecture, performance gains, migration tactics, and expectations for TiDB 3.0.

Database Scalability · HTAP · NewSQL
0 likes · 13 min read
Java Architect Essentials
Sep 6, 2020 · Databases

Scaling Zhihu's Moneta Service with TiDB: Architecture, Performance, and Lessons Learned

This article describes how Zhihu's Moneta service, which stores over a trillion rows of user‑read data, migrated from MySQL sharding to the distributed NewSQL database TiDB to achieve high availability, horizontal scalability, millisecond‑level query latency, and improved overall system performance.

Database Scalability · HTAP · Performance Optimization
0 likes · 13 min read
DataFunTalk
Feb 26, 2020 · Databases

ByteGraph: ByteDance’s Distributed Graph Database and Graph Computing System – Architecture, Data Model, and Practices

This article presents an in‑depth technical overview of ByteGraph, ByteDance’s self‑built distributed graph database and its accompanying graph‑computing engine, covering graph data characteristics, the directed‑property graph model, API design, three‑tier system architecture, storage strategies using KV stores and B‑Trees, hotspot handling, indexing, and future research directions.

B-Tree · ByteGraph · Distributed Storage
0 likes · 33 min read