Tag

data management


Baidu Tech Salon
Jun 17, 2025 · Operations

How Baidu Scaled Its Vertical Search: Elastic Scheduling and Data Management Secrets

This article explains how Baidu's vertical search platform tackled massive data growth and scaling challenges by redesigning its data management system, introducing elastic scheduling, decoupling ETCD access, implementing auto‑scaling, and advancing shard expansion to improve performance, stability, and cost efficiency.

ETCD · Sharding · auto scaling
18 min read
Alibaba Cloud Infrastructure
Jun 11, 2025 · Cloud Computing

How Alibaba’s Qi Tian Platform Secures Large-Scale Cloud Networks

This article examines Alibaba Cloud’s Qi Tian integrated operation‑management platform, detailing the challenges of massive cloud network management and the innovative data‑fusion, automated change, intent‑aware monitoring, and multi‑plane self‑healing technologies that enable secure, high‑performance operation at million‑device scale.

AI · Cloud Computing · data management
11 min read
DaTaobao Tech
Apr 28, 2025 · Frontend Development

Front‑End Architecture and Performance Optimization for a Large‑Scale Chinese New Year Interactive Activity

The article details a large‑scale Chinese New Year interactive activity’s front‑end architecture, describing a layered system for business logic, data abstraction, and animation engines, unified data handling, dynamic animation rendering with downgrade paths, high‑concurrency QPS reduction, resilience measures, and extensive performance and workflow optimizations.

animation · architecture · data management
15 min read
Sohu Tech Products
Mar 19, 2025 · Artificial Intelligence

Easy DataSet: An Open‑Source Tool for Building Domain‑Specific Datasets and Fine‑Tuning Large Language Models

The article introduces Easy DataSet, an open‑source tool that streamlines the creation of domain‑specific datasets by aggregating public data sources, chunking Markdown documents, generating and managing QA pairs with configurable LLM endpoints, and exporting them in common formats, while outlining its architecture and future roadmap.

AI · LLM fine-tuning · data management
30 min read
IT Architects Alliance
Dec 29, 2024 · Fundamentals

Five Common Mistakes in IT Architecture Design and How to Avoid Them

This article outlines five common IT architecture design errors—neglecting connectivity, postponing security, poor compatibility, uncontrolled data duplication, and unsynchronized environments—illustrated with real cases and provides practical strategies to prevent each pitfall and build resilient, efficient systems.

Compatibility · Environment Sync · IT architecture
11 min read
DataFunSummit
Dec 6, 2024 · Artificial Intelligence

Xiaomi AI Data Management Platform: Design, Implementation, and Practice

This article presents the background, design principles, architecture, and practical deployment of Xiaomi's AI Data Management Platform, highlighting how unified cataloging, Fileset integration, and notebook‑based development address AI data governance, cost reduction, and workflow efficiency for both structured and unstructured data.

AI data · Fileset · big data
15 min read
JD Retail Technology
Oct 29, 2024 · Big Data

JD Unified Storage Practice: Cross‑Region and Tiered Storage on HDFS

This article details JD's large‑scale HDFS unified storage implementation, covering cross‑region storage challenges, topology design, asynchronous block replication, flow‑control mechanisms, tiered storage strategies, automatic hot‑cold data migration, and the resulting performance and cost improvements for big‑data workloads.

Cross-Region Storage · HDFS · Tiered Storage
20 min read
DataFunSummit
Oct 4, 2024 · Big Data

JD Retail HDFS Unified Storage: Cross‑Region and Tiered Storage Practices

This article presents JD Retail's large‑scale HDFS deployment, detailing its unified storage architecture, cross‑region data replication challenges and solutions, tiered storage strategies for hot, warm and cold data, and the operational modules that together improve performance, reliability and cost efficiency in a big‑data environment.

Cross-Region Storage · HDFS · Tiered Storage
21 min read
DataFunSummit
Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Artificial Intelligence · Data quality · Instruction Fine-Tuning
18 min read
Continuous Delivery 2.0
Aug 1, 2024 · Fundamentals

The Essence of Data Governance: Managing Data and People

This article reflects on the challenges of data governance, emphasizing that effective governance involves not only technical data handling but also managing people, aligning responsibilities, fostering cooperation between leadership and business units, and establishing clear ownership and incentive mechanisms.

best practices · data governance · data management
4 min read
360 Smart Cloud
Jul 26, 2024 · Cloud Computing

Object Storage: Origins, Evolution, Core Operations, Advantages, Use Cases, and Future Trends

This article provides a comprehensive overview of object storage, covering its origins, historical development, fundamental operations, key advantages over block and file storage, major application scenarios, and predicted future trends in cloud and big‑data environments.

big data · cloud storage · data management
15 min read
DataFunTalk
May 27, 2024 · Big Data

JD Retail’s Unified HDFS Storage: Cross‑Region and Hierarchical Storage Practices

This article details JD Retail’s large‑scale HDFS deployment, describing how cross‑region storage challenges were solved with a full‑copy topology, asynchronous block replication, flow‑control mechanisms, and a tiered storage strategy that automatically moves hot, warm, and cold data among SSD, HDD, and high‑density HDD nodes to improve performance and cut costs.

Cross-Region · Distributed Storage · HDFS
20 min read
DataFunTalk
Apr 28, 2024 · Big Data

Ant Group’s Data Governance Practices: Overview, Data Quality, and Data Storage Governance

This article shares Ant Group's extensive experience in big data governance, detailing the overall data governance framework, data quality management, data storage governance, and future considerations, illustrated with practical cases and strategies for ensuring compliance, reliability, and cost efficiency.

Ant Group · Data Architecture · Data quality
17 min read
DataFunSummit
Apr 25, 2024 · Big Data

Paimon Project Overview: Recent Developments, Core Capabilities, and Future Roadmap

This article presents a comprehensive overview of the Apache‑incubated Paimon project, covering its evolution from Flink Table Store, the current features of primary‑key and log tables, management tools such as snapshots, tags and branches, performance optimizations for Flink and Spark, and a detailed roadmap of upcoming functionalities.

Lakehouse · Paimon · Real-time OLAP
23 min read
DataFunSummit
Apr 12, 2024 · Artificial Intelligence

Exploring the Application of AI Large Models in the Automotive Industry

This article provides a comprehensive overview of AI large‑model development, defines what constitutes a large model, discusses current challenges such as cost, privacy and safety, and examines how these models can improve efficiency across automotive marketing, sales, service, data management, infrastructure building, and future automation stages.

AI · Automotive · Efficiency
13 min read
DevOps
Jan 17, 2024 · Operations

Agile Data Management: Principles, Practices, and Implementation Guide

This article explains how agile methodologies can be applied to data management, covering the need for agile data practices, core principles, iterative modeling, governance, CI/CD pipelines, tooling, metrics, security, case studies, challenges, and future outlooks in a comprehensive, step‑by‑step guide.

Agile · DataOps · data governance
13 min read
DataFunSummit
Dec 20, 2023 · Cloud Native

Building a Cloud‑Native Lakehouse with Apache Iceberg and Amoro

This article introduces the background, challenges, and cloud‑native solutions of lakehouse architecture, explains Apache Iceberg’s open table format and its cloud‑native features, details Amoro’s management and self‑optimizing capabilities, showcases three real‑world cloud migration cases, and outlines future development plans.

Amoro · Apache Iceberg · Lakehouse
12 min read
Architects Research Society
Nov 10, 2023 · Big Data

Enterprise Data Strategy Driven by Business Outcomes in the Zettabyte Era

The article explains how the explosive growth of data to zettabyte scale reshapes enterprise data strategy, emphasizing business‑driven value, big‑data management practices, and integrated processes that turn massive data into actionable insights for competitive advantage.

Digital Transformation · big data · business outcomes
9 min read