Tagged articles
18 articles
Kuaishou Tech
Apr 2, 2025 · Big Data

Apache Hudi Asia Summit Successfully Held

The first Apache Hudi Asia Summit in Beijing attracted over 230 attendees, featuring technical discussions on data lake optimization and case studies from companies like Fastly and Meituan.

Apache Hudi · Big Data · Data Lake
12 min read
DaTaobao Tech
Mar 7, 2025 · Artificial Intelligence

Taobao Content AI: Summary of AIGC Content Generation and Multimodal Model Techniques

Taobao’s AIGC pipeline combines a human‑feedback multimodal reward model, audio‑visual joint pre‑training, and Mixture‑of‑Experts distillation to clean data, align outputs with user preferences, and achieve state‑of‑the‑art multimodal LLM performance that drives content cold‑start and conversion gains in e‑commerce.

AIGC · Content Generation · Reward model
10 min read
DataFunSummit
Jan 14, 2025 · Big Data

Tencent Real-Time Lakehouse Intelligent Optimization Practice

This presentation covers Tencent's real‑time lakehouse in four parts (lakehouse design, intelligent optimization services, scenario‑driven capabilities, and future outlook), touching on components such as Spark, Flink, Iceberg, the Auto‑Optimize Service, indexing, clustering, AutoEngine, and PyIceberg.

Auto Optimize · Big Data · Flink
12 min read
DataFunSummit
Jan 3, 2025 · Big Data

Tencent Real‑Time Lakehouse Intelligent Optimization Practices

This article presents Tencent's end‑to‑end real‑time lakehouse architecture, detailing its three‑layer design, the Auto Optimize Service modules such as compaction, indexing, clustering and engine acceleration, as well as scenario‑driven capabilities like multi‑stream joins, primary‑key tables, in‑place migration and PyIceberg support, and concludes with future optimization directions.

Big Data · Flink · Iceberg
11 min read
DataFunSummit
Dec 27, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

This presentation describes Tencent's real-time lakehouse architecture, including data lake compute, management, and storage layers, and details the intelligent optimization services—such as compaction, indexing, clustering, and auto-engine—designed to improve query performance, storage cost, and operational efficiency for large-scale data processing.

AutoEngine · Flink · Iceberg
11 min read
Big Data Technology & Architecture
Nov 25, 2024 · Big Data

Tencent Real-Time Lakehouse Architecture and Intelligent Optimization Practices

This article presents Tencent's real‑time lakehouse architecture, detailing its three‑layer design of compute, management and storage, and explains the six components of the Intelligent Optimization Service—including Compaction, Index, Clustering, and AutoEngine—along with scenario‑based capabilities, migration strategies, and future optimization directions.

Big Data · Real-time analytics · Tencent
11 min read
Big Data Technology & Architecture
May 5, 2023 · Big Data

Strategies for Handling Small Files in Hive and Spark

This article examines the causes and impacts of small file proliferation in Hive and Spark environments, and presents multiple mitigation techniques—including Spark 3 adaptive query execution, reducing reduce tasks, using DISTRIBUTE BY RAND(), post‑processing clean‑up, Hive and Spark configuration tweaks, and automated tooling—to improve performance and storage efficiency.

Big Data · Hive · Small Files
9 min read

Data Task Optimization Techniques and Practices

The article surveys unconventional offline data‑task optimizations—such as distribution‑by, seeded random shuffling, explode‑based skew mitigation, hash bucketing, task‑parallelism tuning, and multi‑insert materialization—organized by point, line, and surface perspectives, and stresses that effective performance gains require both technical tricks and business‑driven pipeline adjustments.

Hive · SQL Tuning · data optimization
16 min read
dbaplus Community
Oct 2, 2022 · Backend Development

Cutting Invalid Data: How Zhuanzhuan Optimized Its Product Service for 3× Faster Performance

This article examines how Zhuanzhuan's product service, a core component of its e‑commerce platform, was optimized by reducing unnecessary data transmission, applying cache‑aside patterns, redesigning Redis storage, and introducing a field‑marking approach, resulting in dramatically lower GC overhead, network traffic, and response times.

GraphQL · bitmask · data optimization
14 min read
DataFunTalk
Aug 1, 2022 · Big Data

Bilibili Lakehouse Integration: Iceberg and Alluxio Optimization Practices

This article details Bilibili's lakehouse implementation using Apache Iceberg and Alluxio, covering background challenges, architectural components, data organization techniques like Z‑order and bitmap indexes, performance benchmarks, and future optimization plans for large‑scale analytics.

Alluxio · Bitmap Index · Iceberg
21 min read
Zhuanzhuan Tech
Apr 27, 2022 · Backend Development

Optimizing Product Service Performance through Data Reduction and Field Selection

This article examines performance bottlenecks in a high‑traffic e‑commerce product service and proposes data‑centric optimizations—including read‑only focus, field‑level selection via bit‑masking, and Redis hash storage—to reduce payload size, lower GC pressure, and improve latency while maintaining scalability.

Backend · data optimization · field selection
14 min read
JD.com Experience Design Center
Sep 17, 2020 · Product Management

How JD’s Ranking Page Supercharged 618 Sales: Design, Data, and Efficiency Hacks

This case study details how JD.com’s ranking page for the 618 promotion leveraged data‑driven navigation, visual redesign, pixel‑perfect front‑end techniques, and component‑based development to boost click‑through rates, conversion, and overall GMV while outlining future optimization directions.

UI design · data optimization · e‑commerce
12 min read
dbaplus Community
Sep 15, 2020 · Big Data

How Didi Doubled Elasticsearch Write Throughput and Cut Server Costs

Didi’s engineering team analyzed a severe write bottleneck in their 3000‑node Elasticsearch cluster, identified long‑tail latency caused by refresh, translog locks, write queues and GC, and applied routing‑aware bulk writes, JVM and Lucene tweaks, and data cleaning to more than double write throughput while slashing server costs.

Didi · Elasticsearch · Long Tail
17 min read
Zhengtong Technical Team
Dec 20, 2019 · Big Data

Optimizing Trajectory Visualization: From Data Collection to Rendering

This article examines the challenges of mobile‑based trajectory tracking in city management and presents a comprehensive set of optimizations—including adaptive GPS sampling, keep‑alive strategies, accuracy enhancements, algorithmic fitting, and cinematic animation effects—to produce smooth, accurate, and visually appealing trajectory displays at scale.

GPS · Kalman Filter · Trajectory
11 min read
Baobao Algorithm Notes
May 8, 2018 · Industry Insights

Cracking the TalkingData Ad Fraud Kaggle Challenge: Tips, Pitfalls & CV Strategies

This article details a data‑science team’s end‑to‑end approach to the TalkingData ad‑fraud Kaggle competition, covering dataset quirks, performance‑critical optimizations, a multi‑stage cross‑validation workflow, feature‑engineering tactics, model experiments with LightGBM and neural nets, and key lessons learned.

Kaggle · LightGBM · ad fraud detection
11 min read
21CTO
Jan 12, 2016 · Cloud Computing

How Meizu Scales Cloud Sync: Protocols, Architecture & Data Handling

This article explains Meizu's cloud synchronization system, detailing its custom MZ‑SyncML, Semi‑Sync, File‑Sync and One‑Sync protocols, the multi‑IDC deployment, routing components, data format optimizations, and modular backend architecture that together support millions of users with high availability and efficient data transfer.

backend services · cloud sync · data optimization
14 min read