Tag: Data Engineering

DataFunSummit
May 27, 2025 · Artificial Intelligence

Integrating Data and AI for Platform Engineering: IDP Practices, Model Fine‑Tuning, and R&D Efficiency at Qunhe Technology

The article details how Qunhe Technology combines big data and AI within an Internal Developer Platform (IDP) framework to boost software development efficiency. It outlines the architectural decisions, presents fine‑tuning pipelines for code‑review models, and shares interview insights from senior technical director Dr. Hu Guanghuan on practical implementations and ROI.

AI · Data Engineering · R&D efficiency
22 min read
Alibaba Cloud Infrastructure
May 26, 2025 · Big Data

Comparative Guide to Apache Airflow and Argo Workflows for Distributed Task Scheduling

This article provides a comprehensive comparison of Apache Airflow and Argo Workflows, covering their core features, architectures, use cases, code examples, and recommendations for selecting the appropriate distributed workflow engine in data engineering, big‑data, and AI pipelines.

Apache Airflow · Argo Workflows · Big Data
23 min read
Full-Stack Internet Architecture
May 20, 2025 · Big Data

Why Learn Kafka? Core Benefits, Use Cases, and a Summary

This article explains why Kafka is widely adopted by top companies, outlines its high throughput, scalability, and durability, and describes key scenarios such as real‑time data pipelines, stream processing, and big‑data integration. It concludes that mastering Kafka is essential for modern backend and data engineering roles.

Big Data · Data Engineering · Kafka
4 min read
DevOps Engineer
Apr 25, 2025 · Big Data

Reflections on PyCon LT 2025 Data Day: Sessions on Static Code Analysis, Data Warehouses, Pipelines, and Data Science Tools

The author recounts attending PyCon LT 2025 Data Day, summarizing talks on building a simple static code analyzer using abstract syntax trees (ASTs), the challenges of data warehouses versus data lakes, cloud cost‑scraping pipelines, A/B testing libraries, privacy‑enhancing data processing, and tools such as Panel and Dagster, while noting the inspiring presence of female speakers.

Dagster · Data Engineering · Panel
7 min read
Kuaishou Tech
Apr 2, 2025 · Big Data

Apache Hudi Asia Summit Successfully Held

The first Apache Hudi Asia Summit in Beijing attracted over 230 attendees, featuring technical discussions on data lake optimization and case studies from companies like Fastly and Meituan.

Apache Hudi · Big Data · Data Engineering
12 min read
Big Data Technology Architecture
Feb 8, 2025 · Big Data

How AI Can Accelerate Data Engineering: Practical DeepSeek Use Cases and Tips

This article shows how AI tools like DeepSeek can dramatically speed up data‑engineering tasks—such as fixing long‑running SQL queries, building real‑time data pipelines with Flink, and deciphering legacy stored procedures—while offering concrete prompts, real‑world case studies, and five time‑saving techniques.

AI · Automation · Big Data
6 min read
DataFunSummit
Feb 5, 2025 · Artificial Intelligence

Exploration and Practice of Large‑Model Data Construction

This presentation details engineering‑focused approaches to building, mixing, and filtering data for large language models. It covers data preparation; pre‑training data‑mixing strategies such as DoReMi, DoGE, and online sampling; post‑training methods for selecting data by quality; and a practical Q&A on scaling laws and PDF processing.

AI · Data Engineering · Large Language Models
15 min read
JD Tech
Dec 30, 2024 · Big Data

Techniques for Writing Elegant and Efficient SQL in Big Data Environments

The article shares practical methods and code examples for making SQL both readable and high‑performing in large‑scale data platforms, covering predicate push‑down with subqueries, deduplication strategies, bucket utilization, and Python‑driven job parameter handling.

Big Data · Data Engineering · Hive
14 min read
Python Programming Learning Circle
Dec 6, 2024 · Artificial Intelligence

24 Essential Python Libraries for an End‑to‑End Data Science Workflow

This article introduces 24 highly useful Python libraries that cover the entire data‑science lifecycle—from data collection and cleaning to visualization, modeling, interpretation, and deployment—helping readers build a comprehensive and visually appealing data‑analysis pipeline.

Data Engineering · Libraries · Python
3 min read
Xiaohongshu Tech REDtech
Dec 5, 2024 · Big Data

Interview with Jianchen: Journey from Open Source Contributor to Data Engineer at Xiaohongshu

In this interview, Xiaohongshu data engineer Jianchen recounts his evolution from a computer‑science student who discovered open source through MIT 6.824 to a contributor to SOFAJRaft and Apache RocketMQ. He details his OSPP projects, his decision to join Xiaohongshu, and his work on a cloud‑native Kafka engine that cut storage and compute usage by half.

Apache RocketMQ · Big Data · Data Engineering
11 min read
DataFunSummit
Dec 5, 2024 · Big Data

Ping An Financial Services' Big Data Platform Construction and Data Governance Practices

This article details Ping An Financial Services' journey in building a comprehensive big‑data platform, addressing fragmentation, low data timeliness, processing limits, and governance challenges through a four‑stage technical evolution, modular tool development, and a systematic data‑governance framework to support its digital transformation.

Big Data · Data Engineering · Data Governance
16 min read
DataFunSummit
Dec 1, 2024 · Big Data

Data Weaving for AB Experiment Automation: Architecture, Challenges, and Solutions

This article presents a comprehensive overview of JD Retail's data‑weaving approach to AB experiment automation, detailing the challenges of consistency, scientific rigor, and timeliness, the logical data platform architecture, key technologies, metric modeling, automated DAG orchestration, current progress, and future directions.

AB testing · Automation · Big Data
21 min read
ByteDance Data Platform
Nov 6, 2024 · Big Data

How Douyin’s Data Platform Overcomes EB‑Scale Metric Challenges

This article explains how Douyin Group tackles massive data volume, quality, and efficiency issues by building a four‑layer intelligent platform, standardizing metric management, automating metric decomposition, and creating reusable metric services that boost agility, stability, and cross‑team collaboration.

Big Data · Data Engineering · Data Platform
20 min read
DataFunSummit
Oct 29, 2024 · Artificial Intelligence

Technical Maturity Curve of User Profiling and Tag Systems in the Large‑Model Era

This article explains the concept of a technology maturity curve, why a technology's maturity should be assessed, and how user profiling and tag systems evolve under the influence of large‑model AI. It details seven key assessment dimensions and a comprehensive architecture that guides enterprises in strategic decision‑making.

AI · Data Engineering · Large Models
21 min read
Bilibili Tech
Oct 25, 2024 · Big Data

DataFunSummit2024: Next-Generation Data Architecture Technology Summit

DataFunSummit2024, co-hosted by Bilibili, convenes industry experts, scholars, and enterprise leaders across six forums to discuss next‑generation data architecture, showcasing Bilibili’s Iceberg‑based stream‑batch innovations, AI‑BI analytics, NoETL practices, and emerging alternatives to Lambda architecture.

AI+BI · Big Data · Data Architecture
3 min read
DataFunSummit
Oct 11, 2024 · Big Data

Kuaishou’s Data Lake Technical Maturity Curve: Challenges and Solutions with Apache Hudi

Kuaishou’s data‑lake initiative tackled exploding offline warehouse costs, redundant model proliferation, and data‑consistency complexities by adopting Apache Hudi’s schema‑evolution capabilities and real‑time lake ingestion, improving cross‑team collaboration and narrowing the real‑time‑offline data gap.

Apache Hudi · Big Data · Data Engineering
6 min read
AntData
Sep 26, 2024 · Artificial Intelligence

DB-GPT: Open-Source AI-Native Data Application Development Framework

DB‑GPT is an open‑source AI‑native data‑application framework that provides multi‑model management, Text‑to‑SQL optimization, RAG, multi‑agent collaboration, and intelligent workflow orchestration, enabling developers to build scalable large‑model database applications, with proven enterprise adoption, community growth, and academic publications.

AI · Data Engineering · Large Language Models
6 min read
JD Retail Technology
Sep 25, 2024 · Big Data

From a Personal Journey to Data Platform Architecture: Insights on Big Data, Cloud Computing, and System Design

The article narrates the author’s 30‑year programming career and shares technical reflections on building business‑agnostic, configurable data platforms, covering batch, streaming, interactive computing, big‑data sharding, Spark, Flink, cloud migration, and the philosophy of software architecture.

Big Data · Data Engineering · Batch Processing
23 min read
DataFunTalk
Sep 19, 2024 · Databases

Technical Topics Overview from DataFun Summit: Graph Database, Vector Database, Real-time Data Warehouse, and Cloud‑Native Solutions

The article presents a collection of technical overviews—including a graph database for distributed queries, a next‑generation vector database, real‑time data warehouse architectures at Douyin and Ant Group, a cloud‑native ClickHouse service, and best practices for financial data warehousing—while also explaining how to obtain the related e‑book.

Big Data · Cloud Native · Data Engineering
4 min read
AntTech
Sep 10, 2024 · Big Data

From DATA for AI to AI for DATA: Evolution of Ant Group’s Intelligent Data System

The talk reviews the rapid evolution of data technologies—from early database foundations and big‑data breakthroughs to the rise of generative AI—highlighting how Ant Group’s data platform is shifting from a cost‑efficiency focus to a value‑centric, multimodal, AI‑driven ecosystem.

Artificial Intelligence · Big Data · Data Engineering
17 min read