
Big Data Interview Preparation: Data Governance, Iceberg Metadata, Lakehouse Best Practices, and Xiaohongshu HR Updates

The article reports Xiaohongshu’s cancellation of the big‑small week schedule and non‑compete clause, then provides a collection of big‑data interview questions—including data governance, Iceberg metadata management, and lakehouse production best practices—along with concise answers and resources for candidates.

Big Data Technology & Architecture

On April 24, Xiaohongshu announced internally that it will cancel the "big‑small week" schedule and the non‑compete agreement, both effective from May 1.

The big‑small week schedule, which alternates six‑day and five‑day working weeks, has been common at internet companies in their early stages, including Dewu, Kuaishou, and ByteDance.

Some employees are reluctant to see the schedule go: the workload stays the same while overtime pay disappears, so overall earnings fall.

However, Xiaohongshu is one of the few internet companies that explicitly cancels the non‑compete clause, which is a significant benefit for workers.

The internal letter states:

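We hope to light a small lamp and send a different signal to the wider environment: we will no longer restrict individual mobility through non‑compete agreements.
We only ask colleagues to honor their confidentiality and non‑solicitation obligations, and we aim to build a longer‑term relationship with everyone.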

Below is the full internal letter (image).

Xiaohongshu’s unique business also creates strong demand for data development positions. The questions below were recorded by a student from our advanced big‑data class during an interview at Xiaohongshu; answers to a few of the core questions follow.

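First round

1. Dimension‑table modeling methodologies: which ones do you know? Talk through them.
3. Are there best practices for data governance?
4. Commonly used Flink windows
5. How does Flink guarantee data ordering?
6. How do you judge whether a table is well designed? What is your review process?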

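Second and third rounds

1. Spark job submission parameters
2. Which kinds of join does Spark implement internally?
3. Hive data skew optimization
4. How do you ensure data quality?
5. Have you used Iceberg? Metadata management and cleanup
6. How are Iceberg’s files organized?
7. What are the best practices for lake frameworks in production? Can you talk through them?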

We select a few questions to answer.

1. Data governance best practices?

Data governance covers three core aspects: cost, quality, and efficiency. In your answer, explain how you optimized cost, monitored quality, and improved efficiency. Give concrete examples, such as identifying the tasks that need governance, governing large tasks, and using platform features like shuffle optimization or small‑file reduction, and back each example with quantitative metrics.
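To make "task identification" and "quantitative metrics" concrete, here is a minimal Python sketch. It is only an illustration: the `TaskStats` fields, the 80% cost coverage, and the 32 MB small‑file threshold are all assumptions, and a real platform would pull these numbers from its scheduler and storage metadata.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class TaskStats:
    # Hypothetical per-task statistics; field names are assumptions.
    name: str
    cpu_core_hours: float  # daily compute cost proxy
    output_files: int      # files written per run
    output_bytes: int      # bytes written per run


def top_cost_tasks(tasks, coverage=0.8):
    """Return the most expensive tasks that together reach `coverage` of total cost."""
    total = sum(t.cpu_core_hours for t in tasks)
    picked, running = [], 0.0
    for t in sorted(tasks, key=lambda t: t.cpu_core_hours, reverse=True):
        if total == 0 or running / total >= coverage:
            break
        picked.append(t)
        running += t.cpu_core_hours
    return picked


def small_file_tasks(tasks, min_avg_bytes=32 * MB):
    """Return tasks whose average output file is smaller than the threshold."""
    return [
        t for t in tasks
        if t.output_files > 0 and t.output_bytes / t.output_files < min_avg_bytes
    ]


def saving_pct(before, after):
    """Express a governance result as a percentage reduction."""
    return round((before - after) / before * 100, 1)
```

In an interview answer, the output of functions like these becomes the story: which few tasks accounted for most of the cost, which ones produced small files, and by what percentage the cost dropped after governance.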

2. Have you used Iceberg? Metadata management and cleanup.

Iceberg’s metadata is layered: table metadata files, snapshots, manifest lists, manifest files, and data files. Each layer stores specific information, and every commit produces a new snapshot, so the table’s history is versioned. This enables time‑travel queries and keeps data consistent and recoverable. A catalog such as Hive Metastore tracks the pointer to the current table metadata file, while the metadata files themselves are stored in object storage (S3, GCS) or HDFS.
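The layering is easier to explain with a toy model. The Python sketch below is not Iceberg's API; the class and method names are made up for illustration, and the manifest list is simplified to a tuple of manifests held by each snapshot. It shows how a new snapshot reuses older manifests and how time travel picks a snapshot by timestamp.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DataFile:
    path: str
    record_count: int


@dataclass(frozen=True)
class Manifest:
    # A manifest file lists data files (real Iceberg also stores per-file stats).
    data_files: tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    # Stands in for the manifest list: the manifests that make up this snapshot.
    manifests: tuple


@dataclass
class TableMetadata:
    snapshots: List[Snapshot] = field(default_factory=list)
    current_snapshot_id: Optional[int] = None

    def snapshot(self, snapshot_id):
        return next((s for s in self.snapshots if s.snapshot_id == snapshot_id), None)

    def commit_append(self, files, timestamp_ms):
        """Write one new manifest; the new snapshot reuses the parent's manifests."""
        parent = self.snapshot(self.current_snapshot_id)
        inherited = parent.manifests if parent else ()
        snap = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            timestamp_ms=timestamp_ms,
            manifests=inherited + (Manifest(tuple(files)),),
        )
        self.snapshots.append(snap)
        self.current_snapshot_id = snap.snapshot_id
        return snap

    def snapshot_as_of(self, timestamp_ms):
        """Time travel: the latest snapshot committed at or before the timestamp."""
        eligible = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        return max(eligible, key=lambda s: s.timestamp_ms) if eligible else None

    def scan(self, snapshot=None):
        """Walk snapshot -> manifests -> data files and return the file paths."""
        snap = snapshot or self.snapshot(self.current_snapshot_id)
        if snap is None:
            return []
        return [f.path for m in snap.manifests for f in m.data_files]
```

Because old snapshots stay in the metadata until they are expired, a reader can scan the table as it was at any retained point in time, which is also why snapshot cleanup matters.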

Over time, snapshots accumulate; expire them on a schedule so that the files only they reference can be deleted. Small files also increase metadata overhead; periodically rewriting data files merges small files into larger ones. Iceberg’s rewrite operation supports configurable merge strategies and merges files automatically when the configured conditions are met.
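In Spark, Iceberg exposes these maintenance tasks as the `expire_snapshots` and `rewrite_data_files` procedures. The Python sketch below does not call them; it only illustrates the two decisions behind them, under assumed inputs: which snapshots a retention policy would expire, and how small files could be grouped toward a target size with a simple first‑fit bin‑packing. The function names and parameters here are hypothetical, and Iceberg's actual planner is more involved.

```python
def snapshots_to_expire(snapshots, now_ms, max_age_ms, retain_last=1):
    """Pick snapshot ids to expire.

    snapshots: list of (snapshot_id, timestamp_ms) tuples.
    Keeps the newest `retain_last` snapshots regardless of age,
    plus every snapshot no older than `max_age_ms`.
    """
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    keep = {sid for sid, _ in newest_first[:retain_last]}
    keep |= {sid for sid, ts in newest_first if now_ms - ts <= max_age_ms}
    oldest_first = sorted(snapshots, key=lambda s: s[1])
    return [sid for sid, _ in oldest_first if sid not in keep]


def plan_compaction(file_sizes, target_bytes, min_input_files=2):
    """Group files smaller than the target into bins of at most target_bytes.

    First-fit decreasing: place each file in the first group with room,
    opening a new group when none fits. Groups with fewer than
    `min_input_files` files are skipped, since rewriting them gains little.
    """
    small = sorted((s for s in file_sizes if s < target_bytes), reverse=True)
    groups = []
    for size in small:
        for group in groups:
            if sum(group) + size <= target_bytes:
                group.append(size)
                break
        else:
            groups.append([size])
    return [g for g in groups if len(g) >= min_input_files]
```

Keeping a minimum number of recent snapshots regardless of age protects rollback and time travel, while the minimum group size avoids rewriting files when the merge would not reduce the file count.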

3. What are the best practices for lake frameworks in production?

Best‑practice considerations include:

Data model: table type selection, partitioning, bucketing, etc.

Read/write optimization: read‑side and write‑side tuning.

Operations and monitoring: common issues, solutions, core metrics.

Benefits: cost reduction, development efficiency improvements.
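For the operations and monitoring point, it helps to name the core metrics you would watch. The Python sketch below is one possible shape for such a check; the `TableHealth` fields and both thresholds are assumptions chosen for illustration, not recommended values.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class TableHealth:
    # Hypothetical per-table metrics collected by a monitoring job.
    table: str
    snapshot_count: int
    data_file_count: int
    total_data_bytes: int

    @property
    def avg_file_bytes(self):
        if self.data_file_count == 0:
            return 0.0
        return self.total_data_bytes / self.data_file_count


def health_alerts(health, max_snapshots=100, min_avg_file_bytes=64 * MB):
    """Return the maintenance actions a table appears to need."""
    alerts = []
    if health.snapshot_count > max_snapshots:
        alerts.append("expire snapshots")
    if health.data_file_count and health.avg_file_bytes < min_avg_file_bytes:
        alerts.append("compact small files")
    return alerts
```

Tracking the same metrics before and after maintenance also gives you the numbers for the benefits point, such as how far file counts and storage cost fell.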

Alright, folks, you are ready to go for Xiaohongshu now!

Finally, you are welcome to join our knowledge‑sharing community:

"3 million words! The most complete big data learning and interview community on the web is waiting for you".

If this article helped you, don’t forget to "watch", "like", and "collect" – the three‑click combo!

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
