Comprehensive Big Data Interview Experience and Questions Overview
The article presents a detailed three‑month interview journey that led to a position at a top new‑energy automotive firm, outlining the questions and topics covered in five interview rounds—including Hive, Spark, Flink, Kafka, data modeling, and data governance—to help candidates prepare for big‑data roles.
This article shares a former classmate's interview experience after three months of interviewing, culminating in a successful offer from a leading new‑energy automotive company. It details the questions and topics addressed across five interview rounds, providing valuable insights for candidates targeting big‑data positions.
Round 1
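1. Introduce your project, including its key points and main difficulties
2. Hive optimization (several companies asked about this)
3. Hive SQL execution plans
4. Differences between Hive and MySQL
5. Difference between SORT BY and ORDER BY
6. Data skew: the scenarios where it occurs and how you resolved it
7. SQL question
Fields: order ID, time, user ID
Find the users who placed more than 100 consecutive orders within 10 minutes
Round 2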
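1. Introduce your project, including its key points and main difficulties
2. Data warehouse modeling theory
3. How to handle hot and cold data
4. Which aspects data governance should cover
5. How data quality is measured, what effect the data quality work had, how it was accepted, and the project workflow
6. Did you use a star schema or a snowflake schema, and what is the difference?
Round 3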
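1. Introduce your project, including its key points and main difficulties
2. Linux commands: finding files, the awk command
3. Kafka partitions and the ack mechanism
4. How Spark executes a job
5. Explain Spark's DAG
6. How MapReduce (MR) executes a job
7. Optimizing a join between a large table and a small table
8. The difference between the common Spark operators reduceByKey and groupByKey, and which one has the advantage
9. Spark execution modes: a job has been submitted and resources are sufficient, yet it still will not run. What could be the cause?
10. Differences between Spark and MR
Round 4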
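1. Introduce your project, including its key points and main difficulties
2. What problems you ran into in the project
3. Kafka data loss and how to solve it
4. Kafka's core components: topic, broker, partition, consumer, producer
5. ClickHouse table engines: which ones you used, how they work, and how your team applied them
6. The Flink checkpoint execution flow
7. Flink compared with Spark
Round 5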
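1. Introduce your project, including its key points and main difficulties
2. Data middle platform: OneID and OneService
3. Problems encountered, keeping the project on schedule, and resource coordination
4. Data security and permission management
5. Data warehouse refactoring and model construction: what problems came up, how long the cycle was, how the work was scheduled, and how efficient it was
Overall Impression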
The market has fewer positions than the previous year, yet the candidate received multiple interview invitations. Core topics included data warehouse modeling, real‑time and batch frameworks, project experience, data quality, and governance, with technical focus on Flink, Spark, Kafka, and Hive.
Soft skills such as risk control, resource coordination, and cross‑department communication were also emphasized.
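As an illustration of the Round 1 SQL question (fields: order ID, time, user ID), here is a minimal Python sketch of one possible reading: flag any user who has more than 100 orders inside some 10-minute window. The function name `users_with_order_bursts`, the tuple layout, and the inclusive window boundary are assumptions made for this example and are not part of the original interview; the interviewer may have intended a stricter meaning of "consecutive".

```python
from collections import defaultdict
from datetime import datetime, timedelta


def users_with_order_bursts(orders, window=timedelta(minutes=10), threshold=100):
    """Return the set of user IDs with more than `threshold` orders
    inside any time span of at most `window` (boundary inclusive).

    `orders` is an iterable of (order_id, order_time, user_id) tuples,
    where order_time is a datetime. This layout is hypothetical.
    """
    # Group order timestamps by user.
    times_by_user = defaultdict(list)
    for _order_id, order_time, user_id in orders:
        times_by_user[user_id].append(order_time)

    flagged = set()
    for user_id, times in times_by_user.items():
        times.sort()
        left = 0
        for right, current in enumerate(times):
            # Advance the left edge until the span fits within the window.
            while current - times[left] > window:
                left += 1
            # Count of orders currently inside the window.
            if right - left + 1 > threshold:
                flagged.add(user_id)
                break
    return flagged


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, 0)
    # Hypothetical sample: user "a" places 101 orders one second apart.
    sample = [(i, start + timedelta(seconds=i), "a") for i in range(101)]
    print(users_with_order_bursts(sample))
```

In Hive SQL the same idea is commonly expressed with a window function such as `LAG(time, 100) OVER (PARTITION BY user_id ORDER BY time)`, comparing each order's time with the one 100 rows earlier. This is one approach under the stated reading, not necessarily the answer the interviewer expected.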
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
