Comprehensive Interview Question Cheat Sheet for Top Tech Companies
This article compiles interview question topics from leading tech companies across search, algorithm engineering, NLP, multimodal LLMs, advertising, recommendation, risk control, and big-data roles. The questions cover algorithms, system design, machine-learning concepts, and practical coding challenges.
Feizhu – Search Algorithms
Hash addressing algorithm
Overview of shortest‑path algorithms
Detecting cycles in graphs
Probability problem: a predictor is 99% accurate and the base rate of actual positives is 0.3%; compute P(actually positive | predicted positive)
Scenario 1: Identifying the entity of the current query using the current query plus historical queries and their entities
Scenario 2: Language identification between similar-looking languages (e.g., Malay vs. English, both written in Latin script)
Scenario 3: Evaluating a query-rewriting baseline and its impact (e.g., matching both “Beida” and “Beijing University” to the right hotels)
Scenario 4: Error correction and similar‑word modeling
Scenario 5: Details unclear, but the discussion was pleasant
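For the cycle-detection question, one common approach for a directed graph is a depth-first search with three colors: reaching a node again while it is still on the current DFS path means a back edge, which closes a cycle. The sketch below assumes the graph is an adjacency-list dict; the function name has_cycle is illustrative, and for undirected graphs union-find or tracking the DFS parent is the usual alternative.

```python
from typing import Dict, List

# Sketch: has_cycle is an illustrative (hypothetical) name; assumes a directed graph
# given as adjacency lists.
def has_cycle(graph: Dict[int, List[int]]) -> bool:
    """Return True if the directed graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color: Dict[int, int] = {}
    for u, nbrs in graph.items():
        color.setdefault(u, WHITE)
        for v in nbrs:
            color.setdefault(v, WHITE)

    def dfs(u: int) -> bool:
        color[u] = GRAY
        for v in graph.get(u, []):
            if color[v] == GRAY:  # back edge to a node on the current path
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in list(color))


print(has_cycle({0: [1], 1: [2], 2: [0]}))     # True
print(has_cycle({0: [1, 2], 1: [2], 2: []}))   # False
```

The recursive DFS is kept for readability; very deep graphs would need an explicit stack to avoid Python's recursion limit.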
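The probability question is a Bayes' rule exercise. Its wording is ambiguous, so the sketch below assumes that "99% accuracy" means the predictor flags 99% of real positives and clears 99% of real negatives, and that 0.3% is the base rate of real positives. Under those assumptions only about 23% of flagged cases are real positives, because false alarms from the large negative pool outnumber the true hits. The function name posterior_positive is illustrative.

```python
# Sketch: posterior_positive is an illustrative (hypothetical) name.
def posterior_positive(prior: float, sensitivity: float, specificity: float) -> float:
    """P(actually positive | predicted positive) via Bayes' rule."""
    true_pos = sensitivity * prior               # P(predicted +, actually +)
    false_pos = (1 - specificity) * (1 - prior)  # P(predicted +, actually -)
    return true_pos / (true_pos + false_pos)


# Assumption: "99% accuracy" = 99% sensitivity and 99% specificity; 0.3% = base rate.
p = posterior_positive(prior=0.003, sensitivity=0.99, specificity=0.99)
print(f"{p:.4f}")  # 0.2295
```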
Baidu – Algorithm Engineering
C++ smart pointers
Python multiprocessing and multithreading
Garbage‑collection mechanisms
SQL transactions
Principles of LoRA
Explanation of Gradient‑Boosted Decision Trees (GBDT)
Typical architecture for translation tasks
Differences among encoder‑only, decoder‑only, and encoder‑decoder models
Transformer architecture overview
FlashAttention explanation
Differences between FP32 and FP16; fundamentals of mixed‑precision training
Beam search principle vs. direct sampling
Improvements for large models
Common frameworks and hardware used in practice
Python coroutines
Resource sharing between processes and threads
Program memory space and stack
Why Docker is useful and how to create containers
Linux process monitoring, termination, and real‑time file viewing
C++ virtual functions
Python Flask basics
Python Global Interpreter Lock (GIL)
Further details on FlashAttention
When large models require pre‑training
Differences among mainstream large models
Probability problem: two shooters each hit with probability 0.5, and shooter B gets one more shot than A; compute the probability that B makes more hits than A
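For the FlashAttention questions, the key numerical idea is the online (streaming) softmax: keys and values are visited block by block while a running maximum, a running denominator, and a running weighted sum are rescaled whenever the maximum grows, so the full row of attention scores never has to be materialized. The pure-Python sketch below shows only that idea for a single query vector; it ignores the GPU tiling, on-chip memory management, and backward pass that the real kernel is built around, and all function names are illustrative.

```python
import math
from typing import List

Vector = List[float]

# Sketch: illustrative (hypothetical) helpers, one query vector, no batching or heads.
def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: softmax(q . k / sqrt(d)) over all keys, then a weighted sum of values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def online_softmax_attention(q: Vector, keys: List[Vector], values: List[Vector],
                             block: int = 2) -> Vector:
    """Same result, but keys and values are processed one block at a time."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0                   # sum of exp(score - running_max) so far
    acc = [0.0] * len(values[0])  # sum of exp(score - running_max) * value so far
    for start in range(0, len(keys), block):
        scores = [dot(q, k) * scale for k in keys[start:start + block]]
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # rescale old statistics to new max
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values[start:start + block]):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention(q, keys, values))
print(online_softmax_attention(q, keys, values, block=2))  # matches up to rounding
```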
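For the FP32 versus FP16 question: FP16 has 1 sign bit, 5 exponent bits, and 10 stored mantissa bits, versus 8 exponent bits and 23 mantissa bits in FP32, so it has much less precision and range. The sketch below uses the half-precision format code "e" of Python's struct module to round values to FP16 and shows two failure modes that mixed-precision training works around: rounding error on larger values (one reason master weights are usually kept in FP32) and underflow of small gradients, which loss scaling (multiplying the loss, and therefore the gradients, by a constant before the backward pass) is meant to prevent. The scale factor 1024 is an arbitrary example.

```python
import struct


# Sketch: to_fp16 is an illustrative (hypothetical) helper name.
def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision (FP16) and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_fp16(2049.0))       # 2048.0: above 2048, not every integer is representable
print(to_fp16(1e-8))         # 0.0: a tiny gradient underflows
print(to_fp16(1e-8 * 1024))  # nonzero: the same gradient survives after loss scaling
```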
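For the two-shooter question, the number of shots is not given, so the sketch assumes A shoots n times and B shoots n + 1 times, with every shot hitting independently with probability 0.5. A symmetry argument gives 1/2 for every n: after B's first n shots, B is ahead or behind with the same probability x and tied with probability 1 - 2x; the extra shot breaks a tie in B's favor half the time, so P(B has more hits) = x + (1 - 2x)/2 = 1/2. The exact enumeration below checks this with fractions; prob_b_more_hits is an illustrative name.

```python
from fractions import Fraction
from math import comb


# Sketch: assumes A takes n shots and B takes n + 1 shots (not stated in the source).
def prob_b_more_hits(n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Exact P(hits_B > hits_A) when A shoots n times and B shoots n + 1 times."""
    def pmf(trials: int, k: int) -> Fraction:
        return comb(trials, k) * p**k * (1 - p) ** (trials - k)  # binomial probability

    return sum(pmf(n, a) * pmf(n + 1, b)
               for a in range(n + 1)
               for b in range(a + 1, n + 2))


for n in range(5):
    print(f"{n}: {prob_b_more_hits(n)}")  # 1/2 for every n
```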
TAL Education – NLP
How to initialize LoRA matrices and why zero‑initialization is used
Purpose of the past_key_value cache in GPT source code
Per-layer input and output flow when a GPT implementation generates tokens one at a time
Handling sparse output distributions with spikes
Decision‑tree fundamentals and how to perform regression with trees
Meaning of top‑p (nucleus) sampling in GPT
KL‑divergence formula and its difference from cross‑entropy
Typical inputs for reinforcement learning
ChatGPT's three-stage training pipeline and how its reward model is built
CART tree splitting criteria
Problem: Find a duplicate number in an array
Similarity measures beyond cosine similarity
Text embedding techniques
TF‑IDF formula
Scenario 1: Multi‑turn teacher‑student dialogue (audio transcription) – removing irrelevant utterances such as greetings
Scenario 2: Recommending practice questions to students while avoiding previously solved similar items
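For the top-p (nucleus) sampling question: sort tokens by probability, keep the smallest prefix whose cumulative probability reaches p, renormalize, and sample only from that set. The sketch below uses a toy distribution over four hypothetical tokens and illustrative function names; a real implementation would work on logit tensors, usually after temperature scaling.

```python
import random
from typing import Dict


# Sketch: illustrative (hypothetical) helpers operating on a plain dict of probabilities.
def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest set of most-likely tokens whose total probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}  # renormalize the nucleus


def sample_top_p(probs: Dict[str, float], p: float, rng: random.Random) -> str:
    filtered = top_p_filter(probs, p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]


toy = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
print(sorted(top_p_filter(toy, 0.75)))  # ['a', 'the']
print(sample_top_p(toy, 0.75, random.Random(0)))
```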
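For the KL-divergence question: KL(P||Q) = sum over x of P(x) log(P(x) / Q(x)), while cross-entropy is H(P, Q) = -sum over x of P(x) log Q(x). They differ by the entropy of P, H(P, Q) = H(P) + KL(P||Q), so when P is a fixed label distribution, minimizing either one with respect to Q gives the same gradients. KL is also asymmetric. The sketch below checks both facts numerically on two small example distributions, assuming natural logs and Q(x) > 0 wherever P(x) > 0.

```python
import math
from typing import Sequence


# Sketch: illustrative helpers; terms with P(x) = 0 contribute 0 by convention.
def entropy(p: Sequence[float]) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(cross_entropy(p, q) - entropy(p), kl_divergence(p, q))  # equal up to rounding
print(kl_divergence(p, q), kl_divergence(q, p))               # not symmetric
```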
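For "find a duplicate number in an array", the constraints are not given. A classic version, assumed here, has n + 1 integers in the range 1..n with exactly one repeated value. Treating i -> nums[i] as a linked list turns the duplicate into the entry point of a cycle, which Floyd's tortoise-and-hare algorithm finds in O(n) time and O(1) extra space without modifying the array. Without those constraints, a hash set (O(n) space) or sorting (O(n log n)) is the simpler answer.

```python
from typing import List


# Sketch: assumes len(nums) == n + 1, values in 1..n, exactly one repeated value.
def find_duplicate(nums: List[int]) -> int:
    # Phase 1: tortoise moves 1 step, hare 2 steps, until they meet inside the cycle.
    tortoise = hare = nums[0]
    while True:
        tortoise = nums[tortoise]
        hare = nums[nums[hare]]
        if tortoise == hare:
            break
    # Phase 2: restart one pointer; moving both 1 step meets at the cycle entry.
    tortoise = nums[0]
    while tortoise != hare:
        tortoise = nums[tortoise]
        hare = nums[hare]
    return hare


print(find_duplicate([1, 3, 4, 2, 2]))  # 2
print(find_duplicate([3, 1, 3, 4, 2]))  # 3
```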
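For the TF-IDF formula: one common form is tf(t, d) = count(t, d) / |d| and idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t; the score is tf times idf. Libraries often use smoothed variants (for example adding 1 inside or outside the log), so exact numbers differ between implementations. The sketch below implements the plain form on a tiny hypothetical corpus; note that a term present in every document gets weight 0.

```python
import math
from collections import Counter
from typing import Dict, List


# Sketch: plain (unsmoothed) TF-IDF; tf_idf and the toy corpus are illustrative.
def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [["hotel", "near", "beijing", "hotel"],
        ["hotel", "in", "shanghai"],
        ["hotel", "flight", "beijing"]]
scores = tf_idf(docs)
print(scores[0]["hotel"])               # 0.0: "hotel" appears in every document
print(round(scores[1]["shanghai"], 4))  # 0.3662
```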
Hikvision – Multimodal LLMs
Tokenization handling for large models and vocabulary expansion
Why Python offers both multiprocessing and multithreading even though its threads do not run in true parallel
New PyTorch parallel batch‑normalization
Describe verbally an algorithm for generating perfect squares
Choosing and combining models for ten different modalities
CLIP variants
Useful tricks that are not widely known
Techniques for handling imbalanced data
Differences between separate modality encoders and CLIP‑style joint encoding
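For the perfect-squares question, the exact task is not stated; assuming it asks to generate the squares 0, 1, 4, 9, ..., a neat verbal answer is that consecutive squares differ by consecutive odd numbers, because (k + 1)^2 = k^2 + 2k + 1. The generator below uses only additions; the name perfect_squares is illustrative.

```python
from typing import Iterator


# Sketch: assumes the task is "generate all perfect squares up to a limit".
def perfect_squares(limit: int) -> Iterator[int]:
    """Yield 0, 1, 4, 9, ... up to limit using (k + 1)^2 = k^2 + (2k + 1)."""
    square, odd = 0, 1
    while square <= limit:
        yield square
        square += odd  # add the next odd number
        odd += 2


print(list(perfect_squares(30)))  # [0, 1, 4, 9, 16, 25]
```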
Tencent – Advertising Algorithms
Problem: Compute the intersection of two lists with minimal time complexity, without using maps or sets
Problem: Find the maximum number in a list
NER models beyond GP (GlobalPointer) and GP's advantages over standard sequence-labeling NER
Fixing NER prediction errors, e.g., “BMW 3-Series” tagged B-I-B-I, which splits one entity into two
Definition of linear separability; is logistic regression linear or non‑linear?
Common click‑through‑rate (CTR) models
Structure of the FM component in DeepFM
Binary classification using a single feature that takes positive, unbounded values
Overview of typical NLP tasks
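For the list-intersection question without maps or sets, a standard answer is to sort both lists and walk them with two pointers, for O(m log m + n log n) time; if one list is already sorted and much longer, binary-searching it for each element of the shorter list is an alternative. The sketch assumes lists of comparable values and reports each common value once; intersect_sorted is an illustrative name.

```python
from typing import List


# Sketch: sort + two pointers, no hash map or set; each common value is emitted once.
def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    a, b = sorted(a), sorted(b)  # O(m log m + n log n)
    i = j = 0
    result: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            if not result or result[-1] != a[i]:  # skip repeated common values
                result.append(a[i])
            i += 1
            j += 1
    return result


print(intersect_sorted([4, 9, 5, 4], [9, 4, 9, 8, 4]))  # [4, 9]
```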
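For the FM component of DeepFM: FM adds pairwise interactions sum over i < j of <v_i, v_j> x_i x_j to a linear model, where each feature i has a latent vector v_i (in DeepFM these are the same embeddings the deep component consumes). The pairwise term can be rewritten as 1/2 * sum over f of [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2], which costs O(kn) instead of O(kn^2). The sketch below compares both forms on small hypothetical weights; it covers only the FM logit, not the deep part or training.

```python
from typing import List


# Sketch: illustrative helpers; V[i] is the k-dimensional latent vector of feature i.
def fm_logit(w0: float, w: List[float], V: List[List[float]], x: List[float]) -> float:
    """Linear part plus pairwise interactions, using the O(k * n) reformulation."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):  # loop over latent dimensions
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_logit_naive(w0: float, w: List[float], V: List[List[float]], x: List[float]) -> float:
    """Same quantity with the explicit O(k * n^2) loop over feature pairs."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n = len(x)
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n) for j in range(i + 1, n)
    )
    return linear + pairwise


w0, w = 0.1, [0.2, -0.1, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
x = [1.0, 0.0, 2.0]
print(fm_logit(w0, w, V, x), fm_logit_naive(w0, w, V, x))  # equal up to rounding
```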
Zhihu – Search Algorithms
Project topics
Career‑planning considerations
Challenges encountered in projects
Problem: Find the minimum value in a rotated array
BERT attention mechanism
Optimizers used in deep learning
Common loss functions
Whether the candidate can start an internship immediately
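For the minimum in a rotated sorted array, binary search works by comparing the middle element with the rightmost one: if nums[mid] > nums[hi], the rotation point (and the minimum) lies to the right of mid; if it is smaller, the minimum is at mid or to its left. The sketch assumes an ascending array that was rotated; the final branch handles duplicates by shrinking hi, which degrades the worst case to O(n). The name find_min_rotated is illustrative.

```python
from typing import List


# Sketch: assumes an ascending array rotated at some pivot; duplicates are tolerated.
def find_min_rotated(nums: List[int]) -> int:
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] > nums[hi]:    # minimum is strictly right of mid
            lo = mid + 1
        elif nums[mid] < nums[hi]:  # minimum is at mid or to its left
            hi = mid
        else:                       # equal values: only hi can be safely discarded
            hi -= 1
    return nums[lo]


print(find_min_rotated([4, 5, 6, 7, 0, 1, 2]))  # 0
```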
Xiaopi – NLP
Find start and end indices of a target substring within a source string, ignoring spaces in the target but counting them in the source (KMP‑style problem)
Awareness of state‑of‑the‑art multimodal multi‑stream models
BERT architecture and its pre-training losses
GPT architecture overview
Understanding of various NER models
Prompt engineering for large models across different tasks
Clustering iPhone Pro Max products without any labels
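For the substring-index question, the exact rules are only sketched in the source. The version below assumes that spaces are dropped from the target, that spaces in the source are skipped while matching, and that the returned start and end are inclusive indices into the original source (spaces counted). Under that reading, one approach is to run KMP over the space-free view of the source while keeping a map from compressed positions back to original indices. The function names are illustrative.

```python
from typing import List, Optional, Tuple


# Sketch: interpretation of the space rules is an assumption (see lead-in).
def build_failure(pattern: str) -> List[int]:
    """KMP failure table: length of the longest proper prefix that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def find_ignoring_spaces(source: str, target: str) -> Optional[Tuple[int, int]]:
    pattern = target.replace(" ", "")
    if not pattern:
        return None
    positions = [i for i, ch in enumerate(source) if ch != " "]  # compressed -> original
    text = "".join(source[i] for i in positions)
    fail = build_failure(pattern)
    k = 0
    for idx, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            # Inclusive start and end indices in the original source string.
            return positions[idx - len(pattern) + 1], positions[idx]
    return None


print(find_ignoring_spaces("ab c de", "bcd"))  # (1, 5)
```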
CITIC Bank HQ – Big Data
Summarize your technology stack
Difference between DELETE and TRUNCATE in SQL
SQL transaction concepts
Architecture of recommendation systems
Differences between classification and regression
Purpose of activation functions and why non‑linearity is needed
Definition of loss
Basic knowledge of Hadoop
Explanation of RDD (Resilient Distributed Dataset)
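For the SQL transaction questions, the core property is atomicity: either every statement in a transaction takes effect or none does. The sketch below uses Python's built-in sqlite3 module with an in-memory database; using the connection as a context manager commits on success and rolls back if an exception escapes, so a transfer that would overdraw an account leaves both balances unchanged. The accounts table and the helper functions are hypothetical.

```python
import sqlite3
from typing import Dict


# Sketch: hypothetical schema and helpers for demonstrating commit/rollback.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    with conn:  # commit if the block succeeds, roll back if it raises
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # rollback undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_demo_db()
transfer(conn, "alice", "bob", 30)
try:
    transfer(conn, "alice", "bob", 500)
except ValueError:
    pass
print(balances(conn))  # {'alice': 70, 'bob': 80}
```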
Dewu – Recommendation (Possible)
O(n log n) sorting algorithms
Heap sort explanation
Dynamic programming fundamentals
Differences between XGBoost and GBDT
Pros and cons of LoRA
Batch normalization overview
Differences between Random Forest and GBDT
Game‑theory problem: 100 coins, players A and B can take 1‑2 coins per turn; determine A’s winning strategy
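For the heap-sort question: build a max-heap in place in O(n), then repeatedly swap the root (the current maximum) to the end of the unsorted region and sift the new root down, for O(n log n) time. The algorithm itself needs only O(1) extra space and is not stable; the sketch below copies the input so the caller's list is untouched, and heap_sort is an illustrative name.

```python
from typing import List


# Sketch: in-place heap sort on a copy of the input.
def heap_sort(nums: List[int]) -> List[int]:
    a = list(nums)
    n = len(a)

    def sift_down(root: int, end: int) -> None:
        # Move a[root] down until the max-heap property holds within a[:end].
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1  # pick the larger child
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    for i in range(n // 2 - 1, -1, -1):  # build the max-heap bottom-up
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # move the current maximum to the end
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a


print(heap_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```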
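For the 100-coin game, the rules are not fully stated. Assuming A moves first and whoever takes the last coin wins, piles that are multiples of 3 are losing for the player to move. Since 100 = 3 * 33 + 1, A takes 1 coin first and afterwards answers B's move of k coins with 3 - k, so each pair of turns removes exactly 3 coins and A takes the last one. The memoized search below, with illustrative function names, confirms the pattern.

```python
from functools import lru_cache


# Sketch: assumes A moves first and taking the last coin wins (not stated in the source).
@lru_cache(maxsize=None)
def mover_wins(coins: int) -> bool:
    """True if the player about to move can force a win."""
    if coins == 0:
        return False  # the opponent just took the last coin
    return any(not mover_wins(coins - take) for take in (1, 2) if take <= coins)


def opening_move(coins: int) -> int:
    """Coins to take so the opponent faces a multiple of 3 (0 means no winning move)."""
    return coins % 3


print(mover_wins(100), opening_move(100))               # True 1
print([n for n in range(1, 13) if not mover_wins(n)])  # [3, 6, 9, 12]
```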
Tongcheng Travel – Risk Control
F‑score metric
Why AUC is not always used as an evaluation metric
Methods for handling class‑imbalance problems
Principles of batch normalization
Purpose of 1×1 convolution kernels
Characteristics, advantages, and disadvantages of ReLU and Sigmoid activations
When an internship can be started
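For the F-score question: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. F1 (beta = 1) is their harmonic mean and simplifies to 2TP / (2TP + FP + FN). Choosing beta > 1 weights recall more heavily, which may suit risk-control settings where a missed fraud case costs more than a false alarm. The counts below are made-up examples and f_beta is an illustrative name.

```python
# Sketch: F-beta from confusion-matrix counts; returns 0.0 when P and R are both 0.
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_beta(tp=8, fp=2, fn=4))            # F1, about 0.727
print(f_beta(tp=8, fp=2, fn=4, beta=2.0))  # F2, lower here because recall is the weaker side
```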
