
Comprehensive Collection of Technical Interview Questions from Major Tech Companies

This article compiles a wide range of interview questions sourced from Glassdoor, covering general topics, machine learning, statistics, programming, big-data frameworks, SQL, and brain teasers. The questions are presented in English and show the types of problems asked by companies such as Apple, Google, Microsoft, Uber, and many others.

Architects' Tech Alliance

General Questions

Apple

Suppose you’re given millions of users each with hundreds of transactions across tens of thousands of products. How would you group the users into meaningful segments?

Microsoft

Describe a project you’ve worked on and how it made a difference.

How would you handle a categorical feature with high cardinality?

What would you do to summarize a Twitter feed?

What are the steps for wrangling and cleaning data before applying machine learning algorithms?

How do you measure distance between data points?

Define variance.

Describe the differences between and use cases for box plots and histograms.

Twitter

What features would you use to build a recommendation algorithm for users?

Uber

Pick any product or app that you really like and describe how you would improve it.

How would you find an anomaly in a distribution?

How would you investigate if a certain trend in a distribution is due to an anomaly?

How would you estimate the impact Uber has on traffic and driving conditions?

What metrics would you consider using to track if Uber’s paid advertising strategy to acquire new customers actually works? How would you then approach figuring out an ideal customer acquisition cost?

LinkedIn

Big Data Engineer – Can you explain what REST is?

Machine Learning Questions

Google

Why do you use feature selection?

What is the effect on the coefficients of logistic regression if two predictors are highly correlated? What are the confidence intervals of the coefficients?

What’s the difference between Gaussian Mixture Model and K‑Means?

How do you pick k for K‑Means?

How do you know when Gaussian Mixture Model is applicable?

Assuming a clustering model’s labels are known, how do you evaluate the performance of the model?

Microsoft

What’s an example of a machine learning project you’re proud of?

Choose any machine learning algorithm and describe it.

Describe how Gradient Boosting works.

Data Mining – Describe the decision tree model.

Data Mining – What is a neural network?

Explain the Bias‑Variance Tradeoff.

How do you deal with unbalanced binary classification?

What’s the difference between L1 and L2 regularization?

Uber

What sort of features could you give an Uber driver to predict if they will accept a ride request? Which supervised learning algorithm would you use and how would you compare the results?

LinkedIn

Name and describe three different kernel functions and in what situation you would use each.

Describe a method used in machine learning.

How do you deal with sparse data?

IBM

How do you prevent overfitting?

How do you deal with outliers in your data?

How do you analyze the performance of the predictions generated by regression models versus classification models?

How do you assess logistic regression versus simple linear regression models?

What’s the difference between supervised learning and unsupervised learning?

What is cross‑validation and why would you use it?

What’s the name of the matrix used to evaluate predictive models?

What relationships exist between a logistic regression’s coefficient and the Odds Ratio?

What’s the relationship between Principal Component Analysis (PCA) and Linear & Quadratic Discriminant Analysis (LDA & QDA)?

If you had a categorical dependent variable and a mixture of categorical and continuous independent variables, what algorithms, methods, or tools would you use for analysis?

Business Analytics – What’s the difference between logistic and linear regression? How do you avoid local minima?

Salesforce

What data and models would you use to measure attrition/churn? How would you measure the performance of your models?

Explain a machine learning algorithm as if you’re talking to a non‑technical person.

Capital One

How would you build a model to predict credit card fraud?

How do you handle missing or bad data?

How would you derive new features from features that already exist?

If you’re attempting to predict a customer’s gender, and you only have 100 data points, what problems could arise?

Suppose you were given two years of transaction history. What features would you use to predict credit risk?

Design an AI program for Tic‑tac‑toe.

Zillow

Explain overfitting and what steps you can take to prevent it.

Why does SVM need to maximize the margin between support vectors?

Hadoop

Twitter

How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges according to the fast/dynamic change of data?

Data Engineer – Given a list of followers in the format: ID of follower, ID of followee, find all mutual following pairs. How would you use Map/Reduce when the list does not fit in memory?

Capital One

Data Engineer – What is Hadoop serialization?

Explain a simple Map/Reduce problem.

Hive

LinkedIn

Data Engineer – Write a Hive UDF that returns a sentiment score (e.g., good = 1, bad = -1, average = 0).

Spark

Capital One

Data Engineer – Explain how RDDs work with Scala in Spark.

Statistics & Probability Questions

Google

Explain Cross‑validation as if you’re talking to a non‑technical person.

Describe a non‑normal probability distribution and how to apply it.

Microsoft

Data Mining – Explain what heteroskedasticity is and how to solve it.

Twitter

Given Twitter user data, how would you measure engagement?

Uber

What are some different Time Series forecasting techniques?

Explain Principal Component Analysis (PCA) and the equations PCA uses.

How do you solve Multicollinearity?

Analyst – Write an equation that would optimize the ad spend between Twitter and Facebook.

Facebook

What’s the probability you’ll draw two cards of the same suit from a single deck?
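One way to sanity-check an answer to the card question, assuming a standard 52-card deck and two cards drawn without replacement (the question does not state either): once the first card is drawn, 12 of the remaining 51 cards share its suit. The sketch below computes that fraction and confirms it by enumerating every pair; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def same_suit_probability(suits: int = 4, ranks: int = 13) -> Fraction:
    """Probability that two cards drawn without replacement share a suit."""
    deck_size = suits * ranks
    # The first card can be anything; the second must match its suit.
    return Fraction(ranks - 1, deck_size - 1)


def same_suit_probability_brute_force(suits: int = 4, ranks: int = 13) -> Fraction:
    """Same quantity, obtained by enumerating every unordered pair of cards."""
    deck = [(suit, rank) for suit in range(suits) for rank in range(ranks)]
    pairs = list(combinations(deck, 2))
    matching = sum(1 for first, second in pairs if first[0] == second[0])
    return Fraction(matching, len(pairs))


print(same_suit_probability())  # 4/17
```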

IBM

What are p‑values and confidence intervals?

Capital One

Data Analyst – If you have 70 red marbles and the ratio of green to red marbles is 2 to 7, how many green marbles are there?

What would the distribution of daily commutes in New York City look like?

Given a die, which is most likely: a single 6 in six rolls, at least two 6s in twelve rolls, or at least one hundred 6s in six hundred rolls?
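The dice question can be explored numerically. The sketch below assumes a fair die and reads "a single 6" as "at least one 6", which is how the classic form of this puzzle is usually stated; both are assumptions, not part of the question. It computes exact binomial tail probabilities with rational arithmetic, which avoids floating-point underflow at 600 rolls.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k: int, n: int, p: Fraction = Fraction(1, 6)) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    q = 1 - p
    # Sum the probabilities of fewer than k successes, then take the complement.
    below = sum(comb(n, j) * p**j * q ** (n - j) for j in range(k))
    return 1 - below


for sixes, rolls in [(1, 6), (2, 12), (100, 600)]:
    probability = float(prob_at_least(sixes, rolls))
    print(f"at least {sixes} six(es) in {rolls} rolls: {probability:.4f}")
```

Under these assumptions the probabilities decrease as the number of rolls grows, so the six-roll case is the most likely of the three.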

PayPal

What’s the Central Limit Theorem, and how do you prove it? What are its applications?

Programming & Algorithms

Google

Data Analyst – Write a program that can determine the height of an arbitrary binary tree.
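A minimal sketch of one possible answer to the tree-height question. The `Node` class is a hypothetical representation, and height is counted in nodes (an empty tree has height 0); an interviewer may count edges instead. The traversal is iterative so that very deep, unbalanced trees do not hit Python's recursion limit.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(root: Optional[Node]) -> int:
    """Height in nodes: 0 for an empty tree, 1 for a single node."""
    if root is None:
        return 0
    levels = 0
    queue = deque([root])
    while queue:
        levels += 1
        # Process exactly one level of the tree per outer iteration.
        for _ in range(len(queue)):
            node = queue.popleft()
            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)
    return levels


tree = Node(1, Node(2, Node(3)), Node(4))
print(height(tree))  # 3
```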

Microsoft

Create a function that checks if a word is a palindrome.
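A short sketch of a palindrome check. It assumes the comparison should ignore letter case, which the question leaves open; drop the `casefold()` call for a case-sensitive version.

```python
def is_palindrome(word: str) -> bool:
    """Return True if the word reads the same forwards and backwards."""
    normalized = word.casefold()  # assumption: case-insensitive comparison
    return normalized == normalized[::-1]


print(is_palindrome("Level"))   # True
print(is_palindrome("python"))  # False
```

Reversing with a slice takes O(n) time and O(n) extra space; a two-pointer loop would bring the extra space down to O(1).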

Twitter

Build a power set.

How do you find the median of a very large dataset?

Uber

Data Engineer – Code a function that calculates the square root of a given number to two decimal places, and then optimize it with a caching mechanism.

Facebook

Write a function that adds two binary strings without using built‑in conversion tools and discuss its space and time complexity.

Write a function that accepts two already sorted lists and returns their union in a sorted list.

LinkedIn

Data Engineer – Write code to determine if brackets in a string are balanced.

How do you find the second largest element in a Binary Search Tree?

Write a function that takes two sorted vectors and returns a single sorted vector.

If you have an incoming stream of numbers, how would you find the most frequent numbers on‑the‑fly?

Write a function that raises one number to another number (pow function).

Split a large string into valid words and store them in a dictionary; if impossible, return false. Discuss solution complexity.

Salesforce

What’s the computational complexity of finding a document’s most frequently used words?

If you’re given 10 TB of unstructured customer data, how would you extract valuable information from it?

Capital One

Data Engineer – How would you ‘disjoin’ two arrays (the opposite of SQL JOIN)?

Create a function that does addition where the numbers are represented as two linked lists.

Create a function that calculates matrix sums.

How would you use Python to read a very large tab‑delimited file of numbers to count the frequency of each number?

PayPal

Write a function that takes a sentence and prints the same sentence with each word reversed in O(n) time.

Write a function that takes an array, splits it into every possible set of two arrays, and prints the max differences between the two arrays’ minima in O(n) time.

Write a program that performs merge sort.

SQL Questions

Microsoft

Data Analyst – Define and explain the differences between clustered and non‑clustered indexes.

What are the different ways to return the rowcount of a table?

Facebook

Data Engineer – If you’re given a raw data table, how would you perform ETL (Extract, Transform, Load) with SQL to obtain the data in a desired format?

How would you write a SQL query to compute a frequency table of a certain attribute involving two joins? What changes would you need to make for ORDER BY, GROUP BY, and handling NULLs?

LinkedIn

Data Engineer – How would you improve ETL (Extract, Transform, Load) throughput?

Brain Teasers & Word Problems

Google

Suppose you have ten bags of marbles with ten marbles each. One bag weighs differently; you can only perform a single weighing. How would you determine the odd bag?

Facebook

You are about to fly to Seattle and want to know if you should carry an umbrella. You call three friends who each tell the truth 2/3 of the time. All say it’s raining. What is the probability that it is actually raining?
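The umbrella question is a Bayes' rule exercise, and the answer depends on the prior probability of rain, which the question does not give. The sketch below treats the prior as a parameter and assumes the three friends answer independently; both are assumptions. With a 50% prior, the posterior comes out to 8/9.

```python
from fractions import Fraction


def posterior_rain(
    prior: Fraction,
    truth_rate: Fraction = Fraction(2, 3),
    friends: int = 3,
) -> Fraction:
    """P(rain | every friend says it is raining), assuming independent friends."""
    # If it is raining, all friends must be telling the truth.
    all_say_rain_given_rain = truth_rate**friends
    # If it is dry, all friends must be lying.
    all_say_rain_given_dry = (1 - truth_rate) ** friends
    numerator = prior * all_say_rain_given_rain
    return numerator / (numerator + (1 - prior) * all_say_rain_given_dry)


print(posterior_rain(Fraction(1, 2)))  # 8/9
```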

Uber

Imagine patients arrive at a hospital following a Poisson distribution and doctors attend to them uniformly. Write a function or code block that outputs the average patient wait time and total number of patients attended to on a random day.

Facebook

Imagine there are three ants, one at each corner of an equilateral triangle, each randomly picking a direction and traversing the edge. What’s the probability that none of the ants collide? Extend to N ants at the N corners of a regular polygon.

How many trailing zeros are in 100 factorial (100!)?
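A sketch of the usual approach to the trailing-zeros question: each trailing zero comes from a factor of 10, and since factors of 2 are plentiful in a factorial, the count equals the number of factors of 5. The helper names are illustrative; the brute-force function is included only as a cross-check.

```python
from math import factorial


def trailing_zeros_of_factorial(n: int) -> int:
    """Count trailing zeros of n! by counting factors of 5 in 1..n."""
    count = 0
    power_of_five = 5
    while power_of_five <= n:
        # Multiples of 5 contribute one factor, multiples of 25 one more, etc.
        count += n // power_of_five
        power_of_five *= 5
    return count


def trailing_zeros_brute_force(n: int) -> int:
    """Cross-check: compute n! and count the zeros at the end of its digits."""
    digits = str(factorial(n))
    return len(digits) - len(digits.rstrip("0"))


print(trailing_zeros_of_factorial(100))  # 24
```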

LinkedIn

Imagine you’re climbing a staircase with n stairs and you can take any number k of steps at a time. How many distinct ways can you reach the top?
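The staircase question can be read two ways: any step size up to the number of stairs remaining, or any step size up to some fixed maximum. The sketch below covers both with a hypothetical `max_step` parameter and a simple dynamic program, where the number of ways to reach stair i is the sum of the ways to reach each stair from which i is one move away.

```python
from typing import Optional


def count_ways(n: int, max_step: Optional[int] = None) -> int:
    """Number of ordered step sequences that climb exactly n stairs."""
    if max_step is None:
        max_step = n  # assumption: any step size is allowed
    ways = [0] * (n + 1)
    ways[0] = 1  # one way to stand at the bottom: take no steps
    for stair in range(1, n + 1):
        for step in range(1, min(stair, max_step) + 1):
            ways[stair] += ways[stair - step]
    return ways[n]


print(count_ways(4))              # 8
print(count_ways(4, max_step=2))  # 5
```

With unrestricted step sizes the count is 2^(n-1) for n ≥ 1, and with a maximum of two steps it follows the Fibonacci sequence.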
