75 Essential Data Science Terms Every Practitioner Must Know
This article presents a comprehensive, alphabetically ordered list of 75 essential data science and machine learning terms, from accuracy and AUC to zero-shot learning, with concise definitions that help practitioners quickly grasp key concepts and sharpen their analytical vocabulary.
Data science has a rich vocabulary. This list presents the 75 most common and important terms that data scientists use daily.
A
Accuracy : Measures the proportion of correct predictions among total predictions.
Area Under Curve (AUC) : Represents the area under the Receiver Operating Characteristic (ROC) curve, used to evaluate classification models.
ARIMA : AutoRegressive Integrated Moving Average, a time series forecasting method that combines autoregression, differencing, and moving-average components.
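As a minimal, library-free sketch (the toy labels below are purely illustrative), accuracy is simply the share of predictions that match the true labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 6 of 8 correct -> 0.75
```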
B
Bias : The systematic difference between a model's average prediction and the true value.
Bayes' Theorem : A probability formula that calculates the likelihood of an event based on prior knowledge.
Binomial Distribution : A probability distribution modeling the number of successes in a fixed number of independent Bernoulli trials.
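A quick sketch of the binomial probability mass function, using only the standard library (the coin-flip numbers are an illustrative example, not from the article):

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 5 fair coin flips:
print(binomial_pmf(3, 5, 0.5))  # C(5,3) * 0.5^5 = 10/32 = 0.3125
```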
C
Clustering : Grouping data points based on similarity.
Confusion Matrix : A table used to evaluate the performance of classification models.
Cross-validation : A technique that assesses model performance by dividing data into subsets for training and testing.
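The four cells of a binary confusion matrix can be counted directly; this is a bare-bones sketch with made-up labels (libraries such as scikit-learn provide the same thing as `confusion_matrix`):

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) counts for binary labels 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
```

Precision (TP / (TP + FP)) and recall (TP / (TP + FN)), defined under P and R below, fall straight out of these counts.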
D
Decision Trees : Tree‑structured models used for classification and regression tasks.
Dimensionality Reduction : The process of reducing the number of features in a dataset while retaining important information.
Discriminative Models : Models that learn boundaries between different classes.
E
Ensemble Learning : Techniques that combine multiple models to improve predictive performance.
EDA (Exploratory Data Analysis) : The process of analyzing and visualizing data to understand its patterns and attributes.
Entropy : A measure of uncertainty or randomness in information.
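Shannon entropy has a one-line implementation; the probabilities below are illustrative:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit of uncertainty
print(entropy([0.25] * 4))   # uniform over 4 outcomes: 2.0 bits
```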
F
Feature Engineering : The process of creating new features from existing data to improve model performance.
F-score : The harmonic mean of precision and recall, balancing the two in binary classification.
Feature Extraction : Automatically extracting meaningful features from data.
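The general F-beta formula (of which F1 is the beta = 1 case) is short enough to sketch directly; the precision/recall values are illustrative:

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_score(0.8, 0.5))  # F1 = 2 * 0.8 * 0.5 / (0.8 + 0.5) ~ 0.615
```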
G
Gradient Descent : An optimization algorithm that iteratively adjusts parameters to minimize a function.
Gaussian Distribution : The normal distribution with a bell‑shaped probability density function.
Gradient Boosting : An ensemble learning method that builds weak learners sequentially, each one correcting the errors of its predecessors.
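Gradient descent fits in a few lines; this toy sketch minimizes a simple quadratic whose gradient is known in closed form (the function and learning rate are illustrative choices):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step opposite the gradient to minimize a function."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges to ~3.0
```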
H
Hypothesis : A testable statement or assumption in statistical inference.
Hierarchical Clustering : A clustering method that organizes data into a tree‑like structure.
Heteroscedasticity : Unequal variance of errors in a regression model.
I
Information Gain : A metric used in decision trees to determine feature importance.
Independent Variable : A variable that is manipulated in an experiment to observe its effect on the dependent variable.
Imbalance : A situation where the class distribution in a dataset is uneven, with some classes far more frequent than others.
J
Jupyter : An interactive computing environment for data analysis and machine learning.
Joint Probability : The probability of two or more events occurring simultaneously.
Jaccard Index : A similarity measure between two sets.
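The Jaccard index maps directly onto Python's set operations; the example sets are illustrative:

```python
def jaccard(a, b):
    """|A intersect B| / |A union B|: 1.0 for identical sets, 0.0 for disjoint ones."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)

print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 shared of 4 total -> 0.5
```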
K
Kernel Density Estimation : A non‑parametric method for estimating the probability density function of a continuous random variable.
KS Test (Kolmogorov‑Smirnov Test) : A non‑parametric test that compares two probability distributions.
K-Means Clustering : An algorithm that partitions data into K clusters by repeatedly assigning points to the nearest centroid and recomputing the centroids.
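The assign-then-recenter loop at the heart of k-means (Lloyd's algorithm) can be sketched on one-dimensional data; the points and starting centroids below are toy values chosen so the two clusters are obvious:

```python
def kmeans_1d(data, centroids, iters=10):
    """Lloyd's algorithm on 1-D points: assign each point to the nearest
    centroid, then move each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in centroids}
        for x in data:
            nearest = min(centroids, key=lambda c: abs(x - c))
            clusters[nearest].append(x)
        centroids = [sum(pts) / len(pts) for pts in clusters.values() if pts]
    return sorted(centroids)

data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
print(kmeans_1d(data, centroids=[0.0, 5.0]))  # settles at [1.5, 10.5]
```

Production k-means (e.g. scikit-learn's `KMeans`) adds smarter initialization and works in any number of dimensions; the core iteration is the same.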
L
Likelihood : The probability of observing the data given a specific model.
Linear Regression : A statistical method for modeling the relationship between a dependent variable and one or more independent variables.
L1/L2 Regularization : Techniques that add penalty terms to a model’s loss function to prevent overfitting.
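For a single independent variable, ordinary least squares has a closed form; this library-free sketch recovers the coefficients of a noise-free toy line:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x: slope from covariance over
    variance, intercept from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Noise-free data from y = 2x + 1 recovers its own coefficients:
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```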
M
Maximum Likelihood Estimation : A method for estimating the parameters of a statistical model by choosing the values that maximize the likelihood of the observed data.
Multicollinearity : A situation where two or more independent variables in a regression model are highly correlated.
Mutual Information : A measure of the amount of information shared between two variables.
N
Naive Bayes : A probabilistic classifier based on Bayes’ theorem that assumes feature independence.
Normalization : Scaling data to a specified range.
O
Overfitting : When a model performs well on training data but poorly on unseen data.
Outliers : Data points that are markedly different from the rest of the dataset.
One-hot encoding : Converting categorical variables into binary vectors.
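One-hot encoding is a small transformation worth seeing concretely; the color values are illustrative (pandas' `get_dummies` and scikit-learn's `OneHotEncoder` do the same job at scale):

```python
def one_hot(values):
    """Map each categorical value to a binary indicator vector,
    with one column per distinct category (sorted for determinism)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

print(one_hot(["red", "green", "red", "blue"]))
# [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]  (columns: blue, green, red)
```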
P
PCA (Principal Component Analysis) : A dimensionality‑reduction technique that transforms data into orthogonal components.
Precision : The proportion of true positive predictions among all positive predictions in a classification model.
p-value : The probability of observing results at least as extreme as those obtained, assuming the null hypothesis is true.
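A p-value calculation can be sketched end to end for a one-sample z-test (also defined under Z below), using the error function from the standard library; the sample numbers are illustrative:

```python
import math

def z_test_p_value(sample_mean, pop_mean, pop_std, n):
    """Two-sided p-value for a one-sample z-test."""
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    # Standard normal CDF expressed via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - cdf)

# A sample of 25 with mean 103 against a population with mean 100, sd 10:
print(round(z_test_p_value(103, 100, 10, 25), 4))  # z = 1.5 -> p ~ 0.1336
```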
Q
QQ-plot (Quantile‑Quantile Plot) : A graphical tool for comparing the distributions of two datasets.
QR decomposition : Decomposes a matrix into an orthogonal matrix and an upper‑triangular matrix.
R
Random Forest : An ensemble learning method that uses multiple decision trees for prediction.
Recall : The proportion of true positive predictions among all actual positive instances in a classification model.
ROC Curve : A chart that displays the performance of a binary classifier at various threshold settings.
S
SVM (Support Vector Machine) : A supervised machine‑learning algorithm used for classification and regression.
Standardization : Scaling data to have a mean of 0 and a standard deviation of 1.
Sampling : The process of selecting a subset of data points from a larger dataset.
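Standardization is exactly the z-score computation defined under Z below; a minimal sketch on toy values:

```python
def standardize(xs):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / std for x in xs]

print([round(z, 3) for z in standardize([2, 4, 6, 8])])
# [-1.342, -0.447, 0.447, 1.342]
```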
T
t-SNE (t‑Distributed Stochastic Neighbor Embedding) : A dimensionality‑reduction technique for visualizing high‑dimensional data in lower dimensions.
t-distribution : A probability distribution used in hypothesis testing for small sample sizes.
Type I/II Error : In hypothesis testing, a Type I error is a false positive, and a Type II error is a false negative.
U
Underfitting : When a model is too simple to capture the underlying patterns in the data.
UMAP (Uniform Manifold Approximation and Projection) : A dimensionality‑reduction technique for visualizing high‑dimensional data.
Uniform Distribution : A probability distribution where all outcomes are equally likely.
V
Variance : A measure of how data points spread around the mean.
Validation Curve : A chart that shows how model performance varies with different hyperparameter values.
Vanishing Gradient : A problem in deep neural networks where gradients become extremely small during training.
W
Word embedding : Representing words as dense vectors in natural language processing.
Word cloud : A visual representation of text data where word frequency is indicated by size.
Weights : Parameters learned by a machine‑learning model during training.
X
XGBoost : Extreme Gradient Boosting, a popular gradient‑boosting library.
XLNet : A language model based on generalized autoregressive pretraining of Transformers.
Y
YOLO (You Only Look Once) : A real‑time object detection system.
Yellowbrick : A Python library for visualizing and diagnosing machine‑learning models.
Z
Z-score : A standardized value indicating how many standard deviations a data point is from the mean.
Z-test : A statistical test used to compare a sample mean to a known population mean.
Zero-shot learning : A machine‑learning approach where a model can recognize new categories without having seen explicit examples during training.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".