
Master the Basics of Statistics: Populations, Samples, and Descriptive Measures

This article introduces fundamental statistical concepts, including populations, samples, measurements, parameters, statistics, and descriptive and inferential statistics. It explains the relationship between samples and populations and covers key descriptive tools: the mean, variance and standard deviation, percentiles and quartiles, box plots, violin plots, and z‑scores.

Model Perspective

1 Basic Concepts of Statistics

A population is any specific set of objects of interest. A sample is any subset of the population.
A measurement is a number or attribute computed for each element of a population or sample. The measurements of the sample elements are collectively called the sample data.
A parameter is a number that summarizes some characteristic of the population as a whole. A statistic is a number computed from the sample data.
Statistics is the discipline of collecting, displaying, analyzing, and drawing conclusions from data.
Descriptive statistics is the branch of statistics concerned with organizing, displaying, and describing data.
Inferential statistics is the branch that draws conclusions about a population from the information in a sample.
Quantitative data are numerical measurements.
Qualitative data consist of attributes, labels, or other non‑numeric characteristics.

2 Relationship Between Sample and Population

The relationship between a population and a sample drawn from it is perhaps the most important concept in statistics, because everything else rests on it. The diagram below illustrates this relationship:

In the diagram, the large circle represents all elements in the population. The solid black circles are randomly selected elements that together form the sample. Each sample element has a measurement denoted by a lowercase x (x₁,…,xₙ); these measurements constitute the sample data set. From these data we compute statistics such as the sample mean x̄ and the sample proportion p̂, which serve as estimates of the population mean μ and the population proportion p.

For example, to estimate the average age of a city’s residents, we might randomly sample 1,000 citizens, record their ages, and use the sample mean as an estimate of the city‑wide average age.
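The age example can be sketched in Python. This is only an illustration under stated assumptions: the population below is synthetic (100,000 made-up ages drawn uniformly between 0 and 90), and the variable names are hypothetical. In a real study the population mean would be unknown, and only the sample mean could be computed.

```python
import random
import statistics

# Hypothetical population: synthetic ages, made up purely for illustration.
rng = random.Random(42)  # fixed seed so the run is reproducible
population_ages = [rng.randint(0, 90) for _ in range(100_000)]

# Draw a simple random sample of 1,000 residents (without replacement).
sample_ages = rng.sample(population_ages, 1000)

# Parameter: a number describing the whole population (normally unknown).
population_mean = statistics.mean(population_ages)

# Statistic: a number computed from the sample, used to estimate the parameter.
sample_mean = statistics.mean(sample_ages)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

Running the sketch with different seeds gives slightly different sample means, which is the sampling variability that inferential statistics has to account for.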

3 Descriptive Statistics

3.1 Mean

The mean is defined as the sum of all values divided by the number of values. It measures the central tendency of a data set; other common measures of central tendency include the median and mode.
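A minimal sketch of the three measures of central tendency, using Python's standard statistics module. The data values are made up for illustration.

```python
import statistics

# Illustrative data, chosen for this example only.
data = [2, 3, 3, 5, 7, 10]

# Mean straight from the definition: sum of values / number of values.
mean_by_definition = sum(data) / len(data)

# The same quantity, plus the median and mode, from the standard library.
mean = statistics.mean(data)
median = statistics.median(data)  # average of the two middle values, 3 and 5
mode = statistics.mode(data)      # the most frequent value

print(mean_by_definition, mean, median, mode)
```

Here the mean (5) sits above the median (4) because the single large value 10 pulls the mean upward while leaving the median unchanged.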

3.2 Sample Variance and Standard Deviation

The sample variance s² is the sum of the squared deviations from the sample mean divided by n − 1, where n is the sample size. The sample standard deviation s is the square root of the sample variance.

3.3 Population Variance and Standard Deviation

The population variance σ² is the sum of the squared deviations from the population mean μ divided by N, where N is the population size; that is, it is the average squared deviation. The population standard deviation σ is the square root of the population variance.
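Written out as formulas, the two definitions differ only in the mean they use and in the denominator. The notation follows section 2 (x₁,…,xₙ for the sample, μ for the population mean); the symbols s, σ, and the index i are standard notation assumed here.

```latex
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2,
\qquad s = \sqrt{s^2}
\]
\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^2,
\qquad \sigma = \sqrt{\sigma^2}
\]
```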

Variance and standard deviation measure the variability of the data. Another measure of variability is the range, the difference between the largest and smallest values.
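The following sketch computes both versions of the variance from the definitions and compares them with the standard library. The data values are made up for illustration; the same list is treated first as a sample and then as a complete population.

```python
import math
import statistics

# Illustrative data, chosen for this example only.
data = [4, 8, 6, 5, 3, 7]

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

# Treating the data as a sample: divide by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

# Treating the data as a whole population: divide by N (here N == n).
population_variance = sum(squared_deviations) / n
population_sd = math.sqrt(population_variance)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# The standard library provides both conventions under separate names.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))

print(sample_variance, population_variance, data_range)
```

For the same data the sample variance is always the larger of the two, because it divides the same sum by a smaller number.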

3.4 Percentiles, Quartiles, and Box Plots

The Pth percentile of a data set is the value at or below which P percent of the observations fall when the data are ordered.

The quartiles divide the ordered data into four parts of equal size. The three quartile values are:

First quartile (Q1) – the 25th percentile.

Second quartile (Q2) – the median, or 50th percentile.

Third quartile (Q3) – the 75th percentile.

The distance between Q3 and Q1 is the interquartile range (IQR): IQR = Q3 − Q1.

The minimum and maximum values, together with Q1, Q2, and Q3, form the five‑number summary, which is used to construct a box plot.
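A minimal sketch of the five-number summary using Python's standard library. The data values are made up for illustration. Note that several conventions exist for locating quartiles in a finite data set; the "inclusive" method used here is one assumption among them, and other methods or tools may return slightly different Q1 and Q3 values for the same data.

```python
import statistics

# Illustrative data, chosen for this example only.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

# n=4 asks for the three cut points that split the data into four parts.
# method="inclusive" is an assumed convention; "exclusive" is the default.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
five_number_summary = (min(data), q1, q2, q3, max(data))

print("five-number summary:", five_number_summary)
print("IQR:", iqr)
```

With this convention the second quartile equals the value returned by statistics.median(data), as the definition of Q2 requires.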

In a box plot drawn horizontally, each of the five numbers is marked by a vertical line segment. The box spans Q1 to Q3, with the median marked inside it, and the “whiskers” extend from the box to the minimum and maximum values.

A violin plot combines a box plot with a kernel density estimate, showing the probability density as a “violin” shape.
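To show what the kernel density estimate in a violin plot computes, here is a small pure-Python sketch. It is an illustration under assumptions, not how any particular plotting library implements it: the Gaussian kernel, the bandwidth of 2.0, the data values, and the helper name gaussian_kde are all choices made for this example.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a density function: one Gaussian bump per data point, averaged."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density

# Illustrative data and an arbitrary bandwidth (both assumptions).
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
density = gaussian_kde(data, bandwidth=2.0)

# Evaluate the density on a grid from -10.0 to 30.0 in steps of 0.5.
step = 0.5
grid = [i * step for i in range(-20, 61)]
widths = [density(x) for x in grid]

# A density should integrate to about 1; check with a simple Riemann sum.
area = sum(widths) * step
print(f"approximate area under the curve: {area:.4f}")
```

A violin plot draws the curve given by widths mirrored on both sides of a central axis, so the plot is widest where the data are most concentrated.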

3.5 Z‑Score

The z‑score expresses how many standard deviations an observation x lies from the mean of its data set. For sample data, z = (x − x̄) / s, where x̄ is the sample mean and s is the sample standard deviation; for population data, z = (x − μ) / σ, where μ is the population mean and σ is the population standard deviation. A negative z‑score indicates a value below the mean, zero indicates a value equal to the mean, and a positive z‑score indicates a value above the mean.
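A short sketch of the z‑score computation with Python's standard library. The data values and the helper name z_score are made up for illustration; the same list is scored once as a sample and once as a population to show the effect of the two denominators.

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Illustrative data, chosen for this example only.
data = [4, 8, 6, 5, 3, 7]
mean = statistics.mean(data)

sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by N

# z-scores treating the data as a sample, then as a population.
z_sample = [z_score(x, mean, sample_sd) for x in data]
z_population = [z_score(x, mean, population_sd) for x in data]

for x, zs, zp in zip(data, z_sample, z_population):
    print(f"x={x}: sample z={zs:+.3f}, population z={zp:+.3f}")
```

Values above the mean of 5.5 get positive z‑scores and values below it get negative ones. The population z‑scores are slightly larger in magnitude because the population standard deviation is the smaller of the two.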

4 Summary

This article introduced fundamental statistical concepts and common descriptive statistics and visualizations. When faced with a data set, start with descriptive statistics to gain an overall sense of the data before proceeding to deeper mathematical analysis.


Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
