What Is Statistics? A Beginner’s Guide to Data Collection, Analysis, and Inference

This article introduces the fundamentals of statistics, covering its purpose, types, data collection methods, data organization steps, graphical representation, measures of central tendency and dispersion, probability concepts, parameter estimation, hypothesis testing, and the distinction between correlation and regression analysis.

Model Perspective

What is Statistics

Statistics is the scientific methodology for collecting, organizing, describing, displaying, and analyzing data. Its aim is to look past random variation and discover the underlying quantitative regularities, so that objective phenomena can be understood scientifically.

Based on research goals, statistics is divided into theoretical and applied statistics; based on the role of methods, into descriptive and inferential statistics.

Data Collection

Statistical data are mainly obtained through surveys, including censuses, sampling surveys, focused surveys, typical surveys, and statistical reporting systems.

Before conducting a survey, a survey plan must be developed, covering:

Determine the purpose and tasks of the survey;

Define the target population, units, scope and methods;

Design the questionnaire, main contents and standards;

Set the survey timing and data entry schedule;

Plan the organization and implementation of the survey work;

Arrange data processing and quality control.

Statistical data processing organizes collected data into systematic, coherent information that reflects overall quantitative characteristics.

Steps of data processing include:

Design and compile a data aggregation plan;

Review the raw data collected;

Group raw data according to research objectives and analysis needs;

Perform summary calculations and create frequency distribution tables;

Prepare statistical tables;

Compile statistical data and systematically accumulate historical data.
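The grouping and summary steps above can be sketched in a few lines of Python. This is only an illustration: the exam scores, the class width of 10, and the helper name `frequency_table` are assumptions made for the example, not part of the source.

```python
from collections import Counter

# Hypothetical raw data: exam scores from one class (made up for illustration).
scores = [52, 67, 71, 58, 88, 93, 75, 64, 79, 81, 69, 77, 85, 90, 73]

def frequency_table(values, start, width, n_classes):
    """Group values into equal-width classes [lower, upper) and count them."""
    counts = Counter()
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_classes:  # values outside the classes are ignored
            counts[index] += 1
    total = len(values)
    rows = []
    cumulative = 0
    for i in range(n_classes):
        lower = start + i * width
        freq = counts[i]
        cumulative += freq
        # (lower bound, upper bound, frequency, relative frequency, cumulative frequency)
        rows.append((lower, lower + width, freq, freq / total, cumulative))
    return rows

table = frequency_table(scores, start=50, width=10, n_classes=5)
for lower, upper, freq, rel, cum in table:
    print(f"[{lower}, {upper})  freq={freq}  rel={rel:.2f}  cum={cum}")
```

The class boundaries are a design choice; in practice they would follow from the research objectives named in the grouping step.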

Graphical Presentation

Statistical tables and charts are the two main ways to display data and summarize its basic features.

Qualitative data can be shown with frequency tables, pie charts, bar charts, and ring charts.

Quantitative data can also be displayed with histograms, line charts, scatter plots, stem-and-leaf plots, and box plots.
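Of the displays listed above, the stem-and-leaf plot is simple enough to build as plain text. The sketch below assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf; the data and the function name `stem_and_leaf` are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative integers (stem = tens, leaf = units)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

# Hypothetical data set.
data = [52, 67, 71, 58, 88, 93, 75, 64, 79, 81, 69, 77, 85, 90, 73]
plot = stem_and_leaf(data)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Unlike a histogram, this display keeps every original value readable while still showing the shape of the distribution.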

Descriptive Measures

After data are organized, tables or graphs reveal frequency distribution characteristics, but to uncover distribution patterns and essential features, one must study measures of central tendency and dispersion.

Central tendency indicates how data cluster around a central value, measured by arithmetic mean, harmonic mean, geometric mean, median, and mode.

Dispersion (or variability) reflects how values deviate from the center, measured by range, mean deviation, variance, standard deviation, coefficient of variation, and other indices.
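As a rough illustration, most of the measures named above can be computed with Python's standard `statistics` module (Python 3.8 or later is assumed for `geometric_mean`). The data set is hypothetical, and the variance shown is the sample variance, which divides by n - 1.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 8]  # hypothetical observations

# Measures of central tendency
mean = st.mean(data)
harmonic = st.harmonic_mean(data)
geometric = st.geometric_mean(data)
median = st.median(data)
mode = st.mode(data)

# Measures of dispersion
value_range = max(data) - min(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
variance = st.variance(data)   # sample variance (divides by n - 1)
std_dev = st.stdev(data)       # square root of the sample variance
cv = std_dev / mean            # coefficient of variation (unitless)

print(mean, harmonic, geometric, median, mode)
print(value_range, mean_deviation, variance, std_dev, cv)
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, a quick sanity check on any such calculation.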

Probability and Distributions

When a random experiment is repeated many times under identical conditions, the relative frequency of an event tends to stabilize around a constant value, which is taken as the probability of the event.
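A small simulation can make this stabilization visible. The sketch below assumes a fair coin (p = 0.5) and uses a seeded pseudo-random generator so that runs are repeatable; the function name `relative_frequency` is hypothetical.

```python
import random

def relative_frequency(n_trials, p=0.5, seed=0):
    """Simulate n_trials independent trials and return the share of successes."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p for _ in range(n_trials))
    return hits / n_trials

# The relative frequency should drift toward p as the number of trials grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```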

Probability describes the likelihood of a single outcome in a trial; a full understanding requires knowledge of all possible outcomes and their probabilities, forming a probability distribution. Distributions are classified as discrete or continuous.

Discrete distributions have a finite or countably infinite set of possible values, such as binomial, Poisson, and hypergeometric distributions.

Continuous distributions describe variables that can take any value within an interval on the number line; examples include the normal and chi-square distributions.
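To contrast the two families, the sketch below evaluates a binomial distribution (discrete) from its formula and a standard normal distribution (continuous) through `statistics.NormalDist`. The parameters n = 10 and p = 0.5 are arbitrary choices for illustration.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Discrete: probabilities attach to individual values and sum to 1.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(pmf))

# Continuous: probabilities attach to intervals, via the cumulative distribution.
z = NormalDist(mu=0, sigma=1)
prob_within_one_sd = z.cdf(1) - z.cdf(-1)
print(prob_within_one_sd)
```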

Parameter Estimation

Parameter estimation uses sample statistics to infer population parameters. Methods include point estimation and interval estimation.

Three criteria for good estimators are unbiasedness, efficiency, and consistency.

Unbiasedness: the expected value of the estimator equals the true parameter, so the expected estimation error is zero.

Efficiency: among unbiased estimators, the one with the smallest variance is most efficient.

Consistency: as sample size grows indefinitely, the estimator converges to the true parameter.
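Unbiasedness can be checked empirically. The sketch below is a simulation under assumed conditions (normal data with true variance 4, samples of size 5): it averages two variance estimators over many repeated samples. The one dividing by n should average below the true variance, while the one dividing by n - 1 should average close to it.

```python
import random

def average_estimates(n=5, repetitions=20_000, sigma=2.0, seed=1):
    """Average the n-divisor and (n-1)-divisor variance estimates over many samples."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(repetitions):
        sample = [rng.gauss(0, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
        biased_total += ss / n          # divides by n
        unbiased_total += ss / (n - 1)  # divides by n - 1
    return biased_total / repetitions, unbiased_total / repetitions

biased_avg, unbiased_avg = average_estimates()
print(biased_avg, unbiased_avg)  # true variance is sigma**2 = 4
```

In theory the n-divisor estimator has expectation (n - 1)/n times the true variance, which is 3.2 in this setup.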

Point estimation directly uses sample statistics to approximate population parameters, such as using the sample mean for the population mean.

Interval estimation determines a range that likely contains the population parameter based on a specified confidence level.

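Confidence level (or confidence coefficient) is the probability that the interval-construction procedure captures the true parameter: over repeated sampling, about that proportion of the resulting intervals would contain it.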
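A minimal sketch of interval estimation for a population mean follows. It assumes a large sample, so the normal approximation is used for the critical value; for small samples a t-based critical value would normally replace it. The function names and the simulated population (mean 10, standard deviation 3) are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal approximation) interval for the population mean."""
    n = len(sample)
    center = fmean(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    margin = z * stdev(sample) / sqrt(n)
    return center - margin, center + margin

def coverage(true_mean=10.0, sigma=3.0, n=100, repetitions=2000, seed=2):
    """Share of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = mean_confidence_interval(sample)
        hits += low <= true_mean <= high
    return hits / repetitions

rate = coverage()
print(rate)  # expected to be near the nominal 0.95
```

The coverage check illustrates the confidence level as a property of the procedure over repeated samples, not of any single computed interval.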

Hypothesis Testing

Hypothesis testing evaluates assumptions about unknown population parameters or distribution forms using sample information.

General steps:

Formulate null and alternative hypotheses;

Construct an appropriate test statistic;

Choose a significance level and corresponding critical value;

Calculate the test statistic;

Make a statistical decision and interpret the result.

Common types of tests include two-sided, right-tailed, and left-tailed tests.

Typical methods include t-tests for means and chi-square tests for variances, applicable to large or small samples, as well as goodness-of-fit tests such as the Jarque‑Bera test for normality.
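The general steps can be traced in a one-sample, two-sided t-test. Everything specific here is an assumption for illustration: the package weights, the hypothesized mean of 500, and the significance level of 0.05. The critical value 2.262 is the standard t-table entry for a two-sided test at that level with 9 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev

# Hypothetical data: fill weights (grams) of 10 packages.
sample = [498, 502, 497, 495, 501, 499, 496, 503, 494, 495]

# Step 1: hypotheses. H0: mu = 500, H1: mu != 500 (two-sided).
mu_0 = 500

# Step 2: test statistic t = (sample mean - mu_0) / (s / sqrt(n)), with n - 1 degrees of freedom.
# Step 3: significance level and critical value (t table, two-sided, df = 9).
alpha = 0.05
t_critical = 2.262

# Step 4: calculate the test statistic.
n = len(sample)
t_stat = (fmean(sample) - mu_0) / (stdev(sample) / sqrt(n))

# Step 5: decision. Reject H0 only if |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_critical
print(t_stat, reject_h0)
```

With these numbers the statistic comes out at roughly -2.0, which is inside the critical bounds, so the sample does not provide enough evidence to reject H0 at the 5% level.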

Correlation and Regression Analysis

Statistical study of quantitative relationships between phenomena involves correlation analysis and regression analysis. Correlation measures the strength and direction of a relationship without implying causality, while regression quantifies the functional dependence of a dependent variable on one or more independent variables.

Key differences:

Correlation treats variables symmetrically, whereas regression distinguishes independent and dependent variables.

Correlation assesses the degree of association; regression provides explicit equations linking variables.

In correlation analysis both variables are treated as random; in classical regression analysis only the dependent variable is random, while the independent variables are treated as fixed or given.
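The first two differences can be seen numerically. In the sketch below (hypothetical data, hand-written helpers `pearson_r` and `least_squares`), swapping the variables leaves the correlation coefficient unchanged, while the regression of y on x and the regression of x on y yield different equations.

```python
from math import sqrt

def _centered_sums(x, y):
    """Return (Sxx, Syy, Sxy, mean_x, mean_y) for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy, mx, my

def pearson_r(x, y):
    """Pearson correlation coefficient; symmetric in x and y."""
    sxx, syy, sxy, _, _ = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)

def least_squares(x, y):
    """Slope and intercept of the least-squares line predicting y from x."""
    sxx, _, sxy, mx, my = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(pearson_r(x, y), pearson_r(y, x))  # same value either way
print(least_squares(x, y))               # y regressed on x
print(least_squares(y, x))               # x regressed on y: a different line
```

The product of the two regression slopes equals the squared correlation coefficient, which ties the two analyses together.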

Reference

“Cartoon Statistics”, edited by Xie Hongguang; National Bureau of Statistics News Office, Shaanxi Provincial Statistics Bureau.

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
