
Mastering Probability Distributions in R: From Normal to Poisson

This article explains how various continuous and discrete probability distributions—such as normal, binomial, Poisson, and negative binomial—are used in real‑world contexts, introduces R’s naming conventions for distribution functions, and provides code examples for computing densities, CDFs, quantiles, and random samples.

Model Perspective

Measurements of a random variable follow a particular probability distribution. Many continuous variables in everyday life, such as human height or crop yield, are approximately normally distributed.

Discrete variables follow distributions such as the binomial or Poisson. For example, gene expression counts from RNA-seq can be approximated by a Poisson distribution, while differential expression analysis often uses a negative binomial model, because counts across biological replicates usually vary more than a Poisson model allows (overdispersion).
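To see why a negative binomial model is preferred for overdispersed counts, the sketch below simulates Poisson and negative binomial counts with the same mean and compares their variances. It uses R's mean/size parameterization of rnbinom, in which the variance is mu + mu^2 / size; the values mu = 100, size = 5, and the random seed are illustrative assumptions, not taken from real RNA-seq data.

```r
# Compare Poisson and negative binomial counts that share the same mean.
# mu = 100 and size = 5 are illustrative values only (assumptions).
set.seed(42)
mu   <- 100
size <- 5   # dispersion parameter: NB variance = mu + mu^2 / size

pois_counts <- rpois(10000, lambda = mu)
nb_counts   <- rnbinom(10000, size = size, mu = mu)

# Poisson: the variance is approximately equal to the mean
c(mean = mean(pois_counts), var = var(pois_counts))

# Negative binomial: the variance is much larger than the mean (overdispersion)
c(mean = mean(nb_counts), var = var(nb_counts))

# Theoretical negative binomial variance for comparison
mu + mu^2 / size   # 2100
```

A smaller size means stronger overdispersion; as size grows, the negative binomial approaches the Poisson.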

Statistics uses many distributions, such as the normal, binomial, Poisson, and chi-square. Once a distribution and its parameters are specified, you can compute its probability density (for a discrete distribution, its probability mass function), its cumulative distribution function (CDF), and its quantiles, and you can draw random samples from it; R provides a function for each of these tasks. Each function name combines a prefix and a suffix: the prefix d, p, q, or r selects the density, CDF, quantile, or random generation, and the suffix names the distribution. For example, dnorm is the normal density and rpois generates Poisson random numbers.
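As a quick illustration of the naming convention, the sketch below applies all four prefixes to the standard normal distribution. The seed passed to set.seed() is arbitrary and only makes the random draws reproducible.

```r
# The four prefixes applied to the standard normal distribution (mean 0, sd 1)
dnorm(0)        # density at x = 0: 1/sqrt(2*pi), about 0.3989
pnorm(1.96)     # CDF: P(X <= 1.96), about 0.975
qnorm(0.975)    # quantile: the x with P(X <= x) = 0.975, about 1.96

set.seed(1)     # arbitrary seed, for reproducibility only
rnorm(5)        # five random draws

# p and q are inverses of each other
pnorm(qnorm(0.3))   # 0.3, up to floating-point error
```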

Common distributions and their R functions include:

Normal (norm): dnorm, pnorm, qnorm, rnorm

Binomial (binom): dbinom, pbinom, qbinom, rbinom

Poisson (pois): dpois, ppois, qpois, rpois

Hypergeometric (hyper): dhyper, phyper, qhyper, rhyper

Chi‑square (chisq): dchisq, pchisq, qchisq, rchisq

t distribution (t): dt, pt, qt, rt

F distribution (f): df, pf, qf, rf

Negative binomial (nbinom): dnbinom, pnbinom, qnbinom, rnbinom

Exponential (exp): dexp, pexp, qexp, rexp
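The same prefix and suffix pattern carries over to every family in the list. The sketch below makes a few calls whose results can be checked by hand; the specific arguments (10 trials, rate 0.5, and so on) are arbitrary choices for illustration.

```r
# Binomial: P(X = 3) for 10 trials with success probability 0.5
dbinom(3, size = 10, prob = 0.5)          # choose(10, 3) / 2^10 = 120/1024 = 0.1171875

# The probability mass function of a discrete distribution sums to 1
sum(dbinom(0:10, size = 10, prob = 0.5))  # 1

# Chi-square with 1 df is the square of a standard normal,
# so its 0.95 quantile equals the squared 0.975 normal quantile
qchisq(0.95, df = 1)                      # about 3.841
qnorm(0.975)^2                            # about 3.841

# Exponential CDF has the closed form 1 - exp(-rate * x)
pexp(2, rate = 0.5)                       # 1 - exp(-1), about 0.632

# The t distribution approaches the normal as df grows
qt(0.975, df = 5)                         # about 2.571
qt(0.975, df = 1000)                      # about 1.962, close to qnorm(0.975)
```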

Example: For a random variable X that follows the standard normal distribution, find the quantile x at which the CDF equals 0.95, that is, P(X ≤ x) = 0.95.
<code>qnorm(0.95)
# 1.64485362695147</code>

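The syntax of qnorm() is qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE). The example above relies on the defaults mean = 0, sd = 1, and lower.tail = TRUE. The parameters mean and sd are the mean and standard deviation of the normal distribution. With lower.tail = TRUE, p is interpreted as the lower-tail probability P(X ≤ x); with lower.tail = FALSE, it is interpreted as the upper-tail probability P(X > x). Setting log.p = TRUE indicates that p is supplied on the log scale.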
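The sketch below exercises the other qnorm() parameters. The mean of 170 and standard deviation of 10 are hypothetical values (for example, heights in cm) chosen only for illustration.

```r
# Upper-tail quantile: the x with P(X > x) = 0.05 is the same point as qnorm(0.95)
qnorm(0.05, lower.tail = FALSE)   # about 1.645

# Non-standard normal; mean = 170 and sd = 10 are hypothetical values
qnorm(0.95, mean = 170, sd = 10)  # 170 + 10 * qnorm(0.95), about 186.4

# log.p = TRUE means p is supplied on the log scale
qnorm(log(0.95), log.p = TRUE)    # same as qnorm(0.95), about 1.645
```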

Example: In a region of one million people, the annual incidence of lung cancer is about 40 per 100,000 people. Assuming the number of new cases per 100,000 people per year follows a Poisson distribution with λ = 40, what is the probability of at most 20 new cases per 100,000 people in a year?
<code>ppois(20, 40)</code>

The syntax of ppois() is ppois(q, lambda, lower.tail = TRUE, log.p = FALSE). Here λ = 40 is the expected number of new lung-cancer diagnoses per 100,000 people per year, and the Poisson CDF ppois(20, 40) gives P(X ≤ 20), the probability that the annual number of new cases does not exceed 20.
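Because ppois(q, lambda) returns the lower-tail probability P(X ≤ q), the wording of the question determines which q to pass. The sketch below, using λ = 40 from the example, checks the CDF against a sum of dpois() terms, shows the strict "fewer than" case and the upper tail, and scales the rate to the whole population of one million (λ = 400, simple arithmetic from the stated incidence).

```r
lambda <- 40  # expected new cases per 100,000 people per year (from the example)

# ppois(q, lambda) is P(X <= q), i.e. the sum of the PMF from 0 to q
ppois(20, lambda)
sum(dpois(0:20, lambda))               # same value

# "Strictly fewer than 20" means P(X <= 19)
ppois(19, lambda)

# Upper tail P(X > 50); equals 1 - ppois(50, lambda)
ppois(50, lambda, lower.tail = FALSE)

# Whole population of 1,000,000 at the same rate: lambda = 400.
# "Fewer than 200" cases region-wide is P(X <= 199), which is extremely small.
ppois(199, 400)
```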

Source: Liu Hongde, Sun Xiao, Xie Jianming, “Bioinformatics Data Analysis and Practice”.

Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
