Application of IRT Theory and Techniques in Large‑Scale International Assessments: The Case of PISA

This article explains how Item Response Theory (IRT) models and linking techniques are used in the Programme for International Student Assessment (PISA) to keep large‑scale educational measurements comparable across countries and over time. It covers test design, matrix sampling, and the two models involved: the two‑parameter logistic (2PL) model and the generalized partial credit model (GPCM).

TAL Education Technology

Jun 24, 2020

Continuing from the previous article on IRT, this piece explains how IRT is applied in large‑scale assessments, using the Programme for International Student Assessment (PISA) as an example.

PISA, launched by the OECD in 2000 and administered every three years, assesses 15‑year‑old students in reading, mathematics and science to compare basic education across countries and cultures. It is not a selection exam: it emphasizes literacy, meaning the ability to apply knowledge to real‑world problems.

The exam design for cross‑national comparability focuses on two aspects: (1) content that probes underlying subject literacy, allowing comparable performance across diverse curricula, and (2) a sampling design that uses matrix sampling to reduce test length while preserving measurement precision.

Matrix sampling divides the full item pool into several parallel, smaller booklets, and each student receives only one booklet. This reduces the number of items each student must answer while still supporting accurate ability estimates at the population level, because the full item pool is covered across the sample as a whole.
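The booklet assignment step can be sketched in a few lines. This is a minimal illustration only: the function name `assign_booklets`, the student IDs, and the choice of 7 booklets are all hypothetical, and the sketch makes no claim about how PISA actually allocates booklets in the field.

```python
import random
from collections import Counter

def assign_booklets(student_ids, n_booklets, seed=0):
    """Give each student exactly one booklet (hypothetical helper).

    Students are shuffled first, then booklets are handed out in rotation,
    so booklet counts differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: i % n_booklets for i, sid in enumerate(shuffled)}

# Hypothetical sample: 100 students, 7 booklets.
students = [f"S{i:03d}" for i in range(100)]
assignment = assign_booklets(students, n_booklets=7)
counts = Counter(assignment.values())
print(sorted(counts.items()))
```

Each student sees only one booklet, yet every booklet is answered by roughly the same number of students, which is what allows population-level estimates from a shortened test.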

IRT linking techniques are employed for both horizontal (within‑year) and vertical (between‑year) comparisons. Horizontal linking uses a balanced incomplete block (BIB) design, in which item blocks are shared across booklets, together with concurrent calibration to place scores from different booklets on a common scale. Vertical linking relies on anchor items (trend items) that are repeated across assessment years, again with concurrent calibration, to link ability scales over time.
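To make the BIB idea concrete, here is a small sketch that builds one classic balanced incomplete block layout: 7 item blocks arranged into 7 booklets of 3 blocks each, obtained by cyclically shifting the base set (0, 1, 3) modulo 7. The sizes and the function name `cyclic_bib` are assumptions chosen for illustration, not PISA's operational booklet design.

```python
from itertools import combinations
from collections import Counter

def cyclic_bib(n_blocks=7, base=(0, 1, 3)):
    """Build booklets by shifting a base set of block indices modulo n_blocks."""
    return [tuple(sorted((b + shift) % n_blocks for b in base))
            for shift in range(n_blocks)]

booklets = cyclic_bib()

# Balance checks: how often each block and each pair of blocks appears.
block_counts = Counter(b for bk in booklets for b in bk)
pair_counts = Counter(pair for bk in booklets for pair in combinations(bk, 2))

for i, bk in enumerate(booklets):
    print(f"booklet {i}: blocks {bk}")
print("appearances per block:", sorted(set(block_counts.values())))
print("appearances per block pair:", sorted(set(pair_counts.values())))
```

In this layout every block appears in three booklets and every pair of blocks shares exactly one booklet. That overlap is what lets a single concurrent calibration tie all booklets to one scale. Operational designs may also balance the position of each block within a booklet, which this sketch ignores.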

Since the 2015 cycle, PISA has employed two IRT models: the two‑parameter logistic (2PL) model for dichotomous items and the generalized partial credit model (GPCM) for polytomous items. The 2PL model describes each item with a difficulty parameter and a discrimination parameter, while the GPCM extends this to items with more than two score categories, allowing partial credit for intermediate responses.
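For reference, the two models are commonly written as follows. The notation is an assumption for this sketch rather than the article's own: θ_j is the ability of student j, a_i the discrimination of item i, b_i its difficulty, b_{iv} the v-th threshold of a polytomous item, and m_i its maximum score. Some presentations also include a scaling constant D inside the exponent; it is omitted here.

```latex
% 2PL: probability that student j answers dichotomous item i correctly
P(X_{ij} = 1 \mid \theta_j)
  = \frac{\exp\bigl(a_i(\theta_j - b_i)\bigr)}
         {1 + \exp\bigl(a_i(\theta_j - b_i)\bigr)}

% GPCM: probability that student j obtains score k on item i,
% with score categories 0, 1, ..., m_i (an empty sum equals 0)
P(X_{ij} = k \mid \theta_j)
  = \frac{\exp\Bigl(\sum_{v=1}^{k} a_i(\theta_j - b_{iv})\Bigr)}
         {\sum_{c=0}^{m_i} \exp\Bigl(\sum_{v=1}^{c} a_i(\theta_j - b_{iv})\Bigr)}
```

With m_i = 1 the GPCM expression reduces to the 2PL form, so the two models can be treated as one family during calibration.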

An item characteristic curve plots the probability of a correct response against ability. Items with higher discrimination have steeper curves and therefore differentiate better between examinees of varying ability.
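The effect of discrimination is easy to check numerically. The sketch below evaluates the 2PL response probability for two hypothetical items that share the same difficulty (b = 0) but differ in discrimination (a = 0.5 versus a = 2.0); the parameter values and the function name `p_2pl` are illustrative assumptions.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

thetas = [-1.0, 0.0, 1.0]
low = [p_2pl(t, a=0.5, b=0.0) for t in thetas]   # weakly discriminating item
high = [p_2pl(t, a=2.0, b=0.0) for t in thetas]  # strongly discriminating item

for t, lo_p, hi_p in zip(thetas, low, high):
    print(f"theta={t:+.1f}  a=0.5: {lo_p:.3f}  a=2.0: {hi_p:.3f}")
```

Both items give a probability of 0.5 at θ = b, but between θ = -1 and θ = 1 the high-discrimination item moves from roughly 0.12 to roughly 0.88, while the low-discrimination item only moves from about 0.38 to about 0.62.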

The GPCM models the probability of achieving each score category based on ability, item discrimination, and threshold parameters, producing multiple category characteristic curves.
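As a rough sketch of how those category probabilities are computed, the function below implements the standard GPCM formula for a single item. The function name `gpcm_probs` and the parameter values (a = 1.2, thresholds -0.5 and 0.8, giving scores 0, 1 and 2) are hypothetical and chosen only to show the pattern.

```python
import math

def gpcm_probs(theta, a, thresholds):
    """Category probabilities for scores 0..m under the GPCM.

    thresholds holds the m step parameters; score 0 has a cumulative sum of 0.
    """
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

for theta in (-2.0, 0.0, 2.0):
    probs = gpcm_probs(theta, a=1.2, thresholds=[-0.5, 0.8])
    print(theta, [round(p, 3) for p in probs])
```

At low ability the zero-credit category is the most likely, at high ability full credit dominates, and partial credit peaks in between. Plotting each category's probability against θ gives the category characteristic curves described above.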

By applying these IRT models and linking procedures, PISA can generate comparable scores across countries and years, supporting educational policy decisions and research.

Conclusion: IRT models and linking techniques are widely used in large‑scale assessments such as PISA, TOEFL, and national exams. They enable fair cross‑national and longitudinal comparisons and provide more valid evidence for educational decision‑making.

References:

PISA 2006 Released Items – Mathematics: https://www.oecd.org/pisa/38709418.pdf

PISA 2012 Released Mathematics Items: http://www.oecd.org/pisa/pisaproducts/pisa2012-2006-rel-items-maths-ENG.pdf

PISA 2018 Technical Report: https://www.oecd.org/pisa/data/pisa2018technicalreport/

PISA 2018 Results: https://www.oecd.org/pisa/publications/pisa-2018-results.htm

Muraki, E. (1997). A generalized partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of Modern Item Response Theory. New York: Springer.

Wu, M., Tam, H. P., & Jen, T.‑H. (2016). Educational Measurement for Applied Researchers. Singapore: Springer.

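Li, L., & Zhang, P. (2011). Basic issues in the practical application of matrix sampling in large‑scale educational assessment [in Chinese]. China Examinations (中国考试), 2011(1), 16‑21.

Wang, Y., Zhang, Y., Yang, T., & Xin, T. (2017). Application and implications of equating techniques in large‑scale international assessment programs [in Chinese]. China Examinations (中国考试), 2017(8), 43‑49.

Yuan, J., & Liu, H. (2016). Experience and trends in international large‑scale educational evaluation: The case of PISA [in Chinese]. Information Technology Education in Primary and Secondary Schools (中小学信息技术教育), 2016(7), 20‑23.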


