Scoring and estimating score precision using multidimensional IRT

Book chapter · English · Open Access
Brown, Anna ; Croudace, Tim J (2015)
  • Publisher: Taylor & Francis (Routledge)
  • Subject: BF

The ultimate goal of measurement is to produce a score by which individuals can be assessed and differentiated. Item response theory (IRT) modeling views responses to test items as indicators of a respondent's standing on some underlying psychological attribute (van der Linden & Hambleton, 1997), often called a latent trait, and provides special algorithms for estimating that standing. This chapter gives an overview of methods for estimating person attribute scores using unidimensional and multidimensional IRT models, focusing on those that are particularly useful with patient-reported outcome (PRO) measures.

To be useful in applications, a test score has to approximate the latent trait well and, importantly, its precision must be known if the score is to inform decision-making. Unlike classical test theory (CTT), which assumes that a test measures with the same precision at all trait levels, IRT methods assess the precision with which a test measures at each trait level. In the context of patient-reported outcomes measurement, this enables assessment of measurement precision for an individual patient. Knowing the error band around a patient's score is important for informing clinical judgments, such as deciding whether an observed change, for instance in response to treatment, is significant (Reise & Haviland, 2005). At the same time, summary indices are often needed to describe the overall precision of measurement in a research sample, a population group, or the population as a whole. Much of this chapter is therefore devoted to methods for estimating measurement precision, including the score-dependent standard error of measurement and appropriate sample-level or population-level marginal reliability coefficients.

Patient-reported outcome measures often capture several related constructs, a feature that may make the use of multidimensional IRT models appropriate and beneficial (Gibbons, Immekus & Bock, 2007).
Several such models are described, including a model with multiple correlated constructs, a model where multiple constructs are underlain by a general common factor (second-order model), and a model where each item is influenced by one general and one group factor (bifactor model). To make the use of these models more accessible to applied researchers, we provide specialized formulae for computing test information, standard errors, and reliability. We show how to translate a multitude of numbers and graphs conditioned on several dimensions into easy-to-use indices that can be understood by applied researchers and test users alike. All described methods and techniques are illustrated with a single data analysis example involving a popular PRO measure, the 28-item version of the General Health Questionnaire (GHQ28; Goldberg & Williams, 1988), completed in mid-life by a large community sample as part of a major UK cohort study.
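In the multidimensional case, precision is governed by an information matrix rather than a scalar. As a hedged sketch (not the chapter's own formulae), the code below computes the Fisher information matrix for a multidimensional 2PL and the asymptotic standard error of the first dimension's score, using a bifactor-style loading pattern in which the first column is the general factor and each item additionally loads on one group factor. All parameter values are invented for illustration.

```python
import numpy as np

def mirt_info_matrix(theta, A, b):
    """Fisher information matrix for a multidimensional 2PL:
    I(theta) = sum_i p_i (1 - p_i) a_i a_i^T."""
    theta = np.asarray(theta, dtype=float)
    info = np.zeros((len(theta), len(theta)))
    for a_i, b_i in zip(A, b):
        a_i = np.asarray(a_i, dtype=float)
        p = 1.0 / (1.0 + np.exp(-(a_i @ theta - b_i)))  # item response probability
        info += p * (1.0 - p) * np.outer(a_i, a_i)
    return info

def general_factor_se(theta, A, b):
    """Asymptotic SE of the general-factor score: sqrt of the (0, 0)
    element of the inverse information matrix."""
    return np.sqrt(np.linalg.inv(mirt_info_matrix(theta, A, b))[0, 0])

# Hypothetical bifactor pattern: column 0 = general factor,
# columns 1-2 = group factors (each item loads on exactly one).
A = [[1.2, 0.8, 0.0],
     [1.0, 0.6, 0.0],
     [1.1, 0.0, 0.7],
     [0.9, 0.0, 0.5]]
b = [0.0, 0.0, 0.0, 0.0]
print(general_factor_se([0.0, 0.0, 0.0], A, b))
```

Note that the SE from the inverse matrix is never smaller than the naive value `1/sqrt(I[0,0])`: uncertainty about the group factors propagates into the general-factor score, which is one reason the chapter's conditioning of precision on several dimensions matters.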
  • References (30)

Ackerman, T.A. (2005). Multidimensional item response theory modeling. In A. Maydeu-Olivares & J.J. McArdle (Eds.), Contemporary psychometrics (pp. 3-26). Mahwah, NJ: Lawrence Erlbaum.

    Bock, R.D. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill.

    Bock, R.D., Gibbons, R., Schilling, S.G., Muraki, E., Wilson, D.T., & Wood, R. (2003). TESTFACT 4.0 user's guide. Chicago, IL: Scientific Software International.

Gibbons, R.D., Bock, R.D., Hedeker, D., Weiss, D.J., Segawa, E., Bhaumik, D.K., Kupfer, D.J., Frank, E., Grochocinski, V.J. & Stover, A. (2007). Full-information item bifactor analysis of graded response data. Applied Psychological Measurement, 31, 4-19.

    Gibbons, R.D., Immekus, J.C. & Bock, R.D. (2007). The Added Value of Multidimensional IRT Models. Didactic workbook. Retrieved on 1 June 2011 from

    Brown, A. & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71(3), 460-502.

Croudace, T.J., Evans, J., Harrison, G., Sharp, D.J., Wilkinson, E., McCann, G., Spence, M., Crilly, C. & Brindle, L. (2003). Impact of the ICD-10 Primary Health Care (PHC) diagnostic and management guidelines for mental disorders on detection and outcome in primary care: cluster randomised controlled trial. British Journal of Psychiatry, 182, 20-30.

Dodd, B.G., De Ayala, R.J. & Koch, W.R. (1995). Computerized adaptive testing with polytomous items. Applied Psychological Measurement, 19, 5-22.

    Du Toit, M. (Ed.). (2003). IRT from SSI. Chicago: Scientific Software International.

    Embretson, S. E. & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum Publishers.
