
Figural matrices tests are common in intelligence research and have been used to draw conclusions regarding secular changes in intelligence. However, their measurement properties have seldom been evaluated with large samples that include both sexes. Using data from the Norwegian Armed Forces, we study the measurement properties of a test used for selection in military recruitment. Item-level data were available from 113,671 Norwegian adolescents (32% female) tested between 2011 and 2017. Using item response theory (IRT), we characterize the measurement properties of the test in terms of difficulty, discrimination, precision, and measurement invariance between males and females. We estimate sex differences in the mean and variance of the latent variable and evaluate the impact of violations of measurement invariance on the estimated distribution parameters. The results show that unidimensional IRT models fit well in all groups and years. There is little difference in precision and test difficulty between males and females, and precision is generally poor in the upper part of the scale. In the sample, male latent proficiency is estimated to be slightly higher on average, with higher variance. Adjusting for violations of measurement invariance generally reduces the sex differences but does not eliminate them. We conclude that previous studies using the Norwegian GMA data must be interpreted with more caution but that the test should measure males and females equally fairly.
measurement invariance, Social sciences (General), H1-99, figural matrices, sex bias, item response theory, measurement precision, Article, fluid intelligence
