Example 4: Construct validation of two ability tests

Accuracy of the correlation in a two-dimensional 2PL model

Author

Anonymous

Published

July 19, 2024


Objective

The example describes the examination of the relationship between two achievement tests: A 20-item math test and a 20-item reading test in secondary school. Both tests are scored dichotomously (true/false) with a correlation of \(\rho = .70\) between the skills. The parameter of interest is the standard error of the estimated correlation between math and reading. The test design features to be varied are the sample size and if there is a speededness effect present. In this example, we will use a speededness effect that will induce some missingness in the last third of the items.

I. Determining the data generation for the complete dataset


  1. Number and distribution of factors (unidimensional vs. multidimensional)
  2. Number of items and item parameters (discriminations, difficulties)
  3. Item type (dichotomous, polytomous)

True item parameters are generated for the 30 dichotomous items on the two dimensions (math and reading).

# Clear work space and load necessary packages
rm(list = ls())
library(tidyverse)   # General data handling and manipulation
library(mirt)        # Simulate data from a multivariate normal distribution
library(mice)        # Induce missingness at random to the data

# Set the seed for random number generation to ensure reproducibility
set.seed(2024)

# Number of items
n_items <- 30

# Discrimination parameters for both dimensionas (a_i1, a_i2) are randomly 
  # drawn from a log-normal as it reflects the natural variation and 
  # positive skew typically observed in empirical data.
a_i1 <- c(rlnorm(n_items/2,.2,.3), rep(NA, n_items/2))
a_i2 <- c(rep(NA, n_items/2), rlnorm(n_items/2,.2,.3))
a_i <- cbind(a_i1, a_i2) %>% as.matrix()

# Difficulty parameters (b_i) are randomly drawn from a normal distribution
b_i <- rnorm(n_items, 0, 1)

The conditional probability that a person \(p\) with ability parameters on both dimensions, \(\theta_{p1}\) and \(\theta_{p2}\), answers the item \(i\) correctly is given by the following formula:

\[ P(X_{pi} = 1 | a_{i1}, a_{i2}, b_{i}, \theta_{p1}, \theta_{p2}) = \frac{1}{1 + \exp (-(a_{i1} \theta_{p1} + a_{i2} \theta_{p2} - b_i))} \]

where \(a_{i1}\) and \(a_{i2}\) denote the discrimination parameters for the two dimensions, and \(b_i\) the difficulty parameter of item \(i\).

The function generate_data_mv uses mirt::simdata to simulate dichotomous data given the item discriminations (\(a\)), the item difficulties (\(b\)), and the number of observations (\(n\)). The responses are generated according to a multivariate normal distribution with a predefined covariance (or correlation) matrix (\(\Sigma\)) and a mean vector (\(\mu\)). Note that in the mirt package, the transformation \(d = -a*b\) is used to match the slope-intercept parameterization of the logistic model.

# Simulate dichotomous item responses based on the following parameters
  # - 'a' denotes the item discriminations
  # - 'b' denotes and the item difficulties
  # - 'n' denotes the number of observations
# The remaining arguments are preset
  # - 'sigma' denotes the covariance matrix 
  # - 'mu' denotes the mean vector
  # - 'itemtype' specifies the type of item (i.e., dichotomous)

generate_data_mv <- function(a, b, n) {
  resp <- 
    simdata(a = a, d = -a*b, N = n, 
            mu = c(0, 0), 
            sigma = matrix(c(1, 0.7, 0.7, 1), nrow = 2),
            itemtype = "dich") %>% 
    as.data.frame()
  
  return(resp)
}

II. Defining the test design and the process of missing values


  1. Pattern of missingness (e.g., MCAR, linking design)
  2. Amount of missing data

Each achievement test consists of 15 items. There is a speededness effect for the last 5 items for each of the separately timed tests. Thus, later items in both tests are less likely to be answered (response rate item 11: ~80%, items 12-13: ~.70%, and items 14-15: ~40%). In the present case, the probability of missing data depends on the observed data only, which is equivalent to Missing At Random (MAR).

The function data_MAR_design uses the complete simulated data and deletes values using mice::ampute. For more information on the multivariate amputation procedure see Schouten et al. (2016) and the accompanying vignette.

# Induce missingness at random (MAR) to the complete simulated data set
  # - 'resp' denotes the complete data set
  # - 'prop' denotes the severity of missingness 
  #   (has to be seen in combination with missingness pattern)
# The remaining arguments are preset
  # - 'patterns' denotes a matrix defining a specific missing data patterns
  # - 'mech' denotes the mechanism of missing data
  # - 'type' denotes the type of missingness, 'RIGHT' means that later 
  #    items more likely to be missing

data_MAR_design <- function(resp, miss_rates = 0) {
  
  # create some pattern of missingness for the first test (only the last 5 items)
  adp <- ampute.default.patterns(5)
  pattern <- rbind(
  cbind(matrix(1, nrow =5, ncol = 10), adp),
      c(1,1,1,1,1,1,1,1,1,1,1,1,1,0,0),
      c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0),
      c(1,1,1,1,1,1,1,1,1,1,1,0,1,0,0),
      c(1,1,1,1,1,1,1,1,1,1,0,1,1,0,0),
      c(1,1,1,1,1,1,1,1,1,1,1,0,0,0,0))
  
  # Resample rows of patterns of missingness for the second test
  pattern_res1 <- pattern[sample(10, 10, replace = FALSE), ]
  pattern_res2 <- pattern[sample(10, 10, replace = FALSE), ]

  # Append the resampled pattern
  pattern_miss <- unique(rbind(cbind(pattern, pattern_res1),
                               cbind(pattern, pattern_res2)))

  # Apply ampute to create missing data under MAR
  dat_amp <- ampute(data = resp,
                    prop = miss_rates,          # severity of missingness
                    patterns = pattern_miss,    # custom patterns of missingness
                    mech = "MAR",               # missing mechanism
                    type = "RIGHT")             # missingness type
  resp <- dat_amp$amp
  # md.pattern(resp)

  return(resp)
}

III. Selecting the IRT model and the parameter of interest


  1. Underlying IRT model (e.g., 1PL, 2PL)
  2. IRT modeling software and estimation method
  3. Parameters to extract

The estimate_irt function estimates the multidimensional two-parametric item response model (Birnbaum model) with mirt using the simulated data and returns the standard error of the correlation between the latent traits.

# Estimate item response model
  # - 'resp' denotes the data set

estimate_irt <- function(resp) {
  model <- paste('math = 1-', n_items/2, "\n",
                 'read = ', n_items/2+1, '-', n_items, "\n",
                 'COV = math * read ')

# Fit the two-dimensional model using mirt
  mod <- mirt(resp, model, itemtype = '2PL', SE = TRUE)

  # Calculate standard errors for the correlation if model is successfully estimated
  if (!is.null(mod)) {
    est <- coef(mod, printSE = TRUE)$GroupPars["SE", "COV_21"]    # Extract the standard error of the correlation
  } else {
    est <- NA
  }
  
  return(est)
}

IV. Setting up the Monte Carlo simulation


  1. Number of iterations
  2. Sample sizes to evaluate

The Monte Carlo simulation runs n_iterations times, including the previous steps of (i) determining the data generation for the complete dataset, (ii) defining the test design and the process of missing values, (iii) selecting the IRT model and the parameter of interest. The simulation is run for different sample sizes between 100 and 500 (in increments of 50), with and without the speededness effect.

Using an estimated variance for the standard error of the correlation (\(\sigma = 0.05\)), a specified level of accuracy (\(\delta = .05\)), and a significance level (\(\alpha = .05\)), the number of iterations needed is approximately 77.

# Number of iterations
n_iterations <- 77

# Sample sizes (from 100 to 500, in steps of 50) and 
  # the presence of a speededness factor (yes/no)
grid <- expand.grid(n_persons = seq(100, 500, 50), 
                    miss_rates = c(1, 0))

# Create data frame for results (res)
res <- data.frame()

# Loop over different sample sizes and missing rate combinations
for (g in 1:nrow(grid)) {
  
  # Nested loop, running 'n_iterations' times
  for (iter in 1:n_iterations) {
    # Generate a random sample with 'n_persons' from a multivariate distribution
    dat <- 
      generate_data_mv(a_i, b_i, grid[g, "n_persons"]) %>% 
      data_MAR_design(miss_rates = grid[g, "miss_rates"])

    res <- bind_rows(res, 
                     data.frame(iteration = iter,
                                n_persons = grid[g, "n_persons"],
                                percent_miss = grid[g, "miss_rates"],
                                se_est = estimate_irt(dat)))
  }
}

Results

The figure shows the average standard error for the correlation between the latent traits as a function of sample size on the one hand, and the presence or absence of a speededness effect on the other.

# Preparation and aggregation of results
res_plot <- res %>%
  group_by(n_persons, percent_miss) %>%
  summarise(
    m_SE = mean(se_est^2, na.rm = TRUE)^.5,          # Calculate the average SE
    .groups = 'drop')


# Plot standard error of correlation
ggplot(data=res_plot, aes(x=n_persons, y = m_SE, color = factor(percent_miss))) +
  geom_line(linewidth = 0.8, alpha = 0.8) +
  geom_point(size = 1.5, alpha = 0.8) +
  geom_abline(intercept = .05, slope = 0, col="red", lty = "twodash") +
  labs(
    x = "Sample size",
    y = "Average SE(Correlation)",
    color = "Speededness\neffect",
    linetype = "Speededness\neffect"
  ) +
  scale_color_manual(values = c("1" = "aquamarine3", 
                                "0" = "darkolivegreen2"),
                     labels = c("1" = "yes",
                                "0" = "no")) +
  
  ylim(0, 0.12) +
  xlim(100, 500) +
  theme_bw()  +
  theme(
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 12),
    legend.position = "inside", 
    legend.position.inside =  c(.85, .80)
  )

# Documentation for transparency and reproducibility
print(sessionInfo(), locale=FALSE)
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.9.0

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] mice_3.16.0     mirt_1.41       lattice_0.22-5  lubridate_1.9.3
 [5] forcats_1.0.0   stringr_1.5.0   dplyr_1.1.4     purrr_1.0.2    
 [9] readr_2.1.5     tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.0  
[13] tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0     farver_2.1.1         fastmap_1.1.0       
 [4] digest_0.6.30        rpart_4.1.23         timechange_0.3.0    
 [7] lifecycle_1.0.3      Deriv_4.1.3          dcurver_0.9.2       
[10] cluster_2.1.6        survival_3.5-8       magrittr_2.0.3      
[13] compiler_4.3.3       rlang_1.1.3          tools_4.3.3         
[16] utf8_1.2.2           yaml_2.3.6           knitr_1.41          
[19] labeling_0.4.3       htmlwidgets_1.5.4    withr_2.5.0         
[22] nnet_7.3-19          grid_4.3.3           fansi_1.0.3         
[25] jomo_2.7-6           colorspace_2.1-0     GPArotation_2024.3-1
[28] scales_1.3.0         iterators_1.0.14     MASS_7.3-60.0.1     
[31] cli_3.6.2            rmarkdown_2.18       vegan_2.6-4         
[34] generics_0.1.3       rstudioapi_0.14      tzdb_0.4.0          
[37] minqa_1.2.7          pbapply_1.6-0        splines_4.3.3       
[40] parallel_4.3.3       vctrs_0.6.5          boot_1.3-30         
[43] glmnet_4.1-8         Matrix_1.6-5         jsonlite_1.8.4      
[46] hms_1.1.3            mitml_0.4-5          foreach_1.5.2       
[49] glue_1.6.2           nloptr_2.1.1         pan_1.9             
[52] codetools_0.2-19     stringi_1.7.8        gtable_0.3.4        
[55] shape_1.4.6.1        lme4_1.1-35.5        munsell_0.5.0       
[58] pillar_1.9.0         htmltools_0.5.4      R6_2.5.1            
[61] evaluate_0.18        backports_1.4.1      broom_1.0.5         
[64] Rcpp_1.0.9           gridExtra_2.3        nlme_3.1-163        
[67] permute_0.9-7        mgcv_1.9-1           xfun_0.35           
[70] pkgconfig_2.0.3