Bias and Standard Errors of Vertically Scaled Tests

Date of Completion

January 2011


Education, Tests and Measurements




The relationship among total error, bias, and standard error of vertically scaled tests was examined in two simulated conditions: an ideal Item Response Theory (IRT) fit condition and a condition of IRT model misfit intended to approximate the type of misfit observed in operational data. Analytical estimates of standard error based on the IRT information function were compared with empirical estimates of standard error from a bootstrap resampling method. The number of bootstrap resamples needed to yield the same degree of accuracy as 2000 resamples was also explored.

Analytical estimates were found to overestimate the standard error of vertical scale scores. The bootstrap method yielded more accurate estimates of the standard error, as evidenced by the width of the confidence intervals for the proficiency levels and by the coverage probabilities. Finally, around 1000 bootstrap resamples were found to estimate standard errors with precision similar to that obtained with 2000 resamples.
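The two kinds of standard error compared in the study, and their link to total error, can be illustrated with a minimal sketch. It is not the procedure used in the study: the two-parameter logistic (2PL) model, the resampling of a flat list of scores, and all function names (`two_pl_probability`, `analytical_se`, `bootstrap_se`, `total_error`) are assumptions made for illustration, since the abstract specifies neither the IRT model nor the unit that was resampled.

```python
import math
import random
import statistics


def two_pl_probability(theta, a, b):
    """Probability of a correct response under a 2PL model (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def analytical_se(theta, items):
    """Analytical standard error from the test information function.

    items is a list of (a, b) pairs: discrimination and difficulty.
    Under the 2PL, item information is a^2 * P * (1 - P), and the
    standard error of the proficiency estimate is 1 / sqrt(sum of information).
    """
    information = 0.0
    for a, b in items:
        p = two_pl_probability(theta, a, b)
        information += a * a * p * (1.0 - p)
    return 1.0 / math.sqrt(information)


def bootstrap_se(values, statistic, n_resamples=1000, seed=0):
    """Empirical standard error from bootstrap resampling.

    Draws n_resamples samples with replacement, each the size of the
    original data, and returns the standard deviation of the statistic
    across those resamples.
    """
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


def total_error(bias, se):
    """Root mean squared error: sqrt(bias^2 + se^2)."""
    return math.sqrt(bias ** 2 + se ** 2)


if __name__ == "__main__":
    # Hypothetical four-item test and simulated scores, for illustration only.
    items = [(1.0, 0.0)] * 4
    rng = random.Random(1)
    scores = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("analytical SE at theta = 0:", analytical_se(0.0, items))
    print("bootstrap SE of the mean score:", bootstrap_se(scores, statistics.mean))
```

Calling `bootstrap_se` with `n_resamples=1000` and again with `n_resamples=2000` on the same data gives a rough way to see how much the estimate changes with the number of resamples, which is the kind of comparison described above.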