Date of Completion


Embargo Period



Keywords

vertical scaling, multidimensionality, construct shift, calibration

Major Advisor

Dr. H. Jane Rogers

Associate Advisor

Dr. Hariharan Swaminathan

Associate Advisor

Dr. D. Betsy McCoach

Associate Advisor

Dr. Jessica Goldstein

Field of Study

Educational Psychology


Doctor of Philosophy

Open Access


The primary purpose of this study was to examine the extent to which violations of item response model dimensionality assumptions, model misspecification, and choice of calibration procedure affect the accuracy of item and person parameter estimates and the estimation of growth in an IRT vertical scaling application using mixed-format tests. The assumptions of unidimensionality within grade and construct invariance across grades were of primary interest, as they may not hold in a vertical scaling context. Real data from a statewide assessment spanning six grades and two subject areas were analyzed to investigate the presence of construct shift and to explore issues of model-data fit. In addition, two simulation studies were conducted to investigate how well different calibration procedures recovered the vertically scaled item and person parameters in the presence and absence of construct invariance and model misspecification. Data were generated using parameter estimates obtained from the analysis of the real data. A bifactor model was used to model construct shift across grades. Three calibration procedures (full concurrent, paired concurrent, and fixed theta) were compared with respect to recovery of item and person parameter values on the vertical scale. Recovery of group and individual growth was examined using the parameter estimates obtained from each procedure under each simulation condition. Results showed that the full concurrent and paired concurrent calibration procedures adequately measured growth across six grades when the model fitted the data. Model misspecification and construct shift resulted in overestimation of growth. Effects were greater for the simulated Mathematics data than for the Reading data.