Journal of Rehabilitation Medicine 51-3

FIM™ internal construct validity revisited

Testlet approaches

Within the traditional testlet approach, 3 different versions of testlet combinations were applied, based on the underlying subscale structure of the FIM™. Two versions included 4 testlets for the motor scale, structured according to the FIM™ subtopics (self-care, sphincter control, transfers, locomotion), together with 2 combinations of the cognitive items. In one version, all the cognitive FIM™ items were unified in one testlet, since they all showed local dependency among each other at the baseline analysis, resulting in a total of 5 testlets. In the other version, the cognitive items were split thematically according to the FIM™ subtopics into 2 testlets, communication and social cognition, resulting in a total of 6 testlets. The third version attempted to form similarly sized testlets and was guided by the residual correlations between the items and by previously reported clusters of the FIM™ (29, 36). In this version, 3 testlets were created: a self-care testlet incorporating items A–H, a mobility testlet incorporating items I–M, and a cognitive testlet incorporating items N–R (item ranges as listed in Table II).

None of the 3 traditional testlet approaches (the 3-testlet, the 5-testlet and the 6-testlet version) resulted in fit to the Rasch model (see Table II). In contrast, the alternative 2-testlet approach (with Testlet1 containing items A, C, E, G, I, K, M, O and Q, and Testlet2 containing items B, D, F, H, J, L, N, P and R) showed fit to the Rasch model across all 9 analysis steps. The p-values from the item-trait χ² were all non-significant at the 0.01 level, the reliability indexes were all above 0.9, and the item- and person-fit estimates were within the set acceptable values. The explained common variance values retained in the latent estimate were all just above 1, indicating some marginal remaining residual local dependency among the testlets.
The fit of all testlet solutions is summarized in Table II, and the application of the 2-testlet approach to all aggregation levels of the calibration sample is shown in Appendix S2¹.

Differential Item Functioning strategy

Despite overall fit, some DIF remained in the 2-testlet solution for the whole calibration sample. To eliminate all DIF, the successful 2-testlet solution of the whole calibration sample had to be split twice. First, Testlet2 had to be split by rehabilitation group. Second, the musculoskeletal rehabilitation group of Testlet2 had to be split into the 2 time-points, i.e. admission and discharge. This resulted in the following super-items: Testlet1, Testlet2_NEUR, Testlet2_MSKt1, and Testlet2_MSKt2. Testlet1 was the anchor for the comparison of the person estimates of the split and the unsplit version. The effect size calculation resulted in 0.11 (see Appendix S3¹), indicating that there was no need to split the final interval-scale transformation into different subgroups.

Transformation table

Based on the 2-testlet solution, an interval-based transformation table was created for all available FIM™ total scores, which can be used to transfer the

Table II.
Testlet solutions on the level of the whole calibration sample (FIM_all)

| Testlets | n/CI | Item-fit residuals, mean (SD) | Person-fit residuals, mean (SD) | χ² p-value | PSI | α | A | Paired t-test, % | Cond. test of fit, CI based |
|---|---|---|---|---|---|---|---|---|---|
| 6 testlets | 946/10 | –0.156 (5.077) | –0.426 (1.200) | 0.000 | 0.906 | 0.887 | 0.942 | 1.27 | n/a |
| 5 testlets | 946/10 | –0.010 (7.046) | –0.360 (1.138) | 0.000 | 0.895 | 0.878 | 0.930 | 1.16 | n/a |
| 3 testlets | 946/10 | –1.419 (6.894) | –0.502 (1.049) | 0.000 | 0.838 | 0.859 | 0.871 | 1.27 | n/a |
| 2 testlets | 946/10 197 | –0.208 (0.317) | –0.614 (1.003) | 0.408 | 0.980 | 0.981 | 1.019 | 4.97 | 0.607 |
| Acceptable values | | SD < 1.4 | SD < 1.4 | > 0.01 | > 0.7 | > 0.7 | > 0.9 | < 5.00 | > 0.01 |

n/a: only available for the 2-testlet approach.

| Testlets | Items | DIF (Testlet) |
|---|---|---|
| 6 testlets | Self-Care (A-F), Sphincter Control (G-H), Transfers (I-K), Locomotion (L-M), Communication (N-O), Social Cognition (P-R) | gender (T6), age (T2), language (T1, T2, T3, T4, T5, T6), insurance (T1, T5), time-point (T2, T4, T5), rehab-group (T1, T3, T4, T5, T6) |
| 5 testlets | Self-Care (A-F), Sphincter Control (G-H), Transfers (I-K), Locomotion (L-M), Cognition (N-R) | age (T2, T5), language (T1, T2, T3, T4, T5), nationality (T5), insurance (T1), time-point (T2, T4), rehab-group (T3, T4, T5) |
| 3 testlets | Self-Care (A-H), Mobility (I-M), Cognition (N-R) | gender (T2, T3), age (T1), language (T1, T2, T3), nationality (T1, T3), insurance (T1), time-point (T2), rehab-group (T2, T3) |
| 2 testlets | Testlet1 (A, C, E, G, I, K, M, O, Q), Testlet2 (B, D, F, H, J, L, N, P, R) | rehab-group (T1, T2) |
| Acceptable values | | No DIF |

FIM_all: Functional Independence Measure; all: combination of time-points and rehabilitation-groups; n: sample size; CI: class intervals; SD: standard deviation; PSI: person separation index; α: Cronbach's alpha; A: explained common variance; DIF: differential item functioning.

J Rehabil Med 51, 2019
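The two-step split of Testlet2 described under the DIF strategy (first by rehabilitation group, then by time-point within the musculoskeletal group) can be sketched as below. This is a minimal illustration, not the authors' procedure: the function name `split_testlet2`, the group codes "NEUR" and "MSK", and the mapping of "t1" to admission and "t2" to discharge are assumptions inferred from the super-item names. Each person's Testlet2 score is placed in the one super-item that applies, and the other super-items are left structurally missing.

```python
from typing import Dict, Optional

# Super-item names as given in the text.
SPLIT_COLUMNS = ("Testlet2_NEUR", "Testlet2_MSKt1", "Testlet2_MSKt2")


def split_testlet2(
    score: int, rehab_group: str, time_point: str
) -> Dict[str, Optional[int]]:
    """Place a Testlet2 score in the group-specific super-item.

    rehab_group: "NEUR" or "MSK" (assumed codes).
    time_point: "t1" (admission) or "t2" (discharge) (assumed codes);
        only used for the musculoskeletal group.
    Super-items that do not apply are set to None (structurally missing).
    """
    columns: Dict[str, Optional[int]] = {name: None for name in SPLIT_COLUMNS}
    if rehab_group == "NEUR":
        # Step 1: the neurological group keeps a single Testlet2 super-item.
        columns["Testlet2_NEUR"] = score
    elif rehab_group == "MSK":
        # Step 2: the musculoskeletal group is split further by time-point.
        if time_point not in ("t1", "t2"):
            raise ValueError(f"unknown time point: {time_point!r}")
        columns[f"Testlet2_MSK{time_point}"] = score
    else:
        raise ValueError(f"unknown rehabilitation group: {rehab_group!r}")
    return columns


# Hypothetical records: (Testlet2 score, rehabilitation group, time-point).
neuro_row = split_testlet2(40, "NEUR", "t1")
msk_discharge_row = split_testlet2(55, "MSK", "t2")
```

Testlet1 is left unsplit in this scheme, which is what allows it to serve as the anchor when person estimates from the split and unsplit versions are compared.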
