Funding source
National Research Foundation of Korea (NRF)
Principal investigator
이현숙
Host institution
건국대학교 (Konkuk University)
Award year
2011
Award date
Not disclosed
Project number
2011-332-B00376
Project level
National
Research period
Unknown / Unknown
Funding amount
Unknown
Discipline
Social sciences
Discipline code
Not disclosed
Funding category
Humanities & Social Sciences-SD-Research
Keywords
test equating ; non-equivalent groups equating design ; anchor test ; anchor test construction conditions ; test development
Participants
이현숙
Participating institutions
Not disclosed
Project abstract
This study investigated the effects of relaxing the statistical assumptions for constructing an anchor test on the accuracy of equating. The NEAT design with an external anchor was considered, because for a design with an internal anchor, meeting the assumption that the anchor test is a miniature of the total test in content and statistical properties poses little complication. As the common-item selection rule for the simulation, the range of item discrimination parameters of the anchor item sets was considered in addition to the range of item difficulty parameters, in order to reflect reality more precisely and to observe how selection rules based on the two characteristics interact to produce more or less accurate equating results. The length of the anchor test was also varied, to examine whether consistent patterns hold across different anchor-test lengths. To avoid possible complications, it was assumed that the tests consist of multiple-choice items only, that the mean difficulty of the anchor test equals that of the total test, that context effects are not a concern, and that sample sizes are large enough to obtain stable parameter estimates. Three equating methods were considered in the simulation studies: frequency estimation equipercentile (FEE) equating, chained equipercentile (CE) equating, and IRT observed-score equating.

The results showed that, in terms of equating accuracy, it is desirable to select anchor items that have high discrimination and whose difficulty is narrowly distributed around the mean, and that item discrimination is a more crucial factor for equating performance than item difficulty when constructing an anchor test. An interaction was also found between anchor-item selection and the ability difference between the equating groups: selecting anchor items over a narrower range of b-parameters did not noticeably affect equating results when the two groups had the same proficiency distribution, whereas it increased equating accuracy a great deal when the groups differed in mean and skewness. In addition, under all conditions of non-equivalent examinee proficiency, differences in equating accuracy across spreads of b-parameters were more apparent when the anchor test was 25% of the total test length than when it was 50%. These results imply that building the anchor test from items spread over a narrower range of b-parameters matters most when equating conditions are less than optimal.

Although there were some exceptions, FEE equating was the most sensitive to changes in the anchor-item selection rule, and IRT observed-score equating was the most stable across rules. With FEE equating, it was found more often across simulation conditions than with the other two methods that an anchor test built from items with higher discrimination and a narrower spread of difficulty equated more accurately than one built from items drawn from the full range of discrimination and difficulty. This pattern was clearer in the middle of the score scale than at the extremes, and the effect appeared amplified when the equating groups differed in ability. This suggests that constructing the anchor test from the higher range of a-parameters and a narrower range of b-parameters around the mean is more beneficial to test developers when the testing program uses FEE equating than when it uses CE or IRT observed-score equating.
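The anchor-item selection rule described above (restricting both the a-parameter and b-parameter ranges when drawing anchor items from a pool) can be sketched as follows. This is an illustrative sketch, not the study's actual procedure: the item pool, thresholds, and function names are assumptions for the example.

```python
# Hedged sketch of an anchor-item selection rule: keep items with higher
# discrimination (a) and difficulty (b) narrowly spread around the pool mean.
# All names and thresholds are illustrative, not from the original study.
import numpy as np

rng = np.random.default_rng(0)

# Simulated 2PL item pool: discrimination a, difficulty b.
pool_a = rng.lognormal(mean=0.0, sigma=0.3, size=60)
pool_b = rng.normal(loc=0.0, scale=1.0, size=60)

def select_anchor(pool_a, pool_b, n_anchor, a_min=1.0, b_spread=0.5):
    """Pick n_anchor items with a >= a_min and |b - mean(b)| <= b_spread.

    Mirrors the 'higher a, narrower b around the mean' rule; falls back to
    the items closest to the center of difficulty if too few qualify.
    """
    b_center = pool_b.mean()
    ok = (pool_a >= a_min) & (np.abs(pool_b - b_center) <= b_spread)
    idx = np.flatnonzero(ok)
    if len(idx) < n_anchor:
        # Relax the rule: take items nearest the difficulty center.
        idx = np.argsort(np.abs(pool_b - b_center))
    return idx[:n_anchor]

anchor = select_anchor(pool_a, pool_b, n_anchor=10)
```

In the study's terms, widening `b_spread` or dropping `a_min` corresponds to selecting from the full range of item difficulty and discrimination, the comparison condition in the simulations.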
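Of the three equating methods named in the abstract, equipercentile equating is the simplest to illustrate. The following is a minimal single-group sketch (the study itself used the NEAT design with an external anchor and more elaborate FEE/CE/IRT variants); the function names are illustrative, and it assumes every integer score has nonzero frequency so percentile ranks are strictly increasing.

```python
# Minimal single-group equipercentile equating sketch: map each form-X
# score to the form-Y score that has the same percentile rank.
import numpy as np

def percentile_ranks(scores, max_score):
    """Percentile rank of each integer score 0..max_score
    (percent below plus half the percent at the score)."""
    scores = np.asarray(scores)
    n = len(scores)
    ranks = []
    for x in range(max_score + 1):
        below = np.sum(scores < x)
        at = np.sum(scores == x)
        ranks.append(100.0 * (below + 0.5 * at) / n)
    return np.array(ranks)

def equipercentile_equate(x_scores, y_scores, max_score):
    """For each form-X score, return the form-Y score with the same
    percentile rank, found by linear interpolation."""
    pr_x = percentile_ranks(x_scores, max_score)
    pr_y = percentile_ranks(y_scores, max_score)
    y_points = np.arange(max_score + 1)
    # np.interp requires pr_y to be increasing, hence the
    # nonzero-frequency assumption stated above.
    return np.interp(pr_x, pr_y, y_points)
```

For example, if form Y is uniformly one point harder than form X, the equated score for a middle form-X score comes out one point higher, as expected.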