Comparison of Equating Accuracy under Relaxed Statistical Assumptions for Anchor Test Construction: Focusing on Test Skewness and Test Equating Methods

Funding Source

National Research Foundation of Korea (NRF)

Principal Investigator

이현숙

Grantee Institution

건국대학교 (Konkuk University)

Award Year

2011

Award Date

Not disclosed

Project Number

2011-332-B00376

Project Level

National

Research Period

Unknown / Unknown

Funding Amount

Unknown

Discipline

Social Sciences

Discipline Code

Not disclosed

Funding Category

Humanities and Social Sciences-SD-Research

Keywords

test equating ; non-equivalent groups equating design ; anchor test ; anchor test construction conditions ; test development

Participants

이현숙

Participating Institutions

Not disclosed

Project Proposal Abstract

Research Summary: This study investigated the effects of relaxing the statistical assumptions for constructing an anchor test on the accuracy of equating. The NEAT design with an external anchor was considered, because meeting the assumptions for constructing the anchor test as a miniature of the total test, in terms of content and statistical properties, would not pose a great complication for an equating design with an internal anchor test. As a common-item selection rule for the simulation, the range of item discrimination parameters for anchor item sets was considered as well as the range of item difficulty parameters, in order to reflect reality more precisely and to observe how anchor item selection rules based on both characteristics interact to produce more or less accurate equating results. In addition, the length of the anchor test was varied to examine whether consistent patterns would be observed across different anchor test lengths. To avoid possible complications, it was assumed that the tests are composed of multiple-choice items only, that the mean of the difficulty parameters of the anchor test equals that of the total test, that context effects are not a concern, and that sample sizes are large enough to obtain stable parameter estimates. The three equating methods considered in the simulation studies were frequency estimation equipercentile (FEE) equating, chained equipercentile (CE) equating, and IRT observed-score equating.

The results showed that, in terms of equating accuracy, it is desirable to select anchor items that have high discrimination and item difficulty narrowly distributed around the mean, and that item discrimination is a more crucial factor affecting the performance of equating than item difficulty when constructing an anchor test. An interaction was also found between anchor item selection and the ability differences between the equating groups. Selecting anchor items over the narrower range of b-parameters did not affect equating results noticeably when the two equating groups did not differ in proficiency distributions, whereas it increased the accuracy of equating considerably when the two groups differed in mean and skewness. In addition, for all conditions of non-equivalence of examinee proficiency, differences in equating accuracy across spreads of b-parameters were more apparent when the anchor test length was 25% of the total test than when it was 50%. These results imply that constructing an anchor test from items spread over a narrower range of b-parameters has more pronounced effects on equating especially when equating conditions are less than optimal.

Although there were some exceptions, FEE equating was the most sensitive to changes in anchor item selection rules, and IRT observed-score equating was the most stable across the various anchor item selection rules. When FEE equating was used, it was found more frequently across simulation conditions than for the other two methods that constructing the anchor test from items with higher discrimination and a narrower spread of item difficulty was more accurate than using items selected from the full range of item discrimination and item difficulty. This pattern was more evident in the middle of the score scale than at the extremes, and the effect appeared to be amplified when there were ability differences between the equating groups. This suggests that constructing the anchor test from the higher range of a-parameters and a narrower range of b-parameters around the mean would be more beneficial to test developers when the testing program employs FEE equating rather than CE or IRT observed-score equating.
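The equipercentile methods named above share one core operation: a score on one form is mapped to the score on another form with the same percentile rank. As a rough illustration of that idea only (not the study's FEE or CE procedures, which additionally condition on the anchor score), here is a minimal single-group sketch in Python; the function names and simulated score data are illustrative assumptions:

```python
import bisect
import random

def percentile_rank(scores, x):
    """Mid-percentile rank of score x: the fraction of scores below x
    plus half the fraction of scores exactly at x."""
    s = sorted(scores)
    below = bisect.bisect_left(s, x)
    at = bisect.bisect_right(s, x) - below
    return (below + 0.5 * at) / len(s)

def equipercentile_equate(x, scores_x, scores_y):
    """Map score x on form X to the observed form-Y score whose
    percentile rank is closest to that of x (discrete version,
    without the smoothing or interpolation used in practice)."""
    p = percentile_rank(scores_x, x)
    return min(set(scores_y), key=lambda y: abs(percentile_rank(scores_y, y) - p))

# Toy check: if form Y is uniformly 2 points "easier" (every examinee
# scores 2 points higher), a form-X score should equate to itself + 2.
random.seed(1)
scores_x = [random.randint(0, 40) for _ in range(2000)]
scores_y = [s + 2 for s in scores_x]
print(equipercentile_equate(20, scores_x, scores_y))  # → 22
```

In operational FEE equating, the synthetic-population score distributions of both forms are first estimated from the anchor-test score distributions before a conversion like the one above is applied, which is why the quality of the anchor item set matters so much in this study.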

