SciELO - Scientific Electronic Library Online

 

Investigación en educación médica

On-line version ISSN 2007-5057

Abstract

DELGADO-MALDONADO, Laura and SANCHEZ-MENDIOLA, Melchor. Analysis of the professional exam at UNAM Faculty of Medicine: An experience in objective assessment of learning with item response theory. Investigación educ. médica [online]. 2012, vol.1, n.3, pp.130-139. ISSN 2007-5057.

Introduction: The end-of-career Professional Exam is a high-stakes summative assessment administered at UNAM's Faculty of Medicine in Mexico to certify that undergraduate medical students have achieved the knowledge level required to enter practice as general physicians. One source of validity evidence is the exam's internal structure, studied through item analysis. Classical Measurement Theory (CMT) has traditionally been used for this purpose, but it has several disadvantages that Item Response Theory (IRT) aims to address. This report describes the use of an IRT model in the analysis of the written Professional Exam at UNAM's Faculty of Medicine.

Objective: To explore the benefits of using an IRT model to obtain validity evidence for a high-stakes achievement test in a medical school.

Method: A psychometric analysis of the written Professional Exam at UNAM's Faculty of Medicine was performed in 2008. The test was a written 420-item multiple-choice exam covering Internal Medicine, Pediatrics, Obstetrics and Gynecology, Emergency Medicine, Surgery, and Family Medicine. The CMT indices of reliability, difficulty, and discrimination were calculated, and the three-parameter IRT (3PL-IRT) model was fitted. With these results the best items were selected, and the length of a shortened test was estimated with the Spearman-Brown prophecy formula.

Results: The exam was taken by 882 medical students, with a mean difficulty index of 0.55 and a reliability of 0.93. Under the 3PL-IRT model, the test was found to be particularly informative at ability levels close to the mean of the theta scale. The average discrimination parameter (a) was 0.67, the average difficulty parameter (b) was 1.21, and the average pseudo-guessing parameter (c) was 0.18. A shortened version of the test (250 items) was designed using this information while maintaining high reliability. Most items in the original test (84%) showed good fit to the 3PL-IRT model, and in the shortened version almost all of them (97%) showed an appropriate model fit.

Discussion and conclusions: The written Professional Exam at UNAM's Faculty of Medicine fulfills the conceptual requirements (number of items, examinee sample size) for applying an IRT model in its item analysis. This information adds to the validity evidence for the exam's score inferences and interpretations, and provides a psychometric overview of the instrument that is useful for planning subsequent versions of the exam. The exam can be shortened, making it more efficient, without losing precision in the estimation of examinees' ability levels or sacrificing content validity.
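The abstract itself gives no formulas or code; purely as an illustration, the following minimal Python sketch evaluates the three-parameter logistic (3PL) item characteristic function with the average parameter values reported above (a = 0.67, b = 1.21, c = 0.18). The function name and the use of the logistic form without the D ≈ 1.7 scaling constant are assumptions made for this sketch, not details taken from the study.

import math

def p_correct(theta, a, b, c):
    # 3PL item characteristic function: probability of a correct response
    # at ability theta, given discrimination a, difficulty b and
    # pseudo-guessing c. (Some formulations scale the exponent by D = 1.7.)
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Average item parameters reported for the 420-item exam (illustrative only).
a, b, c = 0.67, 1.21, 0.18
for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}: P(correct) = {p_correct(theta, a, b, c):.2f}")

The curve is steepest at theta = b, which is where an item with these parameters discriminates best between examinees of nearby ability levels.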
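The Method also cites the Spearman-Brown prophecy formula, rho' = k·rho / (1 + (k − 1)·rho), which projects reliability when test length changes by a factor k. As a rough back-of-the-envelope sketch only (the actual shortened exam was built by selecting the best-fitting items rather than by uniform shortening, so the study's own figure may differ), a uniform reduction from 420 to 250 items with the reported reliability of 0.93 projects as follows:

def spearman_brown(rho, k):
    # Projected reliability when test length changes by factor k
    # (k < 1 shortens the test, k > 1 lengthens it).
    return (k * rho) / (1.0 + (k - 1.0) * rho)

rho_full = 0.93   # reported reliability of the 420-item exam
k = 250 / 420     # hypothetical uniform shortening factor
print(f"projected reliability at 250 items: {spearman_brown(rho_full, k):.2f}")

This prints approximately 0.89, illustrating why a 250-item version can still retain high reliability.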

Keywords: Item response theory; classical measurement theory; summative assessment; multiple-choice questions; high-stakes assessment; undergraduate medical education.

        · Abstract in Spanish     · Full text in Spanish     · Spanish (PDF)

 

Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.