Introduction
Residency is a critical step in the education of a surgeon [1]. In the USA, up to 88% of general practitioners (GPs) eventually pursue a medical specialty; this percentage decreases to 35% in Mexico [2]. The demand for surgical residencies currently exceeds the number of positions offered in several countries, including Mexico and the USA [3].
In Mexico, the Interinstitutional Commission for Human Resources Training for Health (Comisión Interinstitucional para la Formación de Recursos Humanos para la Salud), a department of the Undersecretariat of Innovation and Quality of the Mexican Ministry of Health, recognizes 27 medical specialties with direct entry [4]. The score that a GP obtains in the National Evaluation for Medical Residency Applicants (Examen Nacional de Aspirantes a Residencias Médicas [ENARM]) is the gateway to a specialization course endorsed by a Mexican university [5,6]. The ENARM is a single-step exam that uses multiple-choice questions and computerized patient cases to assess examinees' knowledge of foundational science concepts and their application to clinical medicine; details of the logistics of the exam have been published previously [7,8].
Recent studies have examined different features of the ENARM: the number of test-takers and accepted GPs from each Mexican medical school registered in the ENARM [5]; the logistics and transparency of the exam [7]; the performance of private versus public schools using a summary measures method, exploring significant differences in performance based on geographic region and the socioeconomic level of the Mexican states to which each school belongs [5,9]; and the assumption of equity in the ENARM [8].
For Mexican educational institutions, ENARM scores and the percentage of their graduates who are selected are indicators of efficiency, a source of prestige, and even a promotional tool among those aspiring to study medicine [10]. We have observed that in recent years the highest ENARM scores correspond to the specialties known as Block 1 [11], the five surgical specialties with direct entry: gynecology and obstetrics, general surgery, otorhinolaryngology, ophthalmology, and traumatology and orthopedics [12]. However, to the best of our knowledge, no studies have compared the performance of these five direct-entry surgical specialties in the ENARM over the last 8 years, nor have they compared the scores of Mexican versus international medical graduates (IMG) in each of these specialties.
Considering the above, we aimed to compare the performance in the ENARM of each of these five direct-entry surgical specialties and to compare Mexican versus IMG test-takers in each specialty; we also included a trend analysis across the 8 years studied. We hypothesized that Mexican test-takers achieve higher scores than IMG, with significant growth trends in their exam scores.
Materials and methods
Study design and data acquisition
This study was cross-sectional and used historical data that did not require approval by an Institutional Review Board. We based our analyses on the annual public reports of the ENARM for the 8 years from 2012 to 2019. The reports were issued by the Interinstitutional Commission for Human Resources Training for Health (CIFRHS, Comisión Interinstitucional para la Formación de Recursos Humanos para la Salud), a department of the Undersecretariat of Innovation and Quality of the Mexican Ministry of Health [13]. The reports contain quantitative information on the academic performance, by medical specialty, of the graduates who took the ENARM; they are freely available as PDF files on the CIFRHS website [14].
Logistics of ENARM and assessed variables
Five test forms are created each year, each comprising 450 multiple-choice, single-best-answer items; no item is used in more than one test form. All test forms contain the same number of items per area of knowledge (specialty/subspecialty), with an approximate item distribution of 37.5% internal medicine, 25% pediatrics, 22% gynecology-obstetrics, and 15% surgery. Applicants for each specialty are ranked from highest to lowest according to their total ENARM score. Ranked applicants receive a "pass" certificate until the quota set by that specialty's available positions is met [8].
For each year (2012-2019), we recorded the minimum and maximum scores (calculated by dividing the absolute number of correct answers by the total number of items), grouped by nationality (Mexican or IMG) and chosen specialty (five direct-entry specialties), as they appear in the annual CIFRHS reports.
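The scoring and selection rule described in this section can be illustrated with a short sketch. It assumes that scores are expressed as the percentage of correct answers out of 450 items, that the minimum and maximum scores are those of the accepted applicants, and that ties need no special handling. The applicant identifiers, answer counts, and quota are hypothetical and are not taken from the CIFRHS reports.

```python
TOTAL_ITEMS = 450  # items per test form, as described in the text


def percent_score(correct, total=TOTAL_ITEMS):
    """Score as the percentage of correct answers (assumed scale)."""
    return 100.0 * correct / total


def select_applicants(correct_by_applicant, quota):
    """Rank applicants from highest to lowest and accept until the quota is met."""
    ranked = sorted(
        correct_by_applicant.items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(name, percent_score(correct)) for name, correct in ranked[:quota]]


# Hypothetical applicants to one specialty: identifier -> number of correct answers.
applicants = {"A": 340, "B": 351, "C": 315, "D": 328, "E": 306}
accepted = select_applicants(applicants, quota=3)

# Lowest and highest scores among accepted applicants (assumed meaning of
# the minimum and maximum scores analyzed in this study).
min_score = min(score for _, score in accepted)
max_score = max(score for _, score in accepted)

for name, score in accepted:
    print(f"{name}: {score:.3f}")
print(f"Minimum = {min_score:.3f}, maximum = {max_score:.3f}")
```

Under these assumptions, the minimum score acts as the entrance threshold for a specialty in a given year, and the maximum score reflects its best-performing accepted applicant.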
Statistical analysis and data visualization techniques
Our analysis was performed in two steps: first, we compared the minimum and maximum scores among surgical specialties; second, we compared the minimum and maximum scores between Mexicans and IMG.
In the first part of our analysis, we compared the minimum (MinSco) and maximum (MaxSco) scores of the five direct-entry surgical specialties evaluated by the ENARM (general surgery, gynecology and obstetrics, ophthalmology, otorhinolaryngology, and traumatology and orthopedics). The Kolmogorov-Smirnov and Shapiro-Wilk tests showed a non-significant p-value for each specialty, which indicated a normal distribution of the data in both variables (MinSco and MaxSco). We then performed a one-way ANOVA to reveal differences in the scores achieved by each specialty; variables were tested for homogeneity of variance, and post hoc tests used the least significant difference method. To test the assumption that MinSco and MaxSco increase every year, we assessed whether there was a significant linear trend for the scores to increase across the specialties. For this assessment, we used the Polynomial option in the Contrasts box of the SPSS ANOVA menu and chose Degree: Linear (the default). Detailed descriptions of the ANOVA test in clinical settings have been previously published by our group [15,16]. Descriptive statistics with 95% confidence intervals (CI) were calculated for each variable [17]. The effect size (the proportion of the variance in the dependent variable that can be explained by the independent variable) of each result was obtained using partial eta squared (η²) [18], where 0.01-0.06 = small effect, 0.06-0.14 = moderate effect, and > 0.14 = large effect. To visualize the results, we used line graphs showing the evolution of MinSco and MaxSco every year for each specialty; we also drew bar graphs with the global means, indicating the specialties whose means were above or below the global mean for all specialties.
For the second part of our analysis, we looked for significant differences between the scores of Mexicans and IMG by analyzing each specialty independently; means were compared using the independent-samples t-test. To analyze the trend of the MinSco and MaxSco by year for each specialty, we used Pearson's correlation coefficient to reveal the direction of the trend: positive for scores increasing (↑) with every year (2012-2019) or negative for scores decreasing (↓). We completed the analysis using the forecasting method to calculate a 5-year trend in the MinSco and MaxSco of each specialty and to detect whether there was a crossing point between Mexicans and IMG for each medical specialty. As with the medical specialties, we used our previously calculated global means for the MinSco and MaxSco to group the Mexicans and IMG of the specialties that lay above or below the mean for each specialty.
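The trend and forecasting steps can be sketched as follows. This is a simplified illustration, not the procedure run in SPSS or Tableau: it computes Pearson's correlation by hand and replaces the forecasting model with an ordinary least-squares line, which may differ from Tableau's built-in forecast. The two score series and all function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares (slope, intercept) of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def predict(line, x):
    slope, intercept = line
    return slope * x + intercept


def crossing_x(line_a, line_b):
    """x value where two fitted lines meet, or None if they are parallel."""
    (slope_a, icpt_a), (slope_b, icpt_b) = line_a, line_b
    if slope_a == slope_b:
        return None
    return (icpt_b - icpt_a) / (slope_a - slope_b)


years = list(range(2012, 2020))
# Hypothetical yearly scores for two series (not data from the reports).
series_a = [70.2, 70.9, 71.1, 72.0, 72.4, 73.1, 73.3, 74.2]
series_b = [86.9, 86.1, 86.6, 85.8, 86.0, 85.2, 85.5, 84.9]

r_a, r_b = pearson_r(years, series_a), pearson_r(years, series_b)
line_a, line_b = fit_line(years, series_a), fit_line(years, series_b)

# Compare the gap at the last observed year with the gap after 5 forecast years.
gap_2019 = abs(predict(line_b, 2019) - predict(line_a, 2019))
gap_2024 = abs(predict(line_b, 2024) - predict(line_a, 2024))
pattern = "convergent" if gap_2024 < gap_2019 else "divergent"
cross = crossing_x(line_a, line_b)

print(f"r_a = {r_a:.3f} ({'↑' if r_a > 0 else '↓'}), r_b = {r_b:.3f} ({'↑' if r_b > 0 else '↓'})")
print(f"pattern: {pattern}; fitted lines meet near year {cross:.1f}")
```

The sign of the correlation gives the direction of the trend, and comparing the gap between the two fitted lines at the start and end of the forecast window separates convergent from divergent patterns, whether or not the lines actually meet within the window.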
Score comparisons and trend analyses were performed using IBM® SPSS® Statistics software (version 25.0.0.1; IBM Corporation, Armonk, NY, USA). Data visualization of score trends and the forecasting analysis were performed with Tableau software (version 2019.1.3; Seattle, Washington, USA). Statistical significance was set at p < 0.05 (two-tailed).
Results
Scores included in the analysis
For each score (MinSco and MaxSco), we evaluated 80 measures, 16 for each specialty (eight scores for Mexicans and eight for IMG for the years 2012-2019), for a total of 160 measures included in the analyses.
Grouping of specialties above or below a global mean
We calculated a MinSco global mean of 72.572. Specialties above this mean were ophthalmology, otorhinolaryngology, and general surgery. Specialties below the mean corresponded to traumatology and orthopedics and gynecology and obstetrics.
The global mean for the MaxSco was 81.559; only two specialties were above this mark: ophthalmology and general surgery. The other three specialties, traumatology and orthopedics, gynecology and obstetrics, and otorhinolaryngology, were below the global mean. Figure 1A and B show the scores above or below the global mean for the surgical specialties.
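The grouping can be checked with a brief sketch. It assumes that each global mean is the unweighted mean of the ten group means (Mexican and IMG for each specialty) reported in Table 2; the text does not state how the global means were computed, so this is an assumption, although the values it yields agree with those reported. Specialty means are copied from Table 1, and the function names are illustrative.

```python
# Group means (Mexican, IMG) per specialty, copied from Table 2.
min_group_means = {
    "General surgery": (72.778, 72.917),
    "Gynecology and obstetrics": (69.333, 69.806),
    "Ophthalmology": (74.472, 75.714),
    "Otorhinolaryngology": (74.223, 75.084),
    "Traumatology and orthopedics": (70.528, 70.861),
}
max_group_means = {
    "General surgery": (86.723, 79.889),
    "Gynecology and obstetrics": (84.555, 77.167),
    "Ophthalmology": (85.000, 78.508),
    "Otorhinolaryngology": (83.694, 77.806),
    "Traumatology and orthopedics": (85.278, 76.972),
}

# Specialty means over both groups, copied from Table 1.
min_specialty_means = {
    "General surgery": 72.847,
    "Gynecology and obstetrics": 69.569,
    "Ophthalmology": 75.052,
    "Otorhinolaryngology": 74.653,
    "Traumatology and orthopedics": 70.694,
}
max_specialty_means = {
    "General surgery": 83.306,
    "Gynecology and obstetrics": 80.861,
    "Ophthalmology": 81.970,
    "Otorhinolaryngology": 80.750,
    "Traumatology and orthopedics": 81.125,
}


def global_mean(group_means):
    """Unweighted mean of all group means (assumed definition)."""
    values = [value for pair in group_means.values() for value in pair]
    return sum(values) / len(values)


def split_by_mean(specialty_means, threshold):
    """Return (above, below) lists of specialties relative to the threshold."""
    above = sorted(s for s, m in specialty_means.items() if m > threshold)
    below = sorted(s for s, m in specialty_means.items() if m <= threshold)
    return above, below


min_global = global_mean(min_group_means)
max_global = global_mean(max_group_means)
min_above, min_below = split_by_mean(min_specialty_means, min_global)
max_above, max_below = split_by_mean(max_specialty_means, max_global)

print(f"MinSco global mean {min_global:.3f}: above {min_above}, below {min_below}")
print(f"MaxSco global mean {max_global:.3f}: above {max_above}, below {max_below}")
```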
Comparison of minimum and maximum scores achieved by surgical specialties
The one-way ANOVA showed a significant difference among the minimum scores achieved by the five surgical specialties, F (4, 78) = 24.586, p ≤ 0.001; the η² = 0.570 indicated a large effect size. Post hoc tests showed significant differences between the surgical specialties (Bonferroni-adjusted p = 0.01), with only two non-significant pairwise comparisons: gynecology and obstetrics versus traumatology and orthopedics (p = 0.102), and ophthalmology versus otorhinolaryngology (p = 0.566). There was a significant linear trend for scores to increase with every year, F (1, 7) = 18.558, p ≤ 0.001; the η² = 0.164 indicated a large effect size.
We found the opposite result in the comparison of the MaxSco among surgical specialties: the ANOVA was not significant, F (4, 78) = 0.708, p = 0.590, which indicated that there was no difference in the MaxSco among surgical specialties; the η² = 0.04 indicated a small effect size. The test for a linear trend of the MaxSco with every year was not significant, F (1, 7) = 1.610, p = 0.209, with a small effect size, η² = 0.020.
Table 1 shows the means, standard deviations, standard errors, and 95% CI for the MinSco and MaxSco in each specialty. Figure 1C and D show the mean comparison of surgical specialties with the trend by year. Figure 1E and F depict the global trend of the MinSco and MaxSco during the 8 years (2012-2019).
Minimum scores | Mean | Std. deviation | Std. error | 95% CI lower bound | 95% CI upper bound | Minimum | Maximum |
---|---|---|---|---|---|---|---|
General surgery | 72.847 | 1.858 | 0.464 | 71.857 | 73.837 | 70.001 | 75.556 |
Gynecology and obstetrics | 69.569 | 1.997 | 0.499 | 68.506 | 70.633 | 67.333 | 74.444 |
Ophthalmology | 75.052 | 1.949 | 0.503 | 73.972 | 76.131 | 72.223 | 78.000 |
Otorhinolaryngology | 74.653 | 1.978 | 0.495 | 73.599 | 75.707 | 71.334 | 78.000 |
Traumatology and orthopedics | 70.694 | 1.823 | 0.456 | 69.723 | 71.666 | 67.778 | 74.667 |
Total | 72.532 | 2.857 | 0.321 | 71.892 | 73.172 | 67.333 | 78.000 |

Maximum scores | Mean | Std. deviation | Std. error | 95% CI lower bound | 95% CI upper bound | Minimum | Maximum |
---|---|---|---|---|---|---|---|
General surgery | 83.306 | 4.168 | 1.042 | 81.085 | 85.526 | 76.889 | 91.111 |
Gynecology and obstetrics | 80.861 | 4.555 | 1.139 | 78.434 | 83.288 | 70.667 | 86.444 |
Ophthalmology | 81.970 | 3.682 | 0.951 | 79.931 | 84.009 | 75.333 | 86.889 |
Otorhinolaryngology | 80.750 | 4.394 | 1.099 | 78.408 | 83.092 | 73.778 | 87.778 |
Traumatology and orthopedics | 81.125 | 4.776 | 1.194 | 78.580 | 83.670 | 74.443 | 90.000 |
Total | 81.598 | 4.335 | 0.488 | 80.627 | 82.569 | 70.667 | 91.111 |
CI: confidence intervals.
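As a cross-check of the MinSco comparison, a one-way ANOVA can be approximated from the summary statistics in Table 1 alone. The sketch below is not the SPSS analysis: it assumes that each group size can be recovered as (SD / SE)², rounded to the nearest integer, and it uses the means and standard deviations as printed, so small rounding differences are expected. The degrees of freedom it produces follow from those inferred sizes and need not match the ones reported in the text.

```python
# (mean, SD, SE) of the minimum scores per specialty, copied from Table 1.
min_scores = {
    "General surgery": (72.847, 1.858, 0.464),
    "Gynecology and obstetrics": (69.569, 1.997, 0.499),
    "Ophthalmology": (75.052, 1.949, 0.503),
    "Otorhinolaryngology": (74.653, 1.978, 0.495),
    "Traumatology and orthopedics": (70.694, 1.823, 0.456),
}


def anova_from_summary(groups):
    """One-way ANOVA F and eta squared from per-group (mean, SD, SE)."""
    # Assumption: SE = SD / sqrt(n), so n is recovered from the two columns.
    sizes = {name: round((sd / se) ** 2) for name, (_, sd, se) in groups.items()}
    n_total = sum(sizes.values())
    grand_mean = sum(sizes[name] * mean for name, (mean, _, _) in groups.items()) / n_total

    ss_between = sum(
        sizes[name] * (mean - grand_mean) ** 2 for name, (mean, _, _) in groups.items()
    )
    ss_within = sum((sizes[name] - 1) * sd ** 2 for name, (_, sd, _) in groups.items())

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # In a one-way design, partial eta squared equals eta squared.
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_sq


def effect_label(eta_sq):
    """Thresholds from the Methods; 'negligible' below 0.01 is an added label."""
    if eta_sq > 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "moderate"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"


f_stat, df_between, df_within, eta_sq = anova_from_summary(min_scores)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}, eta squared = {eta_sq:.3f} ({effect_label(eta_sq)})")
```

Under these assumptions, the recomputed F statistic and η² for the MinSco fall close to the reported values of 24.586 and 0.570.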
Comparison of minimum and maximum scores between Mexicans and IMG in each surgical specialty
For general surgery, we did not find a significant difference in the MinSco between Mexicans and IMG, but the difference in the MaxSco was significant. The same pattern was found for gynecology and obstetrics, ophthalmology, otorhinolaryngology, and traumatology and orthopedics. Table 2 shows the means, standard deviations, and standard errors of the mean for Mexicans and IMG in each surgical specialty; p-values were calculated with the independent-samples t-test.
Minimum scores | Mexican mean | Mexican std. deviation | Mexican std. error of mean | IMG mean | IMG std. deviation | IMG std. error of mean | p-value |
---|---|---|---|---|---|---|---|
General surgery | 72.778 | 1.984 | 0.701 | 72.917 | 1.857 | 0.657 | 0.887 |
Gynecology and obstetrics | 69.333 | 1.805 | 0.638 | 69.806 | 2.271 | 0.803 | 0.652 |
Ophthalmology | 74.472 | 1.707 | 0.603 | 75.714 | 2.123 | 0.802 | 0.231 |
Otorhinolaryngology | 74.223 | 2.088 | 0.738 | 75.084 | 1.898 | 0.671 | 0.403 |
Traumatology and orthopedics | 70.528 | 1.805 | 0.638 | 70.861 | 1.950 | 0.689 | 0.728 |

Maximum scores | Mexican mean | Mexican std. deviation | Mexican std. error of mean | IMG mean | IMG std. deviation | IMG std. error of mean | p-value |
---|---|---|---|---|---|---|---|
General surgery | 86.723 | 2.360 | 0.834 | 79.889 | 2.228 | 0.788 | < 0.001 |
Gynecology and obstetrics | 84.555 | 1.425 | 0.504 | 77.167 | 3.353 | 1.185 | < 0.001 |
Ophthalmology | 85.000 | 1.585 | 0.560 | 78.508 | 1.575 | 0.595 | < 0.001 |
Otorhinolaryngology | 83.694 | 1.328 | 0.469 | 77.806 | 4.450 | 1.573 | 0.003 |
Traumatology and orthopedics | 85.278 | 2.228 | 0.788 | 76.972 | 2.118 | 0.749 | < 0.001 |
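The comparisons in Table 2 can be approximated from the summary statistics with a pooled-variance t statistic. This sketch makes several assumptions: group sizes are inferred as (SD / SE)², equal variances are assumed, and significance is judged against tabulated two-tailed 5% critical values of Student's t rather than exact p-values. It is an illustration, not the SPSS procedure.

```python
from math import sqrt

# Two-tailed 5% critical values of Student's t for the degrees of freedom used here.
T_CRIT = {13: 2.160, 14: 2.145}

# ((mean, SD, SE) for Mexicans, (mean, SD, SE) for IMG), copied from Table 2.
table2 = {
    "MinSco": {
        "General surgery": ((72.778, 1.984, 0.701), (72.917, 1.857, 0.657)),
        "Gynecology and obstetrics": ((69.333, 1.805, 0.638), (69.806, 2.271, 0.803)),
        "Ophthalmology": ((74.472, 1.707, 0.603), (75.714, 2.123, 0.802)),
        "Otorhinolaryngology": ((74.223, 2.088, 0.738), (75.084, 1.898, 0.671)),
        "Traumatology and orthopedics": ((70.528, 1.805, 0.638), (70.861, 1.950, 0.689)),
    },
    "MaxSco": {
        "General surgery": ((86.723, 2.360, 0.834), (79.889, 2.228, 0.788)),
        "Gynecology and obstetrics": ((84.555, 1.425, 0.504), (77.167, 3.353, 1.185)),
        "Ophthalmology": ((85.000, 1.585, 0.560), (78.508, 1.575, 0.595)),
        "Otorhinolaryngology": ((83.694, 1.328, 0.469), (77.806, 4.450, 1.573)),
        "Traumatology and orthopedics": ((85.278, 2.228, 0.788), (76.972, 2.118, 0.749)),
    },
}


def pooled_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom from (mean, SD, SE)."""
    (m1, s1, se1), (m2, s2, se2) = group_a, group_b
    # Assumption: SE = SD / sqrt(n), so n is recovered from the two columns.
    n1, n2 = round((s1 / se1) ** 2), round((s2 / se2) ** 2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t_stat = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, df


results = {}
for score, rows in table2.items():
    for specialty, (mexican, img) in rows.items():
        t_stat, df = pooled_t(mexican, img)
        significant = abs(t_stat) > T_CRIT[df]
        results[(score, specialty)] = (t_stat, df, significant)
        print(f"{score} {specialty}: t({df}) = {t_stat:.3f}, significant at 5%: {significant}")
```

Under these assumptions, none of the MinSco differences exceeds the critical value, whereas all five MaxSco differences do, in line with the p-values listed in Table 2.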
Positive and negative trends in the minimum and maximum scores between Mexicans and IMG in each surgical specialty
For the minimum score in general surgery, both groups showed a positive and significant correlation: Mexicans R = 0.803, p = 0.016, and IMG R = 0.785, p = 0.021. In gynecology and obstetrics, both groups showed a positive but non-significant correlation: Mexicans R = 0.632, p = 0.093, and IMG R = 0.562, p = 0.147. In ophthalmology, Mexicans showed a positive and non-significant correlation, R = 0.596, p = 0.119, whereas IMG showed a positive and significant correlation, R = 0.767, p = 0.044. In otorhinolaryngology, both groups showed a positive but non-significant correlation between MinSco and years: Mexicans R = 0.658, p = 0.076, and IMG R = 0.529, p = 0.178. In the last category, traumatology and orthopedics, both groups showed a positive and significant correlation: Mexicans R = 0.851, p = 0.007, and IMG R = 0.828, p = 0.011.
For the maximum score in general surgery, Mexicans showed a positive, non-significant correlation, R = 0.362, p = 0.378, whereas IMG showed a positive and significant correlation, R = 0.866, p = 0.005. In gynecology and obstetrics, both groups showed a negative and non-significant correlation: Mexicans R = -0.300, p = 0.470, and IMG R = -0.414, p = 0.308. In ophthalmology, both groups showed a positive and non-significant correlation: Mexicans R = 0.000074, p = 1.00, and IMG R = 0.327, p = 0.474. In otorhinolaryngology, both groups showed a positive but non-significant correlation between MaxSco and years: Mexicans R = 0.171, p = 0.686, and IMG R = 0.459, p = 0.253. In the last category, traumatology and orthopedics, both groups showed a positive but non-significant correlation: Mexicans R = 0.681, p = 0.063, and IMG R = 0.474, p = 0.235.
Table 3 shows the trends of the minimum and maximum scores for Mexicans and IMG, together with their statistical significance. Figure 2 shows the graphical representation of the observed means for both the MinSco and MaxSco.
Score | Test-taker | Specialty with significant trend | Trend | Specialty with non-significant trend | Trend |
---|---|---|---|---|---|
Minimum | Mexican | General surgery | ↑ | Gynecology and obstetrics | ↑ |
Minimum | Mexican | Traumatology and orthopedics | ↑ | Ophthalmology | ↑ |
Minimum | Mexican | | | Otorhinolaryngology | ↑ |
Minimum | International medical graduates | General surgery | ↑ | Gynecology and obstetrics | ↑ |
Minimum | International medical graduates | Ophthalmology | ↑ | Otorhinolaryngology | ↑ |
Minimum | International medical graduates | Traumatology and orthopedics | ↑ | | |
Maximum | Mexican | | | General surgery | ↑ |
Maximum | Mexican | | | Gynecology and obstetrics | ↓ |
Maximum | Mexican | | | Ophthalmology | ↑ |
Maximum | Mexican | | | Otorhinolaryngology | ↑ |
Maximum | Mexican | | | Traumatology and orthopedics | ↑ |
Maximum | International medical graduates | General surgery | ↑ | Gynecology and obstetrics | ↓ |
Maximum | International medical graduates | | | Ophthalmology | ↑ |
Maximum | International medical graduates | | | Otorhinolaryngology | ↑ |
Maximum | International medical graduates | | | Traumatology and orthopedics | ↑ |
↑ increasing trend; ↓ decreasing trend.
Comparison of 5-year forecasting trends between minimum and maximum scores of Mexicans and IMG
We identified convergent and divergent forecasting trends between the minimum and maximum scores, depending on whether the lines would eventually meet during or after the 5-year forecast period (2020-2024).
Four specialties showed a convergent pattern for Mexicans between the MinSco and MaxSco: general surgery, gynecology and obstetrics, ophthalmology, and otorhinolaryngology. Only traumatology and orthopedics showed a very mild divergent trend. In IMG, three specialties showed a convergent trend: gynecology and obstetrics, ophthalmology, and traumatology and orthopedics; the other two, otorhinolaryngology and general surgery, showed a divergent trend. Figure 3 shows the forecasting trends between minimum and maximum scores of Mexicans and IMG.
Ranking of specialties between Mexicans and IMG
In addition, we ranked the specialties based on the mean MinSco of Mexicans and IMG for each specialty.
Adjacent rows with connecting arrows show the displacement in the ranking from the position each specialty reached for Mexicans to the position it had for IMG. The ranking of medical specialties was similar between both groups, with the exception of ophthalmology, which moved from 5th place for Mexicans to 1st place for IMG. Figure 4 shows the ranking displacement of the specialties when the scores of Mexicans were compared with those of IMG.
Discussion
Residency is a critical step in the education of a physician; matching into a residency program is a competitive process of selection by both applicants and program directors [19]. We believe the graphs and tables presented in this study will be helpful for ENARM test-takers, medical students in the early years of their training who are starting to plan their desired specialty, medical school advisors, and education department directors in teaching hospitals. These four groups look for strategies to increase applicants' potential to match successfully.
The main strength of our report lies in the comprehensive statistical analysis that we performed. The analysis included a total of 160 measures: for each score, 16 for each of the five surgical specialties (eight for each test-taker group, Mexicans and IMG) during the years 2012-2019. We not only compared means using a global mean across the five specialties but also considered trends, 5-year forecasting, and ranking displacement between Mexicans and IMG.
Publications about the ENARM have attracted great interest in the medical community in recent years; some authors have published descriptive reports about the scores of schools and faculties of medicine, but without an in-depth statistical analysis [5].
Other authors have revealed flaws in the design of the ENARM that produce inequity, but without mentioning scores in medical specialties [8,20]. Our group published a letter to the editor about the performance of IMG in the ENARM, but without a comparison with Mexicans [4]. Thus, to the best of our knowledge, no publication about the ENARM has presented a comparison of scores among the Group I surgical specialties.
Grouping of specialties above or below a global mean
Using an overall mean as a benchmark helps to reflect the performance of the five groups of test-takers and revealed which specialties had the students with the best scores. The ENARM global mean for the minimum score (from 2012 to 2019) was 72.572, a score above the previous observation made in a study by de la Garza-Aguilar [6]; this number is also above the mean for the past 7 years of the test known as MIR (Medical Intern Resident) in Spain, 57.29, as reported by the Ministry of Health [21,22]. Our findings showed that the surgical specialties whose applicants achieved scores above this mean were ophthalmology, otorhinolaryngology, and general surgery. Our findings coincide with the study of Rinard et al. [19], in which otolaryngology was one of the best-ranked surgical specialties in Texas, USA. The specialties below the mean were traumatology and orthopedics and gynecology and obstetrics; this observation of low scores in the ENARM contrasts with the results of the matching program in the USA, where accepted applicants to these specialties achieved higher scores than those accepted to general surgery [19].
Comparison of minimum and maximum scores achieved by surgical specialties
Over the 8 years assessed, the ranking of the five surgical specialties was preserved for the MinSco (Fig. 1C), in descending order: otorhinolaryngology, ophthalmology, general surgery, traumatology and orthopedics, and gynecology and obstetrics. In contrast, for the MaxSco, the scores were intertwined over the 8 years, reflecting changes in the ranking of the surgical specialties in different years (Fig. 1D). For this visualization of data, we also did not find publications in which the performance among medical specialties was compared [6].
Comparison of minimum and maximum scores between Mexicans and IMG in each surgical specialty
Our findings revealed that Mexicans and IMG obtained similar passing scores, which might indicate an equivalent level of education in their medical schools; this finding differs from a previous report from the USA, which observed over 8 years that national applicants to orthopedic surgery residency obtained better scores than IMG [3]. The absence of significant differences in the minimum scores between Mexicans and IMG in all specialties can also be interpreted as high competitiveness across all specialties. However, the MaxSco clearly revealed the superiority of Mexicans over IMG in all specialties, which reflects a better level of preparation for this exam. This score revealed a large gap in knowledge between Mexican and IMG test-takers [1].
Positive and negative trends in the minimum and maximum scores between Mexicans and IMG in each surgical specialty
The limited information about trends for applicants matching into USA specialties has been previously addressed; most foreign articles describe the performance of specific specialties without a comparison between their nationals and IMG [23]. The use of the minimum score in our study revealed fierce competition among medical specialties, as four of them showed a positive and significant trend, the exception being gynecology and obstetrics. This trend is similar to a USA report on surgical specialties (surgery, plastic surgery, orthopedic surgery, otolaryngology, and obstetrics and gynecology), in which each specialty has a different mean score for individuals accepted in the match [19]. Our findings show that information is still missing: we do not know which specialty scores are driven by the applicants each year and which by the level of difficulty of the exam. An additional analysis will be necessary to understand how the number of residency positions influences the scores in each medical specialty.
Comparison of 5-year forecasting trends between minimum and maximum scores of Mexicans and IMG
The forecast graphs help us to understand that, for Mexicans, the gap between MinSco and MaxSco will decrease for general surgery, gynecology and obstetrics, and ophthalmology, whereas for IMG it will decrease for gynecology and obstetrics, traumatology and orthopedics, and ophthalmology. This means that only two of the five surgical specialties (gynecology and obstetrics, and ophthalmology) share the same learning trend between Mexicans and IMG.
Ranking of specialties between Mexicans and IMG
From this analysis, we learned that both Mexicans and IMG show the same ranking in the order of selected specialties, although Mexicans had slightly lower MinSco (Fig. 4). For the MaxSco, the specialty with the highest scores was general surgery; this represents a challenge for future applicants, as they would have to obtain the best scores to be selected for a residency position. In general, Mexicans achieved the highest scores. It was interesting to observe that traumatology and orthopedics ranked 2nd for Mexicans but 5th for IMG.
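The displacement in the MaxSco ranking noted above for traumatology and orthopedics can be reproduced from the group means in Table 2. The sketch below assumes that specialties are ranked by their mean MaxSco within each group, with rank 1 being the highest mean; the function and variable names are illustrative.

```python
# Mean maximum scores (MaxSco) per specialty and group, copied from Table 2.
max_means = {
    "General surgery": {"Mexican": 86.723, "IMG": 79.889},
    "Gynecology and obstetrics": {"Mexican": 84.555, "IMG": 77.167},
    "Ophthalmology": {"Mexican": 85.000, "IMG": 78.508},
    "Otorhinolaryngology": {"Mexican": 83.694, "IMG": 77.806},
    "Traumatology and orthopedics": {"Mexican": 85.278, "IMG": 76.972},
}


def rank_specialties(means, group):
    """Map each specialty to its rank (1 = highest mean) within one group."""
    ordered = sorted(means, key=lambda specialty: means[specialty][group], reverse=True)
    return {specialty: position for position, specialty in enumerate(ordered, start=1)}


rank_mexican = rank_specialties(max_means, "Mexican")
rank_img = rank_specialties(max_means, "IMG")

for specialty in max_means:
    # Positive shift: the specialty ranks lower for IMG than for Mexicans.
    shift = rank_img[specialty] - rank_mexican[specialty]
    print(
        f"{specialty}: Mexican rank {rank_mexican[specialty]}, "
        f"IMG rank {rank_img[specialty]}, shift {shift:+d}"
    )
```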
Limitations of the study
Several limitations of this study need to be acknowledged. With the ENARM, the Mexican Secretariat of Health selects the best candidates each year with reasonable confidence, but a number much higher than that of accepted applicants is left without entering a medical specialty; we did not analyze those numbers, as this topic was outside the scope of our study. Furthermore, we did not address the supply and demand of Mexican physicians per number of inhabitants; in 2015, Mexico had 2.2 physicians per 1000 population, including professionals in the private sector, and these numbers represent a significant disparity in the distribution of human health resources in the country. In the same year, the USA reported a ratio of 3.1 physicians per 1000 inhabitants [24]. Although the number and needs of medical specialists have not been fully identified, the number of existing doctors and of those who could be trained at the current rate will be insufficient for the needs of the country [5]. We did not analyze in depth which medical schools the test-takers with the highest scores came from, as this information was not available in the annual CIFRHS reports. Our assessment did not examine subgroup performance differences by age, gender, race, or English as a second language, because these items were not publicly available. The same limitations have been addressed in previous reports on the USMLE. Residency program directors look to the ENARM results for the best candidates for their programs, considering all aspects of a student's application and an interview; however, we did not take into account intangible factors such as away rotations, personal interactions, membership, and research experience. Although all of them might influence the chance of matching [19], they were not assessed in the context of this paper.
Other topics not included in this study were the need to examine whether there is an ideal applicant-to-position ratio that would allow surgical residency coordinators to remain selective in their choices, and whether increasing the number of surgical residency positions would dilute the quality of successful candidates.
Conclusions
Our study provides objective and valuable information for residency program directors looking for the best candidates for their programs, and also for applicants, revealing that the ENARM represents a market of high-performance test-takers across the surgical specialties. Mexicans and IMG achieved similar entrance scores, but Mexicans showed a higher MaxSco than IMG in all surgical specialties. Comparisons using scores allow program directors to understand which specialties have become more competitive relative to others and how they have evolved in previous years. Future studies are needed to explore whether ENARM scores can predict performance on subsequent assessments, such as specialty in-training and certification examinations.