Introduction
Safe surgical practice relies on surgeons being aware of their own skill sets and capabilities, as well as on acknowledging their limitations. Particularly among in-training physicians, accurate self-assessment of both confidence and competence is an important goal to ensure an adequate learning process and safer patient care1. Nevertheless, the ubiquitous over/under-confidence bias may introduce a miscalibration between self-judgments and the objective accuracy of those judgments. Pallier et al.2 have identified overconfidence in three different ways: (1) as an overestimation of one's actual performance, (2) as an overplacement of one's performance relative to others, and (3) as an excessive certainty regarding the accuracy of one's beliefs or knowledge, known as overprecision. Studies on physicians found that their self-assessment of clinical skills did not correlate well with an external evaluation of the same competencies, and the most inaccurate self-assessments were observed in the physicians who expressed the highest confidence or those who were rated lowest by external evaluators3. On the other hand, studies conducted on junior doctors showed more variable results in terms of the correlation between self-perceived and objectively measured or observed competency, with poorer correlations in practical clinical skills4-6. Although weak or no associations between physicians' self-rated and external assessments have often been observed3, evaluating self-perceived competence may provide an indication of a subject's motivation to maintain and improve the skills concerned; furthermore, it is considered an important component of self-efficacy7.
Several classical assessment tools and methods have been used in studies reporting self-assessment of surgical skills and competencies: operative component rating scale8, global rating scale9, global score10, visual analogue scale11, standardized forms12, blinded or non-blinded direct observation13,14, single or multiple observers15, video playback analysis16, objective structured assessment of surgical skills14,17, hierarchical task analysis10, self-assessment score of performance15, competency assessment tool18, bench models11,13, virtual reality simulators13,19, live animal models9, and live operating settings10. Despite these varied approaches, the evidence on the accuracy of self-assessed surgical technical skills remains contradictory20,21.
Meanwhile, learning curve theory has recently been revisited and promoted as a valuable method to assess medical competencies22. Learning curve models are useful to assess an individual physician's progress in patient-care capabilities by graphically representing the relationship between learning effort and the resultant learning outcomes. In particular, the surgical learning curve describes the relationship between deliberate practice and subsequent performance through a classical S-shaped curve that delineates a series of ascending categories of expertise. Evidence supporting the validity of the learning curve as a tool to assess skill acquisition relies basically on the Dreyfus23 and Ericsson24,25 models, which describe expertise development as a progression through several stages, from a novice who is not allowed to practice on patients to a reflective expert who functions at the highest levels.
Based on this theoretical framework, we hypothesized that in-training residents and fellows could over- or underestimate their actual surgical skills compared with their performance as perceived by an external expert observer. Therefore, the aim of this study was to explore how in-training junior physicians perceive their surgical performance compared with that rated externally by their senior surgeon trainers, using a general learning curve model.
Material and methods
Between April and June 2018, a prospective study was conducted at a community hospital associated with the Buenos Aires University School of Medicine. To assess how in-training young physicians estimated their surgical performance, 48 first- to fourth-year surgical residents and fellows were invited to choose one among six exclusive options, which were intended to summarize their own perceived performance or skills as in-training surgeons at the moment of the survey. Residents and fellows were asked to place themselves in one of the following learning curve categories:
- Novice: I have no skill or experience to perform any surgical procedure.
- Advanced beginner: I can practice some surgical procedures with full supervision.
- Competent: I can practice some surgical procedures with supervision on call.
- Proficient: I can practice some surgical procedures without supervision.
- Expert: I can supervise others to practice some surgical procedures.
- Automatic expert: I can practice some surgical procedures automatically.
To minimize selection bias, residents had access only to the definitions, not to the names of the categories. These learning curve categories and definitions were adopted from Pusic et al.22. To avoid inconsistent opinions from junior physicians with no surgical experience, first-year residents participated in the survey only after completing at least 10 months of the surgical residency program. After the residents and fellows had rated their own perceived surgical performance, five selected senior surgeons (a multiple-observer design) who supervised them were asked to give their own opinions about the expertise level reached by each surveyed in-training physician, according to the same learning curve categories. Opinions were considered to be expressed in a double-blind way, since neither residents/fellows nor surgeons knew of the cross-evaluation. Statistical analysis compared the level of concordance between residents' and fellows' self-perceived skills and their actual performance as estimated by the senior assistant surgeons. Following the traditional view, perceived skills were defined as the self-reported confidence level, and estimated performance as the observed competence17.
Participants were assured of confidentiality in responding to the questionnaire. All respondents participated voluntarily after the purpose of the study was explained to them, and they expressed consent by completing the form. All personal identifiers were removed or disguised so that the physicians described were not identifiable and could not be identified through the details of the study. The heads of the medical training institution provided access to the residents and fellows after ethical approval of the protocol. Ethical clearance for this study was granted by the Institutional Review Board of the Deutsches Hospital of Buenos Aires.
Statistical analysis
Cohen's kappa statistic and weighted kappa with Cicchetti's weighting scheme were used to assess concordance between residents' and fellows' perceived skills and their performance as estimated by the senior surgeons. The median value across the multiple external observers was used for the analysis. Qualitative interpretation of kappa indexes was based on current recommendations26, and 95% confidence intervals (95% CI) for the concordance indexes were also calculated. Since the differences between the ordinal learning curve categories are critical for surgical skill development, we preferred to weight the kappa statistic with Cicchetti's proportional (linear) weights rather than quadratic weights. The sample size for the weighted kappa analysis was estimated as n = 2c², where c is the number of categories27. Since the first category (novice) was not expected to be selected by in-training physicians, c = 5 and the calculated sample size was n = 50 individuals. The degree of agreement (inter-rater reliability [IRR]) among the multiple external observers (senior surgeons) regarding junior doctors' performance level was expressed as percent agreement and the intraclass correlation coefficient. Percent agreement was calculated as the number of agreeing scores divided by the total number of scores. The overall comparison between junior and senior physician responses was made with Yates' Chi-square test with 4 degrees of freedom, according to the number of selected categories. Continuous variables were expressed as mean and standard deviation (SD). Statistical analysis was performed with EPIDAT, Version 4.1 (Xunta de Galicia-PAHO/WHO) and SPSS Statistics for Windows, Version 17.0 (Chicago: SPSS Inc.); a two-tailed p ≤ 0.05 was considered statistically significant.
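The linearly weighted kappa described above can be sketched as follows. This is an illustrative implementation, not the statistical package actually used in the study; the function name and the toy rating data are ours:

```python
import numpy as np

def weighted_kappa(rater_a, rater_b, n_categories):
    """Weighted kappa with linear (Cicchetti-style proportional) weights.

    rater_a, rater_b: integer category codes 0..n_categories-1 for each subject.
    Linear weights give full credit on the diagonal and proportionally
    less credit for each additional category of disagreement.
    """
    k = n_categories
    # Contingency table of observed rating pairs, as proportions
    obs = np.zeros((k, k))
    for a, b in zip(rater_a, rater_b):
        obs[a, b] += 1
    obs /= obs.sum()
    # Expected table under independence (outer product of the marginals)
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    # Linear agreement weights: w[i, j] = 1 - |i - j| / (k - 1)
    i, j = np.indices((k, k))
    w = 1.0 - np.abs(i - j) / (k - 1)
    po = (w * obs).sum()  # weighted observed agreement
    pe = (w * exp).sum()  # weighted chance agreement
    return (po - pe) / (1.0 - pe)

# Sample-size rule cited in the text: n = 2 * c**2; with c = 5 usable
# categories this gives 2 * 25 = 50 subjects, as reported.
```

Perfect agreement across the five usable categories yields kappa = 1, and ratings that agree no better than chance yield kappa near 0, matching the usual qualitative interpretation scale.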
Results
Forty-seven of the 48 first- to fourth-year surgical residents and fellows (98%) and 50 senior surgeons (5 for each specialty) completed the survey. The study included the following surgical specialties and number of participants: general (n = 11), colorectal (n = 2), liver (n = 1), plastic (n = 2), cardiovascular (n = 2), neurological (n = 2), urological (n = 5), gynecological and obstetrics (n = 7), orthopedic (n = 10), and ophthalmological (n = 5) surgery. Mean age of residents and fellows was 29.6 years (SD 2.9), and 30 (64%) were male.
Figure 1 shows the responses of residents and fellows self-estimating their surgical skills and competencies compared with their actual performance as estimated by the senior assistant surgeons, based on learning curve categories. Globally, self-assessments tended toward overestimation of position on the learning curve, particularly "proficient" over "competent" and "automatic expert" over "expert" (p = 0.025). Twenty-four residents and fellows (51%) overestimated and 8 (17%) underestimated their performance. The overestimation rate was 38% (10/26) for first- to third-year residents versus 64% (14/22) for the remaining respondents (p = 0.148), whereas underestimation rates were 19% (n = 5) versus 14% (n = 3), respectively (p = 0.897). The average degree of agreement among senior physician responses was 50.0% (95% CI 43.7-56.3%) (intraclass correlation coefficient = 0.737, 95% CI 0.637-0.825). Comparison between residents' and fellows' perceived skills and their performance as estimated by the senior surgeons showed poor to weak concordance according to kappa measures (kappa = 0.174, 95% CI 0.019-0.328, p = 0.007; weighted kappa = 0.494, 95% CI 0.359-0.631, p < 0.0001). The Bland-Altman plot of the difference between self-estimation and external evaluation of surgical skills is shown in figure 2. The average bias between paired values was 0.40 (95% CI 0.11-0.70), indicating a lack of concordance between in-training doctors' and senior surgeons' opinions. The positive sign of the mean difference reveals a global overestimation by junior physicians of their own surgical skills and competencies.
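The Bland-Altman bias reported above is the mean of the paired differences (self-rating minus external rating), with limits of agreement derived from their standard deviation. A minimal sketch, using invented toy scores rather than the study's data:

```python
import numpy as np

def bland_altman(self_scores, external_scores):
    """Mean bias and 95% limits of agreement for paired ordinal ratings.

    A positive bias means the self-rating sits above the external
    rating on average, i.e. overestimation by the trainee.
    """
    d = np.asarray(self_scores, float) - np.asarray(external_scores, float)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, limits
```

With learning curve categories coded 1 (novice) to 6 (automatic expert), a positive mean difference like the 0.40 reported corresponds to trainees placing themselves, on average, a fraction of a category above their external rating.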
Discussion
A double-blind, cross-validation design study with multiple external observers was conducted to assess self-estimation of surgical competencies among in-training junior surgeons, compared with the external evaluation made by their senior surgeon trainers. Concordance analysis demonstrated that residents and fellows of surgical specialties tended, in general, to overestimate their current performance on the learning curve categories when compared with the external observers' opinions. The perceived level of competence among junior doctors revealed that a high proportion believed themselves able "to practice without supervision," or to be an "automatic expert" at the top stratum of the learning curve, when performing some surgical procedures.
In medicine, the development of expertise requires the recognition of one's capabilities and limitations8. Safe clinical practice depends on recognizing the limits of one's competence: overconfidence may lead doctors to take unnecessary risks, while underconfidence may leave them unable to act to prevent critical incidents1. Therefore, from a patient-safety perspective, the relationship between confidence and competence is crucial.
It is controversial whether self-assessment is an accurate form of technical skill appraisal in surgical specialties20. In general surgery, four studies reported that candidates' self-assessments correlated poorly with independent expert evaluation, with trainees overestimating their abilities9,11,13,16. For example, surgeons consistently overestimated their performance during a laparoscopic colectomy course as measured by a reliable global rating scale9. Using a 5-point scale from "novice" to "expert," Morgan and Cleave-Hogg28 found that medical students' level of confidence had no predictive value for performance assessment in simulated anesthesia scenarios. A discrepancy was also observed between urology residents' perceptions of their skill proficiency and faculty members' evaluations29. An identical lack of concordance was reported when assessing operative skills of pediatric neurosurgery residents30. Conversely, another nine investigations reported good self-assessment accuracy in general surgery8,10,12,14,15,18,19,31,32, and in a pilot study, orthopedic surgery residents could successfully self-assess their performance using a milestones-based method33. Most of these studies included only one to three external observers, with IRR fluctuating between 0.61 and 1. In the current study, a lower IRR was expected since the opinions of five external observers were included for each junior doctor.
People tend to overestimate their ability in many different domains, with this overestimation increasing with harder tasks and decreasing with easier tasks34,35. Some evidence suggests that self-appraisal becomes more accurate with increased experience16,31, surgical training level, and age14. Conversely, there is other evidence of underconfidence increasing with practice; this counterintuitive effect seems to depend on awareness of one's own limitations in task performance36,37. Although we observed a global underestimation rate of 17%, we did not find that paradoxical effect when comparing first- to third-year postgraduates with the rest of the participants.
Some authors have suggested that self-assessment of a cognitive task may be fundamentally different from that of a technical task. This notion rests on the fact that the performance of a technical task, unlike a cognitive one, can be judged through the immediate or direct feedback provided by its outcome21,38. Thus, the agreement between self- and external assessment may differ between cognitive and technical tasks. Hence, strategies to improve this agreement in the context of surgical training should include high-quality, timely, coherent, and non-threatening external feedback from expert observers to trainees38.
This study has some limitations. First, any ambiguity in some statements of the questionnaire should be offset by the simultaneous application of the same survey to junior and senior physicians. Junior doctors' estimations would probably also vary if they confronted a real surgical situation rather than a paper-based survey. Another limitation is that demonstrating misplaced estimation among residents and fellows does not necessarily mean that consequences or benefits derive from it, or that these biases are necessarily a problem. Some bias could emerge from the assignment of certain senior surgeons as observers judging the participants, and greater standardization of the external assessors would probably be required. Although it is unlikely that one standard self-assessment tool can be suitably applied to all technical procedures, we used a global approach based on learning curve categories to obtain a general picture of in-training junior physicians' perceived surgical performance. Finally, although confidence and competence are linearly associated, there is a critical difference between trainees gaining a greater belief in their ability to carry out a particular skill and their becoming technically more proficient in putting it into practice1.
Conclusions
Comparison of residents' and fellows' self-reported estimation of their surgical skills with the competence observed by their senior surgeon trainers showed poor concordance. About half of the residents and fellows enrolled in a surgical specialty training program overestimated their actual performance as assessed by classical learning curve categories. Nevertheless, underestimation of self-assessed performance was also observed in almost one-fifth of respondents. Increased awareness of these over- and underestimation effects can improve the reliability of medical judgment, and improved feedback from expert observers to in-training surgeons could yield a more accurate self-perception of their real surgical skills and competencies.