The Suitability of Global Rating Scales to Monitor Arthroscopic Training Progress
JJ Stunt1*, GMMJ Kerkhoffs1, B van Ooij1, IN Sierevelt1, MU Schafroth1, CN van Dijk1, J Dragoo3 and GJM Tuijthof1,2
1Department of Orthopedic Surgery, Academic Medical Centre, The Netherlands
2Department of Biomechanical Engineering, Delft University of Technology, The Netherlands
3Department of Orthopedic Surgery and Sports Medicine, Stanford University School of Medicine, USA
*Corresponding author: Jonah Stunt, Department of Orthopedic Surgery, Academic Medical Centre, Orthotrauma Research Center Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands, Tel: 0205662173/9117, E-mail: firstname.lastname@example.org
Int J Sports Exerc Med, IJSEM-2-041, (Volume 2, Issue 2), Original Article; ISSN: 2469-5718
Received: August 16, 2015 | Accepted: May 14, 2016 | Published: May 17, 2016
Citation: Stunt JJ, Kerkhoffs GMMJ, Ooij BV, Sierevelt IN, Schafroth MU, et al. (2016) The Suitability of Global Rating Scales to Monitor Arthroscopic Training Progress. Int J Sports Exerc Med 2:041. 10.23937/2469-5718/1510041
Copyright: © 2016 Stunt JJ, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Purpose: As developing arthroscopic skills is challenging and training time for residents is limited, arthroscopic skill competency of residents should be measured. Assessment tools, such as Global Rating Scales (GRS), have been developed for structured, objective feedback and to assess learning curves. The goal of this study is to assess known-groups and convergent validity of these scales, to evaluate the suitability of these scales to monitor training progress of residents.
Methods: Knee arthroscopies and ACL reconstructions performed by residents were supervised and assessed using both GRS questionnaires. The estimates of the parameters were used to study the relationship between year of residency and each GRS score, and between the number of previously performed arthroscopies and each GRS score. Pearson correlation coefficients between the GRS scores were calculated to measure convergent validity. A Bland-Altman plot with a paired t-test was constructed to evaluate the agreement between GRS I and II.
Results: Mixed model analysis revealed a significant increase (p < 0.001) per year of residency on both GRSs (8.1 points (95% CI: 6.3-9.9) and 9.2 points (95% CI: 7.4-11.2) respectively). Significant increases per performed arthroscopy were also observed for both GRSs (p < 0.001) (0.14 (95% CI: 0.09-0.18) and 0.13 points (95% CI: 0.08-0.2) respectively). Scores for ACL reconstructions were significantly lower (p < 0.001) than for standard knee arthroscopies (12.5 and 13.0 points respectively). The Pearson correlation coefficient between GRS I and GRS II scores was high (0.94). The Limit of Agreement was 11 points.
Conclusion: GRS I and GRS II demonstrate sufficient construct validity. However, they do not seem sufficiently sensitive and consistent to establish individual learning curves. Both scales are suitable to objectively evaluate global progress of residents in the operating room when acquiring arthroscopic skills, in particular on group level.
Global rating scales, Objective assessment, Learning curves, Arthroscopic training
Since arthroscopic surgery has several advantages compared with open surgery, it has become the most frequently performed procedure in orthopaedic surgery [1-6]. However, as developing arthroscopic skills is challenging [7-10] and training time for residents is limited [9,11-13], professional societies have requested that the arthroscopic skill competency of residents be assessed to improve patient safety [4,9]. Assessment of skills by expert surgeons is sensitive to the subjective opinion of the assessor, which might compromise fair judgment. To overcome this issue, the formulation of criteria and proficiency levels for evaluation of arthroscopic skills is recommended.
Assessment tools for monitoring technical skills in the operating theatre, such as Global Rating Scales (GRS), have been developed for structured, objective feedback and to assess learning curves. Previous research investigated whether GRS are valid, reliable tools to objectify resident performance in surgery; feasibility, face validity, content validity, construct validity and reliability have been demonstrated for various Global Rating Scales [4,15-23]. Two Global Rating Scales that have been specifically proposed for feedback during arthroscopic training are the Basic Arthroscopic Knee Skill Scoring System and the Orthopaedic Competence Assessment Project. Insel and co-workers developed a GRS to assess diagnostic knee arthroscopies and partial meniscectomies on cadaver knees (GRS I). This GRS has demonstrated validity, but only for basic arthroscopic tasks performed on cadavers. Howells and co-workers combined the Orthopaedic Competence Assessment Project and the Objective Structured Assessment of Technical Skill (OSATS) (GRS II) to test arthroscopic simulator training on a bench-top knee simulator. The validity of this GRS, however, has not yet been assessed.
Since neither GRS has been validated in a clinical context, the goal of this study is to assess the validity of these scales during training of arthroscopic skills in the operating room on real-life patients. In the absence of a gold standard, both known-groups and convergent validity are investigated to assess the suitability of these scales to monitor training progress of residents.
Materials and Methods
Twenty-eight orthopedic residents in four consecutive residency years (year 3 to 6) and ten experienced orthopedic surgeons were recruited at two institutions (the Stanford University School of Medicine and the Academic Medical Centre in Amsterdam) (Table 1).
Table 1: An overview of the number of residents per residency year, and the total number of performed procedures by the residents within one residency year.
After participants signed informed consent, all outpatient knee arthroscopies (KAs) and ACL reconstructions performed by each resident were supervised and assessed by one of the experienced surgeons, using both GRS questionnaires. Before each procedure, the resident's and supervisor's unique identifier code, type of operation, year of residency and number of previously performed arthroscopies were documented. One hundred and thirty-five KAs and 30 ACL reconstructions were included (Table 1). Residents performed on average six procedures, within a period of at most one month.
GRS I is a ten-item Global Rating Scale, derived from previously published and validated evaluation models to assess arthroscopic skills (Figure 1). Items from these models were used to create a task-specific checklist and a global rating scale that together form the Basic Arthroscopic Knee Skill Scoring System (BAKSSS), a model specific to diagnostic knee arthroscopy and partial meniscectomy. The checklist was not included in the current study. Items of the global rating scale include dissection, instrument handling, depth perception, bimanual dexterity, flow of operation, knowledge of instruments, knowledge of the specific procedure, autonomy, efficiency, and quality of the operative result.
For GRS II, nine of the fourteen Orthopaedic Competence Assessment Project (OCAP) criteria for diagnostic arthroscopy were selected (Figure 2). Competences include following procedure protocol, handling of tissue, appropriate and safe use of instruments, appropriate pace with economy of movement, calmness and effectiveness in dealing with untoward events, appropriate use of assistants, communication with scrub nurse, identification of common abnormalities, and protection of articular surface.
GRS I and II have similar domains, such as instrument handling, flow of operation, efficiency and autonomy. Both Global Rating Scales allow assessors to rate arthroscopic skills performance on each domain, using 5-point Likert scales with anchors at 1, 3, and 5 points. The anchor points have specific descriptions of the requirements for receiving the respective point values, which should help to make assessments uniform. Higher scores indicate better arthroscopic proficiency. The minimum and maximum scores for GRS I are 10 and 50 points, respectively. The minimum and maximum scores for GRS II are 9 and 45 points, respectively.
Firstly, known-groups validity was investigated by determining the extent to which the GRSs can discriminate between levels of experience. Secondly, convergent validity was investigated by determining whether the two GRSs correspond with one another, as they cover similar domains of arthroscopic skills. To this end, knee arthroscopies (KA) and anterior cruciate ligament (ACL) reconstructions performed on real-life patients were used. If validity is shown, these GRSs could be further developed into objective assessment tools to show individual training progress of residents.
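As a rough illustration of the scoring described above, the following sketch sums 5-point Likert ratings into a GRS sum score. The function name and the validation rules are assumptions made for this example; the study itself only states the number of items and the resulting score ranges.

```python
# Minimal sketch: summing Likert item ratings into a GRS sum score.
# Item counts come from the text (GRS I: 10 items, GRS II: 9 items);
# the function name and error handling are illustrative assumptions.

GRS_I_ITEMS = 10
GRS_II_ITEMS = 9


def grs_sum_score(item_ratings, n_items):
    """Return the sum of the item ratings after checking count and range."""
    if len(item_ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(item_ratings)}")
    for rating in item_ratings:
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating {rating!r} is outside the 1-5 Likert range")
    return sum(item_ratings)


# The extremes reproduce the score ranges given in the text.
print(grs_sum_score([1] * GRS_I_ITEMS, GRS_I_ITEMS))    # lowest possible GRS I score
print(grs_sum_score([5] * GRS_I_ITEMS, GRS_I_ITEMS))    # highest possible GRS I score
print(grs_sum_score([1] * GRS_II_ITEMS, GRS_II_ITEMS))  # lowest possible GRS II score
print(grs_sum_score([5] * GRS_II_ITEMS, GRS_II_ITEMS))  # highest possible GRS II score
```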
Analysis was performed with SPSS 22© (SPSS Inc., Chicago, IL, USA). All 165 procedures were included in the analysis (Table 1). In order to compare GRS I and II, scores were normalized to a range from 0 to 100 points. Normality of the parameters was assessed using the Kolmogorov-Smirnov test, and the skewness and kurtosis of the sample (-2 < z-value < 2).
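The text does not state which formula was used to bring both scales to a common range. The sketch below shows two common options, both of which are assumptions rather than the authors' documented method: expressing a score as a percentage of the scale maximum, and min-max rescaling between the scale minimum and maximum.

```python
# Two possible ways to put GRS I (10-50) and GRS II (9-45) on a common scale.
# Neither is confirmed by the source; both functions are illustrative.


def percent_of_maximum(score, max_score):
    """Express a sum score as a percentage of the highest possible score."""
    return 100.0 * score / max_score


def min_max_rescale(score, min_score, max_score):
    """Map the scale minimum to 0 and the scale maximum to 100."""
    return 100.0 * (score - min_score) / (max_score - min_score)


# Example with a hypothetical GRS I sum score of 35 points.
print(percent_of_maximum(35, 50))
print(min_max_rescale(35, 10, 50))
```

The two options give different values for the same raw score, so the choice should be reported alongside any normalized results.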
To account for correlated assessments within the residents (that is, multiple assessments were performed per resident), a multilevel analysis was performed by use of mixed model analysis using a residual maximum likelihood (REML) approach (Table 2). The estimates of the parameters, as well as the standard errors and confidence intervals, were used to study the relationship between year of residency and each GRS score, and between the number of previously performed arthroscopies and each GRS score. Type of operation (KA or ACL reconstruction), year of residency and number of previously performed arthroscopies were entered as model factors, with GRS I and GRS II scores as dependent variables. Pearson correlation coefficients between the scores of GRS I and GRS II were calculated to measure convergent validity. Correlation coefficients ≤ 0.35 were considered to represent weak correlations, 0.36 to 0.67 moderate correlations, and 0.68 to 1.0 high correlations, with coefficients ≥ 0.91 very high correlations [27,28].
Table 2: Mixed model analysis showing the effects of year of residency, number of previously performed procedures and type of operation on scores for GRS I and II. All estimates are significant (p < 0.001). The intercept can be interpreted as the mean of the outcome when all independent variables are zero. The estimates can be interpreted the same way as the estimates (coefficients) of predictors in a linear regression. GRS scores increase by 8.1 and 9.3 points respectively per year of residency, and by 0.14 and 0.13 points respectively per previously performed arthroscopy. Scores for ACL procedures are on average 12.5 and 13.0 points lower than KA scores.
A Bland-Altman plot was constructed to evaluate the agreement between GRS I and II. The differences between the GRS I and II scores were plotted against their means, and the limits of agreement (LoA) were calculated as the mean difference ± 1.96 × SDdif, where SDdif is the standard deviation of the differences. A paired t-test was performed to test for a systematic difference between the two scales. P-values ≤ 0.05 were considered statistically significant.
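The convergent validity step can be sketched in a few lines. The analysis in the study was run in SPSS; the code below is only an illustration of the Pearson coefficient and of the interpretation bands quoted above. The function names, the rounding to two decimals, and the use of the absolute value are assumptions made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def classify_correlation(r):
    """Label a coefficient using the bands quoted in the text.

    Assumption: the magnitude is rounded to two decimals so that the
    bands (<= 0.35, 0.36-0.67, 0.68-1.0, with >= 0.91 very high) leave no gaps.
    """
    strength = round(abs(r), 2)
    if strength <= 0.35:
        return "weak"
    if strength <= 0.67:
        return "moderate"
    if strength < 0.91:
        return "high"
    return "very high"


# Hypothetical normalized scores for five procedures (not study data).
grs1 = [55.0, 62.0, 70.0, 81.0, 90.0]
grs2 = [58.0, 60.0, 73.0, 80.0, 93.0]
r = pearson_r(grs1, grs2)
print(round(r, 3), classify_correlation(r))
```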
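The agreement analysis can be illustrated as follows. This is a minimal sketch under stated assumptions, not the SPSS procedure used in the study: it computes the mean difference, the limits of agreement as mean difference ± 1.96 times the sample standard deviation of the differences, and the paired t statistic. The function name and the example scores are hypothetical, and the p-value (which requires the t distribution with n - 1 degrees of freedom) is not computed here.

```python
import math
import statistics


def bland_altman(scores_a, scores_b):
    """Return agreement statistics for paired scores from two scales."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    if sd_diff == 0:
        raise ValueError("differences are constant; t statistic is undefined")
    half_width = 1.96 * sd_diff
    # Paired t statistic: mean difference divided by its standard error.
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "lower_loa": mean_diff - half_width,
        "upper_loa": mean_diff + half_width,
        "t_stat": t_stat,
        "df": len(diffs) - 1,
    }


# Hypothetical normalized scores for four procedures (not study data).
result = bland_altman([60, 70, 80, 90], [62, 69, 83, 88])
for name, value in result.items():
    print(name, value)
```

Points on the plot itself would be the pairwise means on the horizontal axis and the pairwise differences on the vertical axis, with horizontal lines at the three values returned above.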
Non-normalized GRS sum scores varied between 19 and 50 points for GRS I and between 18 and 45 points for GRS II. Table 2 describes the results of the mixed model analysis. The parameters can be interpreted as the constant (intercept) and the coefficients or slopes (estimates) of the independent variables. Mixed model analysis revealed a statistically significant increase (p < 0.001) per year of residency on both GRS I and II, with values of 8.1 points (95% CI: 6.3-9.9) and 9.2 points (95% CI: 7.4-11.2) respectively. Significant increases per performed arthroscopy were also observed for both GRSs (p < 0.001), with values of 0.14 (95% CI: 0.09-0.18) and 0.13 points (95% CI: 0.08-0.2) for GRS I and II respectively. Furthermore, scores for ACL reconstructions were significantly lower (p < 0.001) than for standard knee arthroscopies (12.5 and 13.0 points for GRS I and II respectively) (Table 2).
Normalized GRS sum scores varied between 40 and 100 points for GRS I and between 38 and 100 points for GRS II. The scores did not differ significantly (p = 0.19), with mean normalized scores of 70.8 (SD 14.9) for GRS I and 71.3 (SD 16.2) for GRS II. The Pearson correlation coefficient between the normalized GRS I and GRS II scores was high (0.94). The calculated LoA was 11 points, resulting in a lower limit of -11.6 and an upper limit of 10.4 (Figure 3).
Figure 3: Bland-Altman plot comparing the scores of GRS I and GRS II. The solid line represents the mean difference between the two (-0.57), and the dotted lines represent the upper (10.4) and lower (-11.5) limits of agreement (mean ± 1.96 SD).
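Because the estimates can be read like linear regression coefficients, the fixed-effect part of the model can be written as a simple prediction function. The sketch below uses the estimates quoted in the running text; the intercept is not given in the text and is therefore a hypothetical argument, and the coding of the predictors (residency year as a number, ACL reconstruction as a 0/1 indicator) is an assumption made for this example. Random effects per resident are ignored.

```python
# Fixed-effect estimates as quoted in the running text (normalized points).
ESTIMATES = {
    "GRS I": {"per_year": 8.1, "per_arthroscopy": 0.14, "acl": -12.5},
    "GRS II": {"per_year": 9.2, "per_arthroscopy": 0.13, "acl": -13.0},
}


def expected_score(scale, intercept, year, n_previous, is_acl):
    """Expected normalized score from the fixed effects only.

    `intercept` is hypothetical here: the text does not report it.
    """
    est = ESTIMATES[scale]
    score = intercept
    score += est["per_year"] * year
    score += est["per_arthroscopy"] * n_previous
    if is_acl:
        score += est["acl"]
    return score


# Differences between two situations do not depend on the intercept.
hypothetical_intercept = 40.0
year3 = expected_score("GRS I", hypothetical_intercept, 3, 20, False)
year4 = expected_score("GRS I", hypothetical_intercept, 4, 20, False)
print(round(year4 - year3, 2))  # gain associated with one extra residency year
```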
This study investigated whether the proposed GRSs show construct validity, more specifically known-groups validity and convergent validity. With the available sample size, the study demonstrated that both GRS I and II were able to discriminate based on year of residency or number of arthroscopies, supporting known-groups validity. Convergent validity of the studied Global Rating Scales was supported by a high Pearson correlation coefficient. The Bland-Altman plot demonstrated that the average discrepancy between GRS I and II was small (close to zero), indicating that there was no systematic difference between the two scales. However, the limits of agreement were wider than the estimated differences per residency year or arthroscopic intervention: the limit of agreement was 11 points, meaning that only differences larger than 11 points between the GRSs can be interpreted as an actual difference when used on individual level. This raises the question of whether the GRSs are reliable outcome measures and whether they are suitable for performance monitoring on individual level.
Besides year of residency, GRS scores are influenced by other factors, which can account for variability. One of these factors is the complexity of the type of procedure: ACL reconstructions are more complex than knee arthroscopies, which was reflected in the significantly lower GRS scores for ACL reconstructions. Other factors influencing GRS scores are the complexity of the joint (depending on the anatomy or the severity of the condition of the patient), inter-observer differences between supervisors, and the time of day or day of the week at which the procedure was performed. Thus, as the GRS score is determined by factors in addition to level of experience, in particular type and complexity of procedure, a standardized setting is required when using the GRS to measure competence.
The GRSs did not show floor or ceiling effects. None of the participants scored lower than 19 (GRS I) and 18 (GRS II) points, whereas the minimum values of the scales are 10 and 9, respectively. Moreover, no item was scored below two points. This can be attributed to the range of residency years that was included: residents were selected from their third year of residency onwards, as that is when they start their specialisation in orthopaedic surgery in the Netherlands. Hence, none of the residents participating in the current study was a completely untrained novice, as opposed to the original studies, which also included participants with no or very little training [7,24]. Our results indicate that the GRSs can be used for the entire duration of the residency curriculum.
The current study has limitations. Firstly, residents were assessed by one supervisor, meaning that inter-observer reliability could not be assessed. Secondly, supervisors were not blinded and thus were aware of the level of training of the residents. Thirdly, supervisors were not specifically trained to use the two Global Rating Scales under study. Ideally, all supervisors should have been trained with distinct examples, and multiple observers, who were blinded to the identity of the residents, should have performed the scoring. This would have increased consensus and objectivity. Unfortunately, this was logistically difficult to arrange. Vogt showed that knowing the identity of the resident does not significantly affect scoring. Moreover, the study design was similar to other studies showing the potential of Global Rating Scales to objectively evaluate arthroscopic skills [7,24,31]. Therefore, we expect that these limitations have only marginally influenced the results.
As competency-based education is becoming more important in arthroscopic training, objective tools for assessment and performance monitoring of orthopedic residents need to be validated. The current study showed known-groups validity and convergent validity for Global Rating Scales. However, the results also suggest that the scales are not sufficiently sensitive and consistent to monitor individual learning curves and progress of a trainee over a short period of time. Rather, they are suitable to objectify and assess general arthroscopic performance on group level in a structured way. Moreover, they can be applied in a research setting, to study differences on group level and to perform the sample size calculations required to detect significant differences between different levels of experience. Lastly, as feedback on performance is known to improve the learning process [33-36], and the structure of the GRSs allows feedback per skill domain, GRSs can also be valuable as educational tools.
The Basic Arthroscopic Knee Skill Scoring System (GRS I) and the Orthopaedic Competence Assessment Project (GRS II) demonstrate sufficient construct validity when performing knee arthroscopy or ACL reconstruction. However, they do not seem sufficiently sensitive and consistent to establish individual learning curves. Both scales are suitable to objectively evaluate global progress of residents in the operating room when acquiring arthroscopic skills, in particular on group level.
The authors are grateful to Joris Lansdaal and Mikel Reilingh for their contribution to enable the GRS assessments, and Robert van de Broek for his contribution to the set up and the execution of this study. We also want to thank the Slotervaart Hospital in Amsterdam and the Tergooi Hospital in Hilversum for their cooperation.
Ahmed K, Miskovic D, Darzi A, Athanasiou T, Hanna GB (2011) Observational tools for assessment of procedural skills: a systematic review. Am J Surg 202: 469-480.
Poss R, Mabrey JD, Gillogly SD, Kasser JR, Sweeney HJ, et al. (2000) Development of a virtual reality arthroscopic knee simulator. J Bone Joint Surg Am 82-A: 1495-1499.
Hobby J, Griffin D, Dunbar M, Boileau P (2007) Is arthroscopic surgery for stabilisation of chronic shoulder instability as effective as open surgery? A systematic review and meta-analysis of 62 studies including 3044 arthroscopic operations. J Bone Joint Surg Br 89: 1188-1196.
Koehler RJ, Amsdell S, Arendt EA, Bisson LJ, Braman JP, et al. (2013) The Arthroscopic Surgical Skill Evaluation Tool (ASSET). Am J Sports Med 41: 1229-1237.
Kim S, Bosque J, Meehan JP, Jamali A, Marder R (2011) Increase in outpatient knee arthroscopy in the United States: a comparison of National Surveys of Ambulatory Surgery, 1996 and 2006. J Bone Joint Surg Am 93: 994-1000.
Zhang YZ (2015) Innovations in Orthopedics and Traumatology in China. Chin Med J (Engl) 128: 2841-2842.
Insel A, Carofino B, Leger R, Arciero R, Mazzocca AD (2009) The development of an objective model to assess arthroscopic performance. J Bone Joint Surg Am 91: 2287-2295.
O'Neill PJ, Cosgarea AJ, Freedman JA, Queale WS, McFarland EG (2002) Arthroscopic proficiency: a survey of orthopaedic sports medicine fellowship directors and orthopaedic surgery department chairs. Arthroscopy 18: 795-800.
Hodgins JL, Veillette C (2013) Arthroscopic proficiency: methods in evaluating competency. BMC Med Educ 13: 61.
Tuijthof GJ, Visser P, Sierevelt IN, Van Dijk CN, Kerkhoffs GM (2011) Does perception of usefulness of arthroscopic simulators differ with levels of experience? Clin Orthop Relat Res 469: 1701-1708.
Kairys JC, McGuire K, Crawford AG, Yeo CJ (2008) Cumulative operative experience is decreasing during general surgery residency: a worrisome trend for surgical trainees? J Am Coll Surg 206: 804-811.
Peabody T, Nestler S, Marx C, Pellegrini V (2012) Resident duty-hour restrictions-who are we protecting?: AOA critical issues. J Bone Joint Surg Am 94: e131.
Sonnadara RR, Van Vliet A, Safir O, Alman B, Ferguson P, et al. (2011) Orthopedic boot camp: examining the effectiveness of an intensive surgical skills course. Surgery 149: 745-749.
Mabrey JD, Gillogly SD, Kasser JR, Sweeney HJ, Zarins B, et al. (2002) Virtual reality simulation of arthroscopy of the knee. Arthroscopy 18: E28.
Bann S, Davis IM, Moorthy K, Munz Y, Hernandez J, et al. (2005) The reliability of multiple objective measures of surgery and the role of human performance. Am J Surg 189: 747-752.
Bann S, Kwok KF, Lo CY, Darzi A, Wong J (2003) Objective assessment of technical skills of surgical trainees in Hong Kong. Br J Surg 90: 1294-1299.
Ezra DG, Aggarwal R, Michaelides M, Okhravi N, Verma S, et al. (2009) Skills acquisition and assessment after a microsurgical skills course for ophthalmology residents. Ophthalmology 116: 257-262.
Faulkner H, Regehr G, Martin J, Reznick R (1996) Validation of an objective structured assessment of technical skill for surgical residents. Acad Med 71: 1363-1365.
MacRae H, Regehr G, Leadbetter W, Reznick RK (2000) A comprehensive examination for senior surgical residents. Am J Surg 179: 190-193.
Munz Y, Moorthy K, Bann S, Shah J, Ivanova S, et al. (2004) Ceiling effect in technical skills of surgical residents. Am J Surg 188: 294-300.
Kim J, Neilipovitz D, Cardinal P, Chiu M (2009) A comparison of global rating scale and checklist scores in the validation of an evaluation tool to assess performance in the resuscitation of critically ill patients during simulated emergencies (abbreviated as "CRM simulator study IB"). Simul Healthc 4: 6-16.
Ilgen JS, Ma IW, Hatala R, Cook DA (2015) A systematic review of validity evidence for checklists versus global rating scales in simulation-based assessment. Med Educ 49: 161-173.
Middleton RM, Baldwin MJ, Akhtar K, Alvand A, Rees JL (2016) Which Global Rating Scale? A Comparison of the ASSET, BAKSSS, and IGARS for the Assessment of Simulated Arthroscopic Skills. J Bone Joint Surg Am 98: 75-81.
Howells NR, Gill HS, Carr AJ, Price AJ, Rees JL (2008) Transferring simulated arthroscopic skills to the operating theatre: a randomised blinded study. J Bone Joint Surg Br 90: 494-499.
Hattie J, Cooksey RW (1984) Procedures for Assessing the Validities of Tests Using the "Known-Groups" Method. Appl Psy Meas 8: 295-305.
Campbell DT, Fiske DW (1959) Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull 56: 81-105.
Mason RO, LD, Marchal WG (1983) Statistics: An introduction. New York: Harcourt Brace Jovanovich.
Weber JC, LD (2012) Statistics and Research in Physical Education. St. Louis: CV Mosby Co.
Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 8476: 307-310.
Vogt VY, Givens VM, Keathley CA, Lipscomb GH, Summitt RL Jr (2003) Is a resident's score on a videotaped objective structured assessment of technical skills affected by revealing the resident's identity? Am J Obstet Gynecol 189: 688-691.
VanHeest A, Kuzel B, Agel J, Putnam M, Kalliainen L, et al. (2012) Objective structured assessment of technical skill in upper extremity surgery. J Hand Surg Am 37: 332-337.
Atesok K, Mabrey JD, Jazrawi LM, Egol KA (2012) Surgical simulation in orthopaedic skills training. J Am Acad Orthop Surg 20: 410-422.
Gomoll AH, O'Toole RV, Czarnecki J, Warner JJ (2007) Surgical experience correlates with performance on a virtual reality simulator for shoulder arthroscopy. Am J Sports Med 35: 883-888.
Price AJ, Erturan G, Akhtar K, Judge A, Alvand A, et al. (2015) Evidence-based surgical training in orthopaedics: how many arthroscopies of the knee are needed to achieve consultant level performance? Bone Joint J 97-B: 1309-1315.
Harewood GC, Murray F, Winder S, Patchett S (2008) Evaluation of formal feedback on endoscopic competence among trainees: the EFFECT trial. Ir J Med Sci 177: 253-256.
O'Connor A, Schwaitzberg SD, Cao CG (2008) How much feedback is necessary for learning to suture? Surg Endosc 22: 1614-1619.