MA Marino, P Clauser, R Woitek, GJ Wengert, P Kapetas, M Bernathova, K Pinker-Domenig, TH Helbich, K Preidler, PAT Baltzer
PURPOSE: To investigate the impact of a scoring system (Tree) on inter-reader agreement and diagnostic performance in breast MRI reading.

MATERIALS AND METHODS: This IRB-approved, single-centre study included 100 patients with 121 consecutive histopathologically verified lesions (52 malignant, 69 benign). Four breast radiologists with different levels of MRI experience, blinded to histopathology, retrospectively evaluated all examinations. Readers independently applied two methods to classify breast lesions: BI-RADS and Tree. BI-RADS provides a reporting lexicon that is empirically translated into likelihoods of malignancy; Tree is a scoring system that results in a diagnostic category. Readings were compared by ROC analysis and kappa statistics.

RESULTS: Inter-reader agreement was substantial to almost perfect (kappa: 0.643-0.896) for Tree and moderate (kappa: 0.455-0.657) for BI-RADS. Diagnostic performance using Tree (AUC: 0.889-0.943) was similar to BI-RADS (AUC: 0.872-0.953). Less experienced radiologists achieved AUC improvements of up to 4.7 % using Tree (P-values: 0.042-0.698); an expert's performance did not change (P = 0.526). The least experienced reader improved in specificity using Tree (16 %, P = 0.001). No further differences in sensitivity or specificity were found (P > 0.1).

CONCLUSION: The Tree scoring system improves inter-reader agreement and achieves a diagnostic performance similar to that of BI-RADS. Less experienced radiologists, in particular, benefit from Tree.

KEY POINTS:
• The Tree scoring system shows high diagnostic accuracy in mass and non-mass lesions.
• The Tree scoring system reduces inter-reader variability related to reader experience.
• The Tree scoring system improves diagnostic accuracy in non-expert readers.
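
For readers unfamiliar with the agreement measure reported above, the following is a minimal sketch of how a pairwise kappa value can be computed from two readers' categorical ratings. It assumes unweighted Cohen's kappa and the commonly cited Landis and Koch verbal bands; the abstract does not state which kappa variant or labelling convention the study used. The function names and the toy ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two readers rating the same lesions."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both readers must rate the same non-empty set of lesions")
    n = len(ratings_a)
    # Observed agreement: fraction of lesions given the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the readers' marginal category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both readers used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal label per the Landis and Koch bands (assumed convention)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical category assignments for eight lesions (not study data).
reader_a = [1, 1, 2, 3, 3, 1, 2, 3]
reader_b = [1, 1, 2, 3, 2, 1, 2, 3]

kappa = cohen_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.3f} ({landis_koch_label(kappa)})")
```

In this toy case the readers agree on seven of eight lesions, and kappa discounts that raw agreement by the agreement expected from the marginal category frequencies alone.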