In-Depth · MammoComply Knowledge Base
The Breast MRI Medical Outcome Audit: What It Is, How It Works, and Why It Requires a Different Methodology Than Mammography
“Breast MRI is the most sensitive imaging modality available for breast cancer detection. Yet unlike mammography, breast MRI outcome auditing carries no federal mandate — and ACR accreditation expectations are not the same as MQSA requirements. Understanding that distinction is the starting point for any program that takes quality accountability seriously.”
In This Article
- 1. Key Takeaways
- 2. Introduction
- 3. The Regulatory Foundation — and Its Limits
- 4. Why Breast MRI Auditing Is Structurally Different
- 5. The Three-Stream Audit Architecture
- 6. Stream 1: Screening MRI
- 7. Stream 2: Diagnostic MRI
- 8. Stream 3: Preoperative / Extent-of-Disease MRI
- 9. Reading the Metrics: An Illustrative Example
- 10. Sample Documents
- 11. Why Statistical Interpretation Context Is Non-Optional
- 12. A Tiered Approach to Implementation
- 13. Glossary
- 14. Conclusion
- 15. References
Key Takeaways
Breast MRI outcome auditing is not required by MQSA, but ACR-accredited breast MRI facilities are expected to maintain a medical outcomes audit program. These are not the same obligation.
Three clinically distinct MRI indication streams — screening, diagnostic, and preoperative — must be audited independently. Pooling them produces metrics that are uninterpretable against any published benchmark.
The definition of a “positive” examination differs by indication: BI-RADS® 3 is positive in screening MRI, but negative in diagnostic MRI. Misapplying this definition distorts every recall-based metric.
The formal ACR BI-RADS® benchmark for screening MRI CDR is 20–30 per 1,000 examinations. The BCSC community-observed rate is 17 per 1,000. These are different reference values and should be presented separately.
Small examination volumes produce statistically unstable metrics that can appear severely out of range while reflecting sample size, not interpretive performance. Approximately 500–1,000 screening MRI examinations are required before benchmark comparison is informative.
Diagnostic MRI performance reference ranges are not formal ACR BI-RADS® benchmarks in the same sense as screening MRI thresholds. They are published reference values from community-practice studies and should be labeled as such.
Introduction
The mammography medical outcome audit is one of the most thoroughly regulated quality assurance processes in diagnostic imaging. Under the Mammography Quality Standards Act (MQSA), every certified mammography facility in the United States is required to conduct an annual medical outcome audit, link imaging interpretations to pathologically confirmed outcomes, calculate a defined set of key performance indicators (KPIs), and compare those results against published benchmarks. Federal inspectors verify compliance annually.
Breast MRI operates under a different framework — one that is often misunderstood in two directions simultaneously. Some programs assume that because MQSA does not mandate breast MRI auditing, no formal audit expectation exists. Others apply mammography audit methodology directly to MRI data, producing metrics that are structurally wrong for the population being screened and the modality being evaluated. Neither approach serves the purpose of quality accountability.
This article explains the methodology for conducting a rigorous breast MRI medical outcome audit: what it measures, how it differs from the mammography audit at the level of methodology rather than just regulatory obligation, how three distinct clinical indication streams must be approached separately, and what clinicians and program administrators need to understand in order to interpret audit results correctly.
The Regulatory Foundation — and Its Limits
The MQSA, enacted by Congress in 1992, established federal quality standards for mammography. The 2023 Final Rule (Federal Register, March 10, 2023; 88 FR 15126) is the most significant update to those standards since 1997 — overhauling imaging assessment terminology, expanding density reporting requirements, and strengthening enforcement mechanisms. It did not change one fundamental parameter: the MQSA applies exclusively to mammographic imaging. The statute does not provide for the establishment of requirements related to breast MRI, and the regulations have not been amended to include MRI. [7]
However, the absence of an MQSA mandate does not mean that breast MRI facilities operate in a quality accountability vacuum. The ACR Breast MRI Accreditation Program expects each accredited facility to establish and maintain a medical outcomes audit program to follow positive assessments and correlate pathology results with interpreting physician findings. That expectation is not a federal inspection requirement — it is an accreditation standard — and the distinction matters:
MQSA
Federal law, enforced by annual FDA inspection, applies exclusively to mammographic imaging. Non-compliance carries federal consequences.
ACR Breast MRI Accreditation
A professional accreditation program that carries an outcomes audit expectation as a program standard. Enforcement is through accreditation standing, not federal inspection.
The ACR BI-RADS® Atlas, 5th Edition (Sickles & D’Orsi, 2013) extended the outcome monitoring framework to breast MRI, establishing cross-modality KPI definitions and recommending outcome monitoring as a best practice. [1] The Breast Cancer Surveillance Consortium (BCSC) has published the only large-scale community benchmark data for screening breast MRI (Lee JM et al., Radiology, 2017; n = 8,387 examinations). [2] Together, these sources form the evidence base for breast MRI outcome auditing.
Why Breast MRI Auditing Is Structurally Different From Mammography Auditing
It is tempting to treat the breast MRI audit as a version of the mammography audit applied to a different modality. It is not. There are five structural differences that make the breast MRI audit materially more complex.
The Screening Population Has a Fundamentally Higher Cancer Prevalence
Screening MRI is directed at asymptomatic women at elevated lifetime risk, so the underlying cancer prevalence in the screened population is structurally higher than in average-risk screening mammography. This distinction is not administrative: it is the reason breast MRI CDR benchmarks (20–30 per 1,000 per BI-RADS®) are not comparable to mammography CDR benchmarks (~5–6 per 1,000).
The Definition of a "Positive" Screening Examination Differs by Modality
In screening mammography, a positive assessment is BI-RADS® 0, 4, or 5; in screening MRI, BI-RADS® 3 also counts as positive. This directly affects the calculation of the Abnormal Interpretation Rate, Recall Rate, and PPV1: a practice applying the mammographic positive assessment definition to its MRI audit will systematically miscalculate every recall-based metric.
The Positive Assessment Definition Reverses in Diagnostic MRI
A facility using a single positive assessment definition across both screening and diagnostic populations will produce metrics that are wrong for both. The peer-reviewed literature is explicit: “performance measures differ significantly between screening and diagnostic MRI indications and must be calculated separately.” [3]
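As a sketch only, the stream-dependent positive-assessment rules described above could be encoded in a tracking workflow as follows. The stream labels and the function are hypothetical, not part of any ACR specification; only the BI-RADS® category rules come from the definitions in this article.

```python
# Hypothetical encoding of the stream-dependent positive-assessment rules.
# Screening MRI: BI-RADS 0, 3, 4, 5 are positive (category 3 counts).
# Diagnostic MRI: only BI-RADS 4 and 5 are positive (category 3 is negative).
POSITIVE_CATEGORIES = {
    "screening_mri": {"0", "3", "4", "5"},
    "diagnostic_mri": {"4", "5"},
}

def is_positive(stream: str, birads: str) -> bool:
    """Return True if the assessment counts as positive for audit purposes."""
    try:
        return birads in POSITIVE_CATEGORIES[stream]
    except KeyError:
        raise ValueError(f"unknown indication stream: {stream!r}")

# A BI-RADS 3 assessment flips polarity between the two streams:
assert is_positive("screening_mri", "3") is True
assert is_positive("diagnostic_mri", "3") is False
```

A single shared definition, applied to both streams, would misclassify every BI-RADS® 3 examination in one of them, which is exactly the failure mode described above.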
Three Distinct Clinical Indication Streams Must Never Be Combined
Lee CI et al. (Academic Radiology, 2014), using BCSC data across 11,654 breast MRI examinations, demonstrated that AIRs differed significantly across indication categories and concluded that “practices should stratify breast MRI examinations by indication for quality assurance and auditing purposes.” [4]
The Benchmark Evidence Base Is Comparatively Thin
Only one large-scale community benchmark study exists for screening breast MRI, and no formal BI-RADS® benchmark table has been published for diagnostic MRI at all. Statistical interpretation context is therefore not optional boilerplate; it is a clinical necessity.
The Three-Stream Audit Architecture
A methodologically sound breast MRI medical outcome audit is organized around three independent streams — each with its own examination population, positive assessment definition, outcome assignment window, and benchmark or reference framework. No cross-stream pooling is performed.
Stream 1
Screening MRI — The Current Standard of Practice
Examination Population
Asymptomatic women at elevated lifetime risk
Positive Assessment
BI-RADS® 0, 3, 4, or 5 (BI-RADS® 3 is positive)
Outcome Window
Tissue diagnosis within 12 months
Key Performance Indicators — Two-Tier Benchmark Framework
An essential discipline in screening MRI audit reporting is maintaining the distinction between the formal ACR BI-RADS® benchmark range and the BCSC community-observed performance value. These are different reference standards. The BI-RADS® range represents the published target; the BCSC observed value represents what community practice actually achieves. Both values belong in an audit report; neither should substitute for the other.
| Performance Measure | Formula | BI-RADS® Benchmark (Formal) | BCSC Community Observed |
|---|---|---|---|
| Recall Rate / AIR | Positive exams ÷ Total exams × 100 | 10% – 25% | ~12–16% |
| PPV1 | TP ÷ (TP + FP1) × 100 | 3% – 8% | ~10–15% |
| CDR | TP ÷ Total exams × 1,000 | 20–30 per 1,000 | 17 per 1,000 (95% CI 15–20) |
| Sensitivity | TP ÷ (TP + FN) × 100 | ≥ 80% | 81% (95% CI 75–86%) |
| Specificity | TN ÷ (TN + FP) × 100 | 85% – 90% | 83% (95% CI 82–84%) |
Sources: ACR BI-RADS® Atlas, 5th Edition [1]; Lee JM et al., Radiology, 2017 [2].
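One way to keep the two reference standards from blurring together is to store them side by side and judge status against the formal BI-RADS® range only, carrying the BCSC value as context. The structure and names below are illustrative, not an ACR specification; the numeric values are the ones from the table above.

```python
# Two-tier reference framework (illustrative): each KPI carries BOTH the
# formal BI-RADS benchmark range and the BCSC community-observed value.
# Status is judged against the formal range; the BCSC value is context only.
SCREENING_REFERENCES = {
    "recall_rate_pct": ((10.0, 25.0), "~12-16% (BCSC observed)"),
    "ppv1_pct":        ((3.0, 8.0),   "~10-15% (BCSC observed)"),
    "cdr_per_1000":    ((20.0, 30.0), "17 per 1,000, 95% CI 15-20 (BCSC observed)"),
}

def status_vs_birads(kpi: str, value: float) -> str:
    """Label a calculated KPI against the formal BI-RADS range only."""
    low, high = SCREENING_REFERENCES[kpi][0]
    if value < low:
        return "below BI-RADS range"
    if value > high:
        return "above BI-RADS range"
    return "within BI-RADS range"
```

For example, a CDR of 23.6 per 1,000 is "within BI-RADS range" even though it sits above the BCSC observed 17 per 1,000; the two comparisons are reported separately, never merged.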
Benchmark Interpretation Note
The BCSC community-observed CDR of 17 per 1,000 falls below the formal BI-RADS® benchmark range of 20–30 per 1,000. This reflects performance in a mixed-risk community registry population. Programs should not use 17 per 1,000 as a benchmark floor, nor should they present an internal action threshold below 20 per 1,000 as equivalent to the BI-RADS® range. Similarly, the BCSC observed PPV1 of ~10–15% and specificity of 83% fall outside the formal BI-RADS® ranges — an expected consequence of population mix effects in community data that reinforces the need for explicit interpretation context in every audit report.
Stream 2
Diagnostic MRI — A Different Population, Different Rules
Examination Population
Problem-solving, symptom evaluation, short-interval follow-up
Positive Assessment
BI-RADS® 4 or 5 only — BI-RADS® 3 is negative here
Outcome Window
Tissue diagnosis within 12 months
Published Reference Values for Diagnostic MRI
Labeling Note
The following are published reference values and practical comparison ranges drawn from community-practice studies. [3, 4] They are not formal ACR BI-RADS® benchmark thresholds in the sense that the screening MRI values above are. No published equivalent of the BI-RADS® screening benchmark table exists for diagnostic MRI. Programs should label these as reference values — not benchmarks — when presenting audit results.
| Performance Measure | Formula | Published Reference Value | Source |
|---|---|---|---|
| PPV2 | TP ÷ (TP + FP2) × 100 | 20% – 40% | Niell BL et al. [3] |
| PPV3 | TP ÷ (TP + FP3) × 100 | 25% – 45% | Niell BL et al. [3] |
| CDR | TP ÷ Total exams × 1,000 | ~47 per 1,000 * | Niell BL et al. [3] |
| Sensitivity | TP ÷ (TP + FN) × 100 | ≥ 80% | Lee CI et al. [4] |
| Specificity | TN ÷ (TN + FP) × 100 | 85% – 90% | Lee CI et al. [4] |
* Diagnostic CDR of ~47 per 1,000 reported by Niell et al. [3] is reference only; no formal BI-RADS® diagnostic CDR benchmark has been published. This value reflects the substantially higher cancer prevalence in diagnostic populations and should not be compared to screening CDR.
Stream 3
Preoperative / Extent-of-Disease MRI — The Emerging Third Stream
Examination Population
Women with confirmed breast cancer — extent-of-disease evaluation
Positive Assessment
BI-RADS® 4 or 5; contralateral occult cancer detection
Outcome Window
Pathological correlation at surgical excision
Benchmark Status
The ACR BI-RADS® v2025 Manual formally introduces audit guidance for preoperative breast MRI for the first time. [6] Cohen EO et al. (Radiology, 2025) established the feasibility of this framework in community practice, reporting preliminary performance data: AIR 30.3%, PPV2 22.8%, PPV3 32.2%, and contralateral cancer detection rate 90.7 per 1,000. [6] Formal benchmark ranges for this stream remain under active development.
Malignancy is known to be present in the ipsilateral breast at the time of examination. This makes sensitivity and CDR calculations for ipsilateral disease structurally incomparable to surveillance or problem-solving contexts. Including preoperative examinations in a screening or diagnostic audit pool invalidates every calculated metric for its intended purpose.
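The pooling problem is simple arithmetic. Using constructed counts for illustration only: suppose a screening stream of 297 examinations with 7 detected cancers is wrongly pooled with 20 preoperative examinations, each of which has malignancy already known to be present.

```python
# Illustrative arithmetic only: the effect of pooling preoperative exams
# (where ipsilateral malignancy is known present) into a screening CDR.
# All counts are constructed for demonstration.
screening_exams, screening_cancers = 297, 7   # CDR ~23.6 per 1,000
preop_exams, preop_cancers = 20, 20           # prevalence is effectively 100%

pure_cdr = 1000 * screening_cancers / screening_exams
pooled_cdr = 1000 * (screening_cancers + preop_cancers) / (screening_exams + preop_exams)

# pure_cdr rounds to 23.6, inside the 20-30 benchmark range;
# pooled_cdr rounds to 85.2, meaningless against any screening benchmark.
```

Twenty misfiled examinations more than triple the apparent detection rate, which is why a pooled figure cannot be compared to any published reference for either stream.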
Reading the Metrics: An Illustrative Example
The following example uses constructed data for a hypothetical facility — Anywhere Breast Imaging Practice — to illustrate how Stream 1 (Screening MRI) results are calculated and contextualized against the two-tier benchmark framework. All data are illustrative.
Underlying screening counts: 44 abnormal interpretations (BI-RADS® 0/3/4/5) of 297 total; 7 screen-detected cancers (TP); 1 interval cancer (FN); 37 false positive recalls (FP1); 252 true negatives (TN).
Screening Stream Results
| KPI | Calculated Value | BI-RADS® Benchmark | BCSC Observed | Status |
|---|---|---|---|---|
| Recall Rate | 14.8% | 10% – 25% | ~12–16% | ✓ Within Range |
| AIR | 14.8% | 10% – 25% | ~12–16% | ✓ Within Range |
| PPV1 | 15.9% | 3% – 8% | ~10–15% | ↑ Above BI-RADS® Range |
| CDR | 23.6 / 1,000 | 20–30 / 1,000 | 17 / 1,000 | ✓ Within Range |
| Sensitivity | 87.5% | ≥ 80% | 81% | ✓ Within Range |
| Specificity | 87.2% | 85% – 90% | 83% | ✓ Within Range |
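The table values follow directly from the underlying counts. As a worked sketch (function and variable names are illustrative), the true-negative count is derived from the other inputs so the arithmetic stays internally consistent:

```python
# Worked screening-stream KPI arithmetic for the illustrative counts above.
# TN is derived as (total - abnormal - FN) rather than entered separately.
def screening_kpis(total: int, tp: int, fp1: int, fn: int) -> dict:
    abnormal = tp + fp1            # positive assessments: BI-RADS 0/3/4/5
    tn = total - abnormal - fn     # remaining exams are true negatives
    return {
        "recall_rate_pct": round(100 * abnormal / total, 1),
        "ppv1_pct":        round(100 * tp / abnormal, 1),
        "cdr_per_1000":    round(1000 * tp / total, 1),
        "sensitivity_pct": round(100 * tp / (tp + fn), 1),
        "specificity_pct": round(100 * tn / (tn + fp1), 1),
    }

kpis = screening_kpis(total=297, tp=7, fp1=37, fn=1)
# Reproduces the table: recall 14.8%, PPV1 15.9%, CDR 23.6 per 1,000,
# sensitivity 87.5%, specificity 87.2%
```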
Interpretation Notes
CDR at 23.6 per 1,000 — Within BI-RADS® Range
A screening CDR of 23.6 per 1,000 falls within the formal BI-RADS® benchmark range of 20–30 per 1,000 and exceeds the BCSC community-observed rate of 17 per 1,000. Both reference values should be documented: the BI-RADS® range provides the formal target; the BCSC observed rate provides community context. This program is detecting cancer at a rate consistent with the BI-RADS® expectation.
PPV1 at 15.9% — Above BI-RADS® Range
A PPV1 of 15.9% exceeds the formal BI-RADS® benchmark of 3–8% and also falls above the BCSC community-observed range of ~10–15%. In isolation, this might suggest over-calling. In the context of a CDR within the BI-RADS® range, it reflects a different and clinically important finding: a higher proportion of abnormal interpretations in this practice correspond to true malignancy. The BI-RADS® PPV1 benchmark was derived from BCSC community data across mixed risk strata; in a program with a high CDR, a higher PPV1 is an expected mathematical consequence, not a quality deficiency. This finding should be documented in QA committee review and interpreted alongside CDR rather than in isolation.
Sensitivity at 87.5%
Based on 8 total cancers (7 screen-detected, 1 interval), this metric meets the ≥80% benchmark. At this volume, the confidence interval is wide. Trend analysis across periods is more informative than single-period point estimates at this sample size.
Specificity at 87.2%
Falls within the BI-RADS® 85–90% range and above the BCSC community-observed value of 83%, indicating the false-positive burden on non-cancer patients is within expected norms for a high-risk screening program.
Diagnostic Stream Reference Values (Constructed, n = 84)
| KPI | Calculated Value | Published Reference Value | Status vs. Reference |
|---|---|---|---|
| PPV2 | 22.2% | 20% – 40% | Within Reference Range |
| PPV3 | 27.3% | 25% – 45% | Within Reference Range |
| Sensitivity | 85.7% | ≥ 80% | Within Reference Range |
| Specificity | 72.7% | 85% – 90% | ↓ Below Reference Range |
Diagnostic Specificity at 72.7% — Below Reference Range: A diagnostic specificity below 85% indicates a higher-than-reference rate of biopsy recommendations not yielding cancer. At n = 84, this metric is statistically unstable and should be interpreted cautiously. Factors to investigate include the proportion of short-interval follow-up examinations within the diagnostic pool, whether second-look ultrasound correlation routinely precedes biopsy recommendation, and the pre-test probability characteristics of the referral mix. This finding identifies a parameter for targeted QA review, not a conclusion about interpretive performance.
Sample Documents
The following sample documents correspond to the Anywhere Breast Imaging Practice illustrative example above. Download to see what the methodology overview and completed screening audit report look like in practice. All data, facility names, and physician identifiers are constructed.
Breast MRI MOA — Methodology Overview
Anywhere Breast Imaging Practice · Sample methodology overview document
Sample document — illustrative purposes only. All data, facility names, and physician identifiers are constructed and do not represent any real patient, facility, or physician.
Breast MRI MOA — Screening Audit 2025
Anywhere Breast Imaging Practice · Sample completed screening audit report, January–December 2025
Sample document — illustrative purposes only. All data, facility names, and physician identifiers are constructed and do not represent any real patient, facility, or physician.
Why Statistical Interpretation Context Is Non-Optional
The breast MRI benchmark evidence base is comparatively thin. The only large-scale community screening MRI benchmark data come from a single BCSC publication covering fewer than 9,000 examinations. [2] Individual facility volumes are frequently far smaller, particularly in programs that correctly separate indication streams.
A screening program with 17 examinations and zero screen-detected cancers will report a CDR of 0.0 per 1,000, a sensitivity of 0%, and a PPV1 of 0%. None of these values indicate program failure. They indicate that the program has not yet accumulated sufficient volume to detect the expected number of cancers given the underlying prevalence and benchmark CDR. Approximately 500–1,000 screening MRI examinations are required before these metrics begin to stabilize to the point where benchmark comparison is informative. [2, 5]
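The instability can be made concrete with a standard binomial interval. The sketch below uses a Wilson score interval, a generic statistical method chosen for illustration (it is not an ACR-specified procedure), to show why 17 examinations with zero detected cancers support no conclusion at all.

```python
import math

# Wilson score interval for a binomial proportion (generic statistics,
# not an ACR-specified method; used here to illustrate low-volume noise).
def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% confidence interval for a proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (max(0.0, center - half), min(1.0, center + half))

# 17 screening exams, 0 cancers: point-estimate CDR is 0 per 1,000, but the
# interval spans roughly 0 to 184 per 1,000, easily containing the entire
# 20-30 per 1,000 benchmark range. The data are uninformative, not alarming.
lo, hi = wilson_interval(0, 17)
```

At 500 to 1,000 examinations the same interval narrows enough for benchmark comparison to carry meaning, which is the quantitative basis for the volume threshold cited above.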
Every breast MRI audit report should include interpretation notes that:
- 1. Flag low-volume periods and identify which metrics are statistically unstable
- 2. Distinguish formal BI-RADS® benchmark ranges from BCSC community-observed values — and present both
- 3. Explain why screening and diagnostic results are reported separately and cannot be compared directly
- 4. Identify diagnostic reference ranges as published reference values, not formal BI-RADS® benchmarks
- 5. Cite the primary source for every reference threshold presented
- 6. Provide clinical context for any metric outside the acceptable or reference range
A Tiered Approach to Implementation
Not every breast MRI program is at the same stage of development. The three-stream audit architecture is most useful when understood as a tiered structure — one that a program can enter at the level appropriate to its current volume, indication mix, and data infrastructure.
Stream 1 — Screening MRI
The logical starting point. Most established benchmark framework, most clinically homogeneous patient population, most directly interpretable performance metrics. Establish the screening stream first, ensure indication separation is captured in the tracking workflow, and allow sufficient volume to accumulate before drawing conclusions from individual-period metrics.
Stream 2 — Diagnostic MRI
Requires the ability to distinguish diagnostic from screening examinations at the data level. The reference value evidence base is thinner, and interpretation requires familiarity with the lower specificity expectations of a symptomatic population and the labeling distinction between formal benchmarks and published reference values. Best activated once Stream 1 is producing stable results.
Stream 3 — Preoperative MRI
The most recently formalized stream, with guidance now provided by the ACR BI-RADS® v2025 Manual. Facilities performing preoperative MRI should consider establishing this stream as v2025 benchmarks are published.
The governing principle: audit reporting should not proceed faster than the evidence base and the data infrastructure can support rigorous interpretation.
Glossary
- AIR (Abnormal Interpretation Rate)
- The percentage of examined patients with a positive (abnormal) imaging assessment. For screening MRI, positive = BI-RADS® 0, 3, 4, or 5. For diagnostic MRI, positive = BI-RADS® 4 or 5.
- BI-RADS®
- Breast Imaging Reporting and Data System. The ACR-developed standardized lexicon and reporting framework for breast imaging. The 5th Edition (2013) includes outcome monitoring guidance for all breast imaging modalities including MRI.
- BCSC
- Breast Cancer Surveillance Consortium. An NCI-supported network of breast imaging registries providing population-level performance data. The BCSC screening MRI benchmark study (Lee JM et al., 2017) is the primary source of community-observed performance values for screening breast MRI.
- CDR (Cancer Detection Rate)
- The number of cancers confirmed by tissue diagnosis per 1,000 examinations, identified as positive on the audited imaging examination.
- FN (False Negative)
- An examination interpreted as negative in which cancer is subsequently diagnosed within 12 months. In the screening context, also called an interval cancer.
- FP (False Positive)
- An examination interpreted as positive (abnormal) in which cancer is not confirmed on subsequent workup. Three variants: FP1 (recall level), FP2 (biopsy recommendation level), FP3 (biopsy performed level).
- MQSA
- Mammography Quality Standards Act (enacted 1992; implementing regulations most recently updated by the 2023 Final Rule). Federal legislation establishing quality standards for mammography in the United States. Does not apply to breast MRI.
- PPV1
- Positive Predictive Value 1. The percentage of abnormal interpretations (recall level) resulting in a tissue diagnosis of cancer within 12 months. TP ÷ (TP + FP1) × 100.
- PPV2
- Positive Predictive Value 2. The percentage of biopsy recommendations (BI-RADS® 4 or 5) resulting in a tissue diagnosis of cancer. TP ÷ (TP + FP2) × 100.
- PPV3
- Positive Predictive Value 3. The percentage of biopsies actually performed resulting in a tissue diagnosis of cancer. TP ÷ (TP + FP3) × 100.
- Sensitivity
- The percentage of true cancers correctly identified as positive. TP ÷ (TP + FN) × 100.
- Specificity
- The percentage of non-cancer examinations correctly identified as negative. TN ÷ (TN + FP) × 100.
- TN (True Negative)
- An examination correctly interpreted as negative — no cancer identified, none diagnosed within the outcome assignment window.
- TP (True Positive)
- An examination correctly interpreted as positive — cancer confirmed by tissue diagnosis within the outcome assignment window.
Conclusion
The breast MRI medical outcome audit is neither a simplified version of the mammography audit nor a complexity best deferred indefinitely. It is a structured quality assurance process that requires understanding of three distinct clinical indication streams, careful application of BI-RADS® audit methodology, an honest engagement with a comparatively thin benchmark evidence base, and a disciplined commitment to presenting both formal benchmark ranges and observed community performance values alongside every metric.
The regulatory picture is nuanced: no federal MQSA mandate exists for breast MRI auditing, but ACR-accredited breast MRI facilities carry an accreditation-level expectation for outcomes monitoring. Programs that understand that distinction — and design their audit methodology accordingly — are better positioned to produce results that are clinically meaningful and institutionally defensible.
Facilities that conduct breast MRI outcome audits with methodological rigor, present results with full statistical context, and maintain honest distinctions between formal benchmarks and published reference values provide the basis for meaningful quality improvement, meaningful physician feedback, and meaningful accountability to the patients their programs exist to serve.
References
- [1]Sickles EA, D’Orsi CJ. ACR BI-RADS® Follow-up and Outcome Monitoring. In: ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System, 5th Edition. American College of Radiology; Reston, VA; 2013:21–31.
- [2]Lee JM, Ichikawa L, Valencia E, et al. Performance Benchmarks for Screening Breast MR Imaging in Community Practice. Radiology. 2017;285(1):44–52. doi:10.1148/radiol.2017162033
- [3]Niell BL, Gavenonis SC, Motazedi T, et al. Auditing a Breast MRI Practice: Performance Measures for Screening and Diagnostic Breast MRI. J Am Coll Radiol. 2014;11(9):883–889. doi:10.1016/j.jacr.2014.02.003
- [4]Lee CI, Ichikawa L, Rochelle MC, et al. Breast MRI BI-RADS® Assessments and Abnormal Interpretation Rates by Clinical Indication in US Community Practices. Acad Radiol. 2014;21(11):1370–1376. doi:10.1016/j.acra.2014.06.003
- [5]Lam DL, Lee JM. Breast Magnetic Resonance Imaging Audit: Pitfalls, Challenges, and Future Considerations. Radiol Clin North Am. 2021;59(1):57–65. doi:10.1016/j.rcl.2020.09.002
- [6]Cohen EO, Tso HH, Shin K, et al. Feasibility of Auditing Preoperative Breast MRI for Extent-of-Disease Evaluation Using the BI-RADS® v2025 Manual. Radiology. 2025;317(1):e243803. doi:10.1148/radiol.243803 Erratum in: Radiology. 2025;317(1):e259018.
- [7]Federal Register. Mammography Quality Standards Act — Final Rule. March 10, 2023; 88 FR 15126.
- [8]Strigel RM, Rollenhagen J, Burnside ES, et al. Screening Breast MRI Outcomes in Routine Clinical Practice: Comparison to BI-RADS® Benchmarks. Acad Radiol. 2017;24(4):411–417. doi:10.1016/j.acra.2016.10.014
All benchmark values and methodology descriptions are referenced to their primary peer-reviewed sources as cited above. Facilities should consult the ACR BI-RADS® Atlas, 5th Edition and applicable professional society guidance when designing or reviewing a breast MRI outcome audit program.
About the Author
B.S., ARRT · President & Founder, Mammologix · Breast Imaging Operations since 1995
A registered radiologic technologist and founder of Mammologix, Rick Lippert has spent more than 30 years in breast imaging operations — spanning mammography medical outcome audit, MQSA compliance support, patient follow-up communication, and the operational systems that help facilities maintain quality accountability.
ACR BI-RADS® Trademark Notice
BI-RADS® is a registered trademark of the American College of Radiology (ACR). The ACR BI-RADS® Atlas and all related benchmark values, assessment category definitions, and audit methodology guidance are the intellectual property of the American College of Radiology. All benchmark values and audit methodology references in this article are attributed to their original ACR and peer-reviewed sources. Mammologix is not affiliated with, endorsed by, or sponsored by the American College of Radiology. Reference to BI-RADS® is made solely for informational and educational purposes.