CanHealth AI Validation Study

Publication Figures — Vancouver Coastal Health Authority & Sapien Secure Inc.

Figure 1. AI accuracy by priority level before and after panel consensus review. The AI achieved 96.9% overall accuracy after review, with perfect performance (100%) on high-acuity P1/P2 cases.
Figure 2. Consensus match rates for AI and individual human expert evaluators (blinded phase). The AI achieved 55.1% consensus agreement, lower than all human experts (range: 64.9%–84.8%).
Figure 3. Distribution of AI error patterns after panel review (n=12 errors). The majority (58.3%) were over-triage classifications, in which the AI assigned a higher urgency level than the panel consensus, reflecting a conservative safety bias.
Figure 4. AI performance under permissive versus strict scoring criteria. The permissive criterion (green line) accepts any expert-endorsed answer, whereas the strict criterion (orange line) requires an exact match to the panel consensus. The divergence between the two indicates that most AI "errors" reflect alternative valid interpretations rather than clinically incorrect decisions.
Figure 5. Confusion matrix showing AI predictions versus panel consensus for priority classifications. Diagonal elements (green gradient) represent correct predictions; off-diagonal elements indicate misclassifications. Most errors occurred between adjacent priority levels.
Figure 6. Intra-rater variability, measured as the rate of answer changes on re-evaluation. Individual evaluators changed their priority assignments on 10.7%–61.7% of re-reviewed cases (overall: 39.3%), highlighting the inherent inconsistency of human clinical triage decisions.