Background: Despite rapid progress in artificial intelligence (AI), its clinical adoption in cardiology remains limited, due to labor-intensive data curation, suboptimal accuracy of traditional deep learning algorithms, and unclear implementation pathways. Foundation models, large neural networks trained on massive datasets, mark a major paradigm shift in AI. Their applications in healthcare, especially for cardiovascular magnetic resonance (CMR) imaging and diagnosis, are emerging. However, reliable automated AI tools for diagnosing cardiac disease from CMR images are still lacking. In this study, we developed and evaluated foundation model–based AI methods for robust and accurate cardiac disease classification from CMR images that could augment clinical decision-making and reduce diagnostic burden.
Methods: We developed an automated data curation pipeline that uses locally run, open-source large language models (LLMs) to extract diagnostic information from CMR reports. Multiple LLMs analyzed each report, and their outputs were combined to assign one of five diagnostic labels: hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), cardiac amyloidosis (CA), ischemic cardiomyopathy (ICM), or healthy control (NOR). The corresponding short-axis (SA) late-gadolinium-enhancement (LGE), four-chamber (4CH) cine, and SA cine images were preprocessed and cropped around the heart using an nnU-Net model trained on publicly available data (Fig.1A). A total of 1643 cases were split into 1230 for training and 413 for validation. Three vision foundation models (DINO, VST, UMedPT) were fine-tuned on each of the three imaging modalities, producing nine models for cardiac disease classification (Fig.1B). Model performance was evaluated on an independent test set of 1012 patients with expert-confirmed diagnoses, using a semi-automated pipeline to ensure accuracy and reproducibility.
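The abstract does not state how the outputs of the individual LLMs were combined. As a minimal sketch of one plausible rule, the following assumes strict majority voting with abstention on ties or weak agreement; the function name `combine_llm_labels` and the `min_agreement` threshold are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Optional, Sequence

# Diagnostic labels named in the Methods section.
LABELS = ("NOR", "HCM", "DCM", "CA", "ICM")


def combine_llm_labels(
    votes: Sequence[Optional[str]],
    min_agreement: float = 0.5,  # hypothetical threshold, not from the study
) -> Optional[str]:
    """Combine per-LLM labels for one report by majority vote.

    Returns the winning label, or None when the report should be left
    unlabeled (no valid votes, a tie, or too little agreement).
    """
    # Discard abstentions (None) and anything outside the label set.
    valid = [v for v in votes if v in LABELS]
    if not valid:
        return None

    ranked = Counter(valid).most_common()
    top_label, top_count = ranked[0]

    # A tie for first place is treated as "no consensus".
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None

    # Require the winner to exceed the agreement threshold over all LLMs,
    # so that abstentions count against the consensus.
    if top_count / len(votes) <= min_agreement:
        return None

    return top_label
```

For example, `combine_llm_labels(["HCM", "HCM", "DCM"])` yields `"HCM"`, whereas `combine_llm_labels(["HCM", "DCM", None])` yields `None`, so that report would be excluded or sent for manual review under this assumed rule.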
Results: We assessed the discriminative performance of the models by computing the area under the receiver operating characteristic curve (AUC) for each diagnostic class. The best single-modality results per class were: NOR (UMedPT–LGE, AUC = 0.865), HCM (DINO–SA, 0.940), CA (VST–LGE, 0.924), ICM (UMedPT–LGE, 0.795), and DCM (VST–SA, 0.810). To further enhance diagnostic accuracy and confidence, we applied an ensemble strategy that averages predicted probabilities across models and modalities. Fig.1C illustrates the results: the first row shows the model ensemble (averaging probabilities across modalities), the second row shows the modality ensemble (averaging across different models), and the third row shows the full ensemble, which combines all models and all modalities and achieved the highest overall diagnostic performance: NOR (AUC = 0.877), HCM (0.956), CA (0.916), ICM (0.766), DCM (0.839).
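The probability averaging and per-class AUC described above can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the study's implementation: the function names are hypothetical, each ensemble member is assumed to output one probability per class for every case, all members are weighted equally, and the AUC is computed one-vs-rest as the fraction of positive/negative pairs ranked correctly.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # [n_cases][n_classes] predicted probabilities


def average_probabilities(members: Sequence[Matrix]) -> Matrix:
    """Element-wise mean of the probability matrices of all ensemble members.

    Passing the three models of one modality, the three modalities of one
    model, or all nine matrices gives the different ensemble variants.
    """
    n_members = len(members)
    n_cases = len(members[0])
    n_classes = len(members[0][0])
    return [
        [sum(m[i][j] for m in members) / n_members for j in range(n_classes)]
        for i in range(n_cases)
    ]


def one_vs_rest_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC for one class: probability that a positive case outranks a negative.

    Tied scores count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


def class_auc(probs: Matrix, true_class: Sequence[int], class_index: int) -> float:
    """One-vs-rest AUC of a single class from a probability matrix."""
    scores = [row[class_index] for row in probs]
    labels = [c == class_index for c in true_class]
    return one_vs_rest_auc(scores, labels)
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a test set of about a thousand patients; a rank-based implementation would scale better for larger cohorts.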
Conclusion: Foundation models adapted to multimodal CMR data can bridge the gap between AI development and clinical applicability in cardiac diagnosis. Automated data curation, fine-tuned vision models, and ensemble strategies collectively enable scalable, accurate, and clinically relevant AI systems for cardiovascular disease classification from CMR images, facilitating the integration of AI into clinical decision-making.
