https://doi.org/10.1007/s00392-025-02625-4
1 Universitätsmedizin Greifswald, Institut für Bioinformatik, Greifswald, Germany; 2 Universitäres Herz- und Gefäßzentrum Hamburg, Klinik für Kardiologie, Hamburg, Germany; 3 Universitätsmedizin Greifswald, Klinik und Poliklinik für Innere Medizin B, Greifswald, Germany
Introduction: Atrial fibrillation (AF) is the most common arrhythmia, leading to an increased risk of stroke, heart failure, and other cardiovascular complications. Traditional methods of AF prediction, based on clinical factors and biomarkers, have been enhanced by advances in machine learning (ML) and artificial intelligence (AI). Neural networks (NN) trained on electrocardiogram (ECG) data have shown potential in AF risk stratification. The Study of Health in Pomerania (SHIP) dataset provides a unique opportunity to validate AI/ML models in a large and diverse population. This analysis evaluates NN performance for AF prediction across clinical subgroups, focusing on comorbidities and characteristics that may influence model accuracy.
Methods: A recent NN model for AF prediction using single-lead ECGs was applied to estimate the 3-year risk of incident AF. The NN’s performance was validated using data from the SHIP cohort, including subgroup-specific analyses based on clinical variables such as age, body mass index (BMI), lipid profiles (HDL, LDL, total cholesterol), N-terminal prohormone of brain natriuretic peptide (NT-proBNP), cardiovascular comorbidities (e.g., hypertension, diabetes, stroke, and myocardial infarction), and medication use. Model performance was expressed as the area under the receiver operating characteristic (ROC) curve (AUC) for incident AF after 5–7 years of follow-up in the SHIP-START and SHIP-TREND cohorts and was compared between subgroups. Subjects with diagnosed AF at baseline were excluded.
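The subgroup-specific AUC comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names `auc` and `subgroup_auc`, the rank-based (Mann-Whitney) AUC estimator, and the toy risk scores, outcomes, and subgroup flags are all assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Rank-based AUC: probability that a random case (label 1) receives a
    higher score than a random non-case (label 0); ties count as 0.5.
    Returns None if either class is empty."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc(
    scores: Sequence[float],
    labels: Sequence[int],
    in_subgroup: Sequence[bool],
) -> Tuple[Optional[float], Optional[float]]:
    """AUC inside a subgroup and in its complement (hypothetical helper)."""
    inside = [(s, y) for s, y, g in zip(scores, labels, in_subgroup) if g]
    outside = [(s, y) for s, y, g in zip(scores, labels, in_subgroup) if not g]

    def split(pairs):
        # Separate (score, label) pairs back into two parallel lists.
        return [s for s, _ in pairs], [y for _, y in pairs]

    return auc(*split(inside)), auc(*split(outside))


# Toy data, invented for illustration; not SHIP data.
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.5, 0.1]   # predicted AF risk
toy_labels = [1, 0, 0, 1, 1, 0, 0, 0]                    # 1 = incident AF
toy_group = [True, True, True, True, False, False, False, False]

auc_in, auc_out = subgroup_auc(toy_scores, toy_labels, toy_group)
print(f"AUC in subgroup: {auc_in:.2f}, AUC in complement: {auc_out:.2f}")
```

With few incident events per subgroup, such point estimates can be unstable, so in practice they would usually be reported with confidence intervals.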
Results: The predictive accuracy of the NN model varied across clinical subgroups in the SHIP-START and SHIP-TREND cohorts (N = 4,943; incident AF = 67). The model performed well in individuals with a history of myocardial infarction (AUC = 0.90 vs. 0.78) and in those with well-controlled cholesterol levels, such as LDL < 70 mg/dL (AUC = 0.90 vs. 0.78) and LDL < 100 mg/dL (AUC = 0.86 vs. 0.76). However, accuracy was lower in hypertensive individuals (AUC = 0.75 vs. 0.94) and stroke patients (AUC = 0.68 vs. 0.80). NT-proBNP levels also influenced performance, with higher accuracy for NT-proBNP ≥ 125 pg/mL (AUC = 0.78 vs. 0.69) and lower accuracy for NT-proBNP ≥ 300 pg/mL (AUC = 0.68 vs. 0.76).
Conclusions: This analysis highlights the potential of NN models for AF prediction across diverse clinical subgroups. The findings emphasize the importance of subgroup-specific validation to address clinical variability. While the model demonstrated strong performance in some high-risk groups, such as those with myocardial infarction, its accuracy varied across others, indicating the need for further refinement. Integrating these AI models into clinical practice, with ongoing improvements, could enhance personalized AF risk prediction and might support better early detection and outcomes.

Figure 1: AUC values for ECG-based neural network prediction of incident atrial fibrillation applied to different groups in the pooled SHIP-START and SHIP-TREND cohorts.