Background: Implantable cardioverter-defibrillators (ICDs) prevent sudden cardiac death (SCD) by delivering antitachycardia pacing (ATP) or shocks for ventricular tachycardia (VT) or ventricular fibrillation (VF) in at-risk patient populations. However, declining SCD rates due to improved medical therapy have led to many patients receiving ICDs without clear benefit. This underscores the need for better patient selection to avoid unnecessary implantations and related complications. Conventional risk stratification using clinical parameters remains limited. Recent advances in machine learning have enabled extraction of novel features from electrocardiograms (ECGs), offering new potential for individualized arrhythmia risk prediction. We aimed to assess whether targeted training on median beats using linear models could provide interpretable insights into latent electrocardiographic patterns associated with arrhythmic susceptibility.
Methods: A total of 401 12-lead ECGs from 338 unique patients presenting for ICD implantation for primary or secondary prevention (mean age 56±16 years, 76% male, 35% ischemic cardiomyopathy, 79% primary prevention) were included in this study. Each ECG met the following criteria: (1) the corresponding ICD implant date occurred within 30 days after the ECG recording and (2) follow-up data were available for a period exceeding two years. The clinical endpoint was appropriate ICD therapy with ATP or shock for sustained VT or VF (n=89 after a median follow-up period of 5 years). All signals were transformed into a median-beat representation using the Rlign algorithm, and the independent leads I, II, and V1-V6 were used to train a logistic regression model. Internal validation was performed using 5-fold stratified cross-validation (CV), reporting the area under the receiver operating characteristic curve (ROC-AUC), precision, recall, and F1-score. To prevent identity confounding, the CV was implemented at the patient level, so that all ECGs from the same patient were assigned to the same fold. For each ECG, an estimated event probability was obtained from the CV predictions. These probabilities were subsequently dichotomized at a threshold of 0.5 and used as input for the Kaplan-Meier analysis and the log-rank test.
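To make the patient-level splitting concrete, the following is a minimal sketch in plain Python, not the study's implementation. It assumes each ECG carries a patient identifier and a binary event label; the function names, the rule that a patient counts as positive if any of their ECGs is positive, the `>=` comparison at the threshold, and the toy data are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def patient_level_stratified_folds(patient_ids, labels, n_folds=5, seed=0):
    """Assign each ECG to a fold so that all ECGs of one patient share a fold.

    Assumption (not from the study): a patient is treated as positive if any
    of their ECGs is labelled positive; patients are then dealt round-robin
    into folds, class by class, to keep the event rate similar across folds.
    """
    # Collapse ECG-level labels to one label per patient.
    patient_label = defaultdict(int)
    for pid, y in zip(patient_ids, labels):
        patient_label[pid] = max(patient_label[pid], y)

    rng = random.Random(seed)
    fold_of_patient = {}
    counter = 0  # running counter so both classes spread evenly over folds
    for cls in (0, 1):
        members = sorted(p for p, y in patient_label.items() if y == cls)
        rng.shuffle(members)
        for pid in members:
            fold_of_patient[pid] = counter % n_folds
            counter += 1

    # Map the patient-level assignment back to every ECG.
    return [fold_of_patient[pid] for pid in patient_ids]


def dichotomize(probabilities, threshold=0.5):
    """Turn predicted event probabilities into high-risk (1) / low-risk (0).

    Whether the boundary value counts as high risk is an assumption here.
    """
    return [int(p >= threshold) for p in probabilities]


# Hypothetical toy data: 12 ECGs from 10 patients (p1 and p4 have two ECGs).
patient_ids = ["p1", "p1", "p2", "p3", "p4", "p4",
               "p5", "p6", "p7", "p8", "p9", "p10"]
labels = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
folds = patient_level_stratified_folds(patient_ids, labels, n_folds=5)
risk_groups = dichotomize([0.2, 0.5, 0.7])
```

Because folds are assigned per patient rather than per ECG, a model is never evaluated on an ECG from a patient it has already seen during training. In practice a library routine such as scikit-learn's `StratifiedGroupKFold` could serve the same purpose.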
Results: The logistic regression on median beats achieved a cross-validated test ROC-AUC of 72.30% (± 5.24), significantly exceeding chance level (50%). The model reached an F1-score of 38.29% (± 10.99), with a precision of 44.69% (± 10.74) and a recall of 33.79% (± 11.37). Kaplan-Meier analysis demonstrated a significant separation of the predicted high- and low-risk groups, as confirmed by the log-rank test (p < 0.0001).
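For readers unfamiliar with the Kaplan-Meier analysis referred to here, the sketch below shows the product-limit estimator in plain Python. It is an illustration under stated assumptions, not the analysis code of the study: the function name `kaplan_meier` and the toy follow-up times are hypothetical, and an event flag of 1 stands for appropriate ICD therapy while 0 stands for censoring.

```python
def kaplan_meier(durations, events):
    """Return [(time, event_free_probability)] at each distinct event time.

    durations: follow-up time per subject
    events:    1 if the endpoint occurred at that time, 0 if censored
    """
    # Sort subjects by follow-up time.
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Subjects with an event and censored subjects both leave the risk set.
        n_at_risk -= n_removed
    return curve


# Hypothetical toy group: five subjects, follow-up in years.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Computing one such curve for the predicted high-risk group and one for the predicted low-risk group gives the two curves that the log-rank test compares.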
Conclusion: Our findings indicate that standard pre-implant ECGs contain electrophysiological information associated with future occurrences of ventricular arrhythmia. Using logistic regression directly on aligned median beats, we demonstrate that subtle signal characteristics can be captured and leveraged for outcome prediction, even in a relatively small cohort. This suggests that the resting ECG, beyond conventional interpretation, may reflect underlying arrhythmic susceptibility relevant for patient selection for ICD therapy. Further studies on larger datasets are warranted to validate these observations and to explore integration of ECG-derived representations with clinical risk factors for improved risk stratification of ICD candidates.