Deep Learning Improves Automated QTc Estimation from 12-lead ECG

A. Büscher (Münster)1, L. Plagwitz (Münster)2, F. Doldi (Münster)1, J. Magerfleisch (Münster)1, M. Zotov (Münster)2, L. Bickmann (Münster)2, D. Heider (Münster)2, J. Varghese (Magdeburg)3, L. Eckardt (Münster)1
1Universitätsklinikum Münster Klinik für Kardiologie II - Rhythmologie Münster, Germany; 2Institut für Medizinische Informatik Münster, Germany; 3Institut für Medical Data Science Magdeburg, Germany
Background: The corrected QT (QTc) interval is an important electrocardiographic marker of ventricular repolarization whose prolongation is associated with life-threatening ventricular arrhythmias. Although commercial ECG-analysis systems have provided automated QTc measurements for decades, multiple studies have shown that they are error-prone and often deviate from human expert readings. We hypothesized that a deep learning model trained on a large corpus of imperfect, machine-generated QTc measurements could average out their inconsistencies and deliver more precise QTc estimates than conventional algorithms.

Methods: For this study, five distinct 12-lead resting-ECG datasets were pooled. Two algorithm-labelled sets provided the development data: 60,150 records from an internal ECG database (ECGinternal) and 60,150 records from the MIMIC-IV-ECG database. A constant −15 ms shift was applied to all ECGinternal QTc values to remove vendor-specific overestimation bias. Three expert-labelled datasets served for model refinement and testing: the publicly available PTB Diagnostic ECG Database (445 ECGs for fine-tuning, 100 for testing), QTcinternal (210 ECGs with dual machine and expert annotation from our center), and ECGRDVQ (5,219 ECGs from a drug-provocation study). Expert QTc was recomputed with Bazett’s correction from cardiologist-marked QRS onset, T offset and RR intervals. The InceptionTime convolutional network architecture was modified for regression (global average pooling + linear output) and trained on both development datasets. The training configuration used L1 loss, the Adam optimizer (start lr = 1 × 10⁻³), Reduce-LR-on-Plateau scheduling, early stopping (patience = 10) and a 100-epoch cap. After convergence, weights were fine-tuned for 20 epochs (lr = 1 × 10⁻⁴) on 445 expert-annotated ECGs from the PTB training split to align the model with gold-standard delineations. Model performance was quantified with mean absolute error (MAE) and root-mean-squared error (RMSE) against expert QTc and benchmarked against the original machine measurements.
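
The following is a minimal PyTorch-style sketch of the regression head and training schedule described above; the InceptionTime backbone, class names and data loaders are illustrative assumptions and are not taken from the study code.

```python
# Illustrative sketch only: a regression variant of an InceptionTime-style model
# (global average pooling + linear output) with the training configuration from
# the Methods (L1 loss, Adam lr=1e-3, ReduceLROnPlateau, early stopping, 100-epoch cap).
# The backbone, names and loaders are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class InceptionTimeRegressor(nn.Module):
    def __init__(self, backbone: nn.Module, n_features: int):
        super().__init__()
        self.backbone = backbone               # stacked Inception blocks (assumed)
        self.pool = nn.AdaptiveAvgPool1d(1)    # global average pooling over time
        self.head = nn.Linear(n_features, 1)   # single continuous QTc output (ms)

    def forward(self, x):                      # x: (batch, 12 leads, samples)
        z = self.backbone(x)                   # (batch, n_features, time)
        z = self.pool(z).squeeze(-1)           # (batch, n_features)
        return self.head(z).squeeze(-1)        # (batch,) predicted QTc in ms

def train(model, train_loader, val_loader, max_epochs=100, patience=10, lr=1e-3):
    criterion = nn.L1Loss()                                      # L1 loss
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)      # Adam, start lr = 1e-3
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    best_val, stale_epochs = float("inf"), 0
    for epoch in range(max_epochs):                              # 100-epoch cap
        model.train()
        for ecg, qtc in train_loader:
            optimizer.zero_grad()
            criterion(model(ecg), qtc).backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(ecg), qtc).item()
                           for ecg, qtc in val_loader) / len(val_loader)
        scheduler.step(val_loss)                                 # Reduce-LR-on-Plateau
        if val_loss < best_val:
            best_val, stale_epochs = val_loss, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:                         # early stopping (patience = 10)
                break

# Fine-tuning on the 445 expert-annotated PTB ECGs would reuse the same loop
# with lr=1e-4 and max_epochs=20, as described in the Methods.
```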

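For concreteness, a small NumPy sketch of the expert-label recomputation (Bazett’s correction from the marked QRS onset, T offset and RR interval) and of the reported error metrics; function and variable names are illustrative assumptions.

```python
# Hedged sketch, not the study code: Bazett-corrected QTc from expert fiducial
# points and the MAE/RMSE metrics used to benchmark predictions against it.
import numpy as np

def bazett_qtc(qrs_onset_ms, t_offset_ms, rr_ms):
    """QTc (ms) = QT / sqrt(RR), with QT = T offset - QRS onset and RR in seconds."""
    qt_ms = t_offset_ms - qrs_onset_ms
    return qt_ms / np.sqrt(rr_ms / 1000.0)

def mae(pred_ms, ref_ms):
    return float(np.mean(np.abs(np.asarray(pred_ms) - np.asarray(ref_ms))))

def rmse(pred_ms, ref_ms):
    return float(np.sqrt(np.mean((np.asarray(pred_ms) - np.asarray(ref_ms)) ** 2)))

# Worked example: QRS onset 120 ms, T offset 520 ms, RR 800 ms
# -> QT = 400 ms, QTc = 400 / sqrt(0.8) ≈ 447 ms.
```
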
Results: When tested on the expert-labelled ECGs from three independent cohorts, the combined model nearly halved the error of the off-the-shelf algorithmic measurements in all three datasets. On average, it reduced MAE from 23.4 ms to 13.4 ms and RMSE from 40.1 ms to 22.1 ms. A light fine-tuning on 445 PTB ECGs moderately lowered the cross-cohort averages to an MAE of 12.8 ms and an RMSE of 21.0 ms while preserving external validity. Integrated-gradient saliency confirmed that the model focuses on QRS onsets, R-peaks and T-wave offsets, supporting physiological plausibility.
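
As one possible realization of the saliency analysis mentioned above, a brief sketch using Captum's IntegratedGradients; the zero baseline, the model interface and the variable names are assumptions rather than details reported in the abstract.

```python
# Hedged sketch: integrated-gradient attributions for a QTc-regression model
# that returns one scalar per ECG (shape (batch,)). "model" and "ecg" are placeholders.
import torch
from captum.attr import IntegratedGradients

def ig_saliency(model, ecg):
    """ecg: tensor of shape (1, 12, samples); returns per-sample attributions."""
    model.eval()
    ig = IntegratedGradients(model)
    baseline = torch.zeros_like(ecg)        # flat-line reference signal (assumption)
    return ig.attribute(ecg, baselines=baseline)
```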

Conclusion: The deep learning model significantly improved automated QTc interval estimation by leveraging large-scale, imperfectly labelled ECG data. Despite being trained primarily on algorithm-generated annotations, the model consistently outperformed conventional machine outputs across diverse, expert-labelled test sets. Fine-tuning with a limited number of expert ECGs yielded further performance gains, demonstrating adaptability to specific clinical standards. These findings highlight the potential of noise-robust deep learning to enhance ECG interpretation in real-world settings, supporting safer and more reliable QTc assessment in clinical practice.