Background: The corrected QT (QTc) interval is an important electrocardiographic marker of ventricular repolarization whose prolongation is associated with life-threatening ventricular arrhythmias. Although commercial ECG-analysis systems have provided automated QTc measurements for decades, multiple studies have shown that they are error-prone and often deviate from human expert readings. We hypothesized that a deep learning model trained on a large corpus of imperfect, machine-generated QTc measurements could average out their inconsistencies and deliver more precise QTc estimates than conventional algorithms.
Methods: Five distinct 12-lead resting-ECG datasets were used. Two algorithm-labelled sets provided the development data: 60 150 records from an internal ECG database (ECGinternal) and 60 150 records from the MIMIC-IV-ECG database. A constant −15 ms shift was applied to all ECGinternal QTc values to remove a vendor-specific over-estimation bias. Three expert-labelled datasets were used for model refinement and testing: the publicly available PTB Diagnostic ECG Database (445 ECGs for fine-tuning, 100 for testing), QTcinternal (210 ECGs with both machine and expert annotations from our center), and ECGRDVQ (5 219 ECGs from a drug-provocation study). Expert QTc was recomputed with Bazett's correction from cardiologist-marked QRS onset, T-wave offset and RR intervals. The InceptionTime convolutional network architecture was modified for regression (global average pooling followed by a linear output) and trained on both development datasets. The training configuration used L1 loss, the Adam optimizer (initial learning rate 1 × 10⁻³), reduce-on-plateau learning-rate scheduling, early stopping (patience = 10) and a 100-epoch cap. After convergence, the weights were fine-tuned for 20 epochs (learning rate 1 × 10⁻⁴) on the 445 expert-annotated ECGs from the PTB training split to align the model with gold-standard delineations. Model performance was quantified as mean absolute error (MAE) and root-mean-squared error (RMSE) against expert QTc and benchmarked against the original machine measurements.
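For reference, Bazett's correction, as used here to recompute the expert QTc, divides the measured QT interval (T-wave offset minus QRS onset) by the square root of the preceding RR interval expressed in seconds:

```latex
\mathrm{QTc} = \frac{\mathrm{QT}}{\sqrt{\mathrm{RR}}}, \qquad
\mathrm{QT} = t_{\mathrm{T\ offset}} - t_{\mathrm{QRS\ onset}}
```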
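The following is a minimal PyTorch sketch of the regression adaptation and training configuration described above, not the study's actual code: `InceptionTimeBackbone` stands in for any InceptionTime implementation returning feature maps of shape (batch, channels, time), and the data loaders are assumed to yield (ecg, qtc_ms) pairs of float tensors; only the hyperparameters are taken from the text.

```python
import torch
import torch.nn as nn

class QTcRegressor(nn.Module):
    """InceptionTime backbone with a regression head (GAP + linear output)."""
    def __init__(self, backbone: nn.Module, n_features: int):
        super().__init__()
        self.backbone = backbone                # placeholder feature extractor
        self.pool = nn.AdaptiveAvgPool1d(1)     # global average pooling over time
        self.head = nn.Linear(n_features, 1)    # single linear regression output

    def forward(self, x):
        z = self.backbone(x)                    # (batch, n_features, time)
        z = self.pool(z).squeeze(-1)            # (batch, n_features)
        return self.head(z).squeeze(-1)         # (batch,) predicted QTc in ms

def train(model, train_loader, val_loader, device="cuda"):
    model.to(device)
    criterion = nn.L1Loss()                                     # L1 loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # initial lr
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")
    best_val, patience, bad_epochs = float("inf"), 10, 0        # early stopping
    for epoch in range(100):                                    # 100-epoch cap
        model.train()
        for ecg, qtc in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(ecg.to(device)), qtc.to(device))
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(e.to(device)), q.to(device)).item()
                           for e, q in val_loader) / len(val_loader)
        scheduler.step(val_loss)                # reduce lr on validation plateau
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:          # stop after 10 flat epochs
                break
```

The fine-tuning stage would reuse the same loop with the learning rate lowered to 1 × 10⁻⁴ and the epoch cap set to 20.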
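The evaluation metrics are the standard regression errors, where ŷᵢ is the model's QTc estimate and yᵢ the expert value for ECG i:

```latex
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}
```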
Results: When tested on the expert-labelled ECGs from the three independent cohorts, the combined model nearly halved the error of the off-the-shelf algorithmic measurements in all three datasets, reducing the average MAE from 23.4 ms to 13.4 ms and the average RMSE from 40.1 ms to 22.1 ms. Light fine-tuning on the 445 PTB ECGs moderately lowered the cross-cohort averages to an MAE of 12.8 ms and an RMSE of 21.0 ms while preserving external validity. Integrated-gradient saliency maps showed that the model focuses on QRS onset, R-peaks and T-wave offsets, supporting physiologic plausibility.
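The abstract does not specify how the attributions were computed; a minimal sketch using the Captum library, assuming `model` is the trained regressor and `ecg` an illustrative input tensor of shape (1, leads, samples), could look like this:

```python
import torch
from captum.attr import IntegratedGradients

# Hedged sketch: `model` and `ecg` are placeholder names, not from the study.
ig = IntegratedGradients(model)
baseline = torch.zeros_like(ecg)                  # flat zero-signal baseline
attributions = ig.attribute(ecg, baselines=baseline, n_steps=50)
# For a physiologically plausible model, high-magnitude attributions
# should cluster around QRS onset, R-peaks and T-wave offsets.
```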
Conclusion: The deep learning model substantially improved automated QTc interval estimation by leveraging large-scale, imperfectly labelled ECG data. Despite being trained primarily on algorithm-generated annotations, the model consistently outperformed conventional machine outputs across diverse, expert-labelled test sets. Fine-tuning with a limited number of expert-annotated ECGs yielded further gains, demonstrating adaptability to specific clinical standards. These findings highlight the potential of noise-robust deep learning to enhance ECG interpretation in real-world settings, supporting safer and more reliable QTc assessment in clinical practice.