Machine learning-based prediction of mortality from clinical routine data in patients with cardiac ATTR amyloidosis – a feasibility study based on registry data from the Amyloidosis Center of Lower Saxony

K. Pfeiffer-Koch (Hannover)1, J. Kösterke (Hannover)1, K. Werle (Hannover)2, S. Gingele (Hannover)3, T. Skripuletz (Hannover)3, D. Berliner (Hannover)1, J. Bauersachs (Hannover)1, A. Hänselmann (Hannover)1, U. Bavendiek (Hannover)1
1Medizinische Hochschule Hannover, Kardiologie und Angiologie, Hannover, Germany; 2Peter L. Reichertz Institut für Medizinische Informatik, Datenintegrationszentrum, Hannover, Germany; 3Medizinische Hochschule Hannover, Klinik für Neurologie, Hannover, Germany
Background: ATTR amyloidosis is a systemic disease affecting multiple organ systems. It is caused by misfolded transthyretin proteins that aggregate in various tissues. Cardiac ATTR amyloidosis (ATTR-CM) is a progressive, restrictive cardiomyopathy. Untreated patients have a poor prognosis and experience worsening heart failure, resulting in hospitalisations and a progressive loss of quality of life. A wide range of data is collected in everyday clinical practice, and traditional statistical methods reach their limits when evaluating such complex datasets. Machine learning-based approaches are well suited to making these data usable for clinical decision-making and research. This feasibility study aims to identify predictors of mortality in patients with ATTR-CM from clinical routine data using machine learning methods.

Methods: A total of 112 patients with confirmed ATTR-CM and a follow-up period of 24 months were included from the registry of the Amyloidosis Center of Lower Saxony. Of these, 27 (24 %) died during follow-up. To predict mortality, a random forest model (scikit-learn package, Python) was trained on baseline characteristics from clinical routine data obtained at the first outpatient visit to the Amyloidosis Center. These included demographic data, patient history, physical examination, electrocardiography, echocardiography, laboratory chemistry, and medication. Model performance was assessed by the area under the receiver operating characteristic curve (AUC) and accuracy, as well as by the out-of-bag (OOB) score as an estimate of performance on unseen data. To gain insight into the model's decisions, the mean feature importance of each variable was calculated over 20 runs with random train-test splits.
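The evaluation protocol described above (a random forest scored by accuracy, AUC, and OOB score, with feature importance averaged over 20 random train-test splits) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code. The stratified 75/25 split, the 200 trees, the seeding scheme, the use of impurity-based (MDI) importances, and the helper name `evaluate_random_forest` are all assumptions. The example runs on synthetic data, not on registry data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate_random_forest(X, y, feature_names, n_runs=20, test_size=0.25, seed=0):
    """Average accuracy, AUC, OOB score and feature importance over repeated splits.

    Hypothetical helper. Assumed (not stated in the abstract): stratified
    75/25 splits, 200 trees, impurity-based (MDI) feature importances.
    """
    rng = np.random.RandomState(seed)
    accs, aucs, oobs, importances = [], [], [], []
    for _ in range(n_runs):
        run_seed = int(rng.randint(0, 2**31 - 1))
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=run_seed
        )
        model = RandomForestClassifier(
            n_estimators=200, oob_score=True, random_state=run_seed
        )
        model.fit(X_tr, y_tr)
        # Column 1 holds the predicted probability of the positive class (death = 1).
        proba = model.predict_proba(X_te)[:, 1]
        accs.append(accuracy_score(y_te, model.predict(X_te)))
        aucs.append(roc_auc_score(y_te, proba))
        oobs.append(model.oob_score_)  # OOB accuracy on the training portion
        importances.append(model.feature_importances_)
    mean_importance = np.mean(importances, axis=0)
    ranking = sorted(
        zip(feature_names, mean_importance), key=lambda item: item[1], reverse=True
    )
    return {
        "accuracy": float(np.mean(accs)),
        "auc": float(np.mean(aucs)),
        "oob": float(np.mean(oobs)),
        "importance": [(name, float(value)) for name, value in ranking],
    }


# Synthetic stand-in data (NOT registry data): 112 samples, 27 positives (~24 %).
data_rng = np.random.RandomState(42)
X = data_rng.normal(size=(112, 6))
y = np.zeros(112, dtype=int)
y[:27] = 1
X[y == 1, 0] += 1.0  # make feature_0 weakly informative
names = [f"feature_{i}" for i in range(6)]
result = evaluate_random_forest(X, y, names)
```

Under the assumed 75/25 split, each stratified test set of 28 samples holds only about 7 positive cases when 27 of 112 samples are positive. Per-run AUC values therefore vary considerably, and averaging over repeated splits reduces the influence of any single split.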

Results: Of the initial 160 variables, 54 remained as model features for the prediction of mortality after correlation analysis (Spearman correlation combined with manual selection). Variables with a mean feature importance above 5 % over the 20 runs were identified. Model performance from a representative run is shown in Fig. 1A. Across the 20 runs, the mean accuracy for the prediction of mortality was 0.76, the mean AUC was 0.65, and the mean OOB score was 0.78. The most important baseline predictors of mortality were tricuspid annular plane systolic excursion (TAPSE), left ventricular outflow tract velocity time integral (LVOT VTI), E’ septal, and cardiac troponin T. The ten features with the highest mean feature importance are shown in Fig. 1B.
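The two automated filtering steps in the results can be sketched as follows: removal of redundant variables by pairwise Spearman correlation, and retention of features whose mean importance exceeds 5 %. This is an illustration under stated assumptions, not the authors' procedure. The correlation threshold of 0.8, the rule of dropping the later column of each highly correlated pair, and the helper names `drop_correlated` and `select_by_importance` are hypothetical. The manual selection step cannot be expressed in code and is not modelled.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop the later column of every pair whose |Spearman rho| exceeds threshold.

    Hypothetical helper; the threshold is an assumed value, since the abstract
    does not report one, and the manual selection step is not modelled here.
    """
    rho = df.corr(method="spearman").abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = rho.where(np.triu(np.ones(rho.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def select_by_importance(ranking, cutoff=0.05):
    """Return names whose mean importance exceeds the cutoff (5 % in the abstract)."""
    return [name for name, importance in ranking if importance > cutoff]


# Synthetic example (not registry data): "a_scaled" is a near-copy of "a".
rng = np.random.RandomState(0)
a = rng.normal(size=100)
df = pd.DataFrame({
    "a": a,
    "a_scaled": 2.0 * a + 0.01 * rng.normal(size=100),
    "b": rng.normal(size=100),
})
reduced = drop_correlated(df)  # keeps "a" and "b"
# Made-up importance values for illustration only.
selected = select_by_importance([("feat_x", 0.12), ("feat_y", 0.05), ("feat_z", 0.02)])
```

Spearman correlation is rank-based, so it also flags variables that are monotonically but non-linearly related. This is useful for routine laboratory values that are often skewed.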

Conclusion: Machine learning-based prediction of mortality from clinical routine data is feasible, even in complex systemic diseases such as ATTR-CM. Decision-tree-based algorithms appear to be an important approach for such evaluations, even in cohorts with small sample sizes. Before clinical implementation, further hyperparameter optimization and, where possible, the inclusion of additional data points from a second visit could improve model performance. Because systematic errors cannot be ruled out in a retrospective analysis, external validation and prospective testing of the final model are essential.
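The conclusion names hyperparameter optimization as a next step. One common way to do this in scikit-learn is a cross-validated grid search. The sketch below is only an illustration. The grid values, the 5-fold stratified cross-validation, ROC AUC as the selection criterion, and the helper name `tune_random_forest` are assumptions, not choices reported by the authors. It runs on synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def tune_random_forest(X, y, seed=0):
    """Grid search over a small, hypothetical random forest grid, scored by ROC AUC."""
    param_grid = {
        "max_depth": [3, None],      # shallow vs fully grown trees
        "min_samples_leaf": [1, 5],  # larger leaves regularise small cohorts
    }
    # Stratified folds keep the event rate similar in every fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        param_grid,
        scoring="roc_auc",
        cv=cv,
    )
    search.fit(X, y)
    return search


# Synthetic data only: 112 samples, 27 positives, one weakly informative feature.
rng = np.random.RandomState(1)
X = rng.normal(size=(112, 5))
y = np.zeros(112, dtype=int)
y[:27] = 1
X[y == 1, 0] += 1.0
search = tune_random_forest(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The cross-validated score of the best configuration is optimistically biased because the same folds were used to choose it. The tuned model should therefore be assessed on data not used for tuning, for example with nested cross-validation or an external validation cohort.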