Standardization and AI-readiness across 5 international heart failure datasets to support the development of AI models: an iCARE4CVD use case

M. Verket (Aachen)1, C. Peters (Maastricht)2, H. Ghaem Sigarchian (Geel)3, E. Zilonova (London)4, S. Kabak (Maastricht)2, M. Colombo (Milan)5, A. Henderson (Glasgow)6, N. Pavo (Vienna)7, N. Krautenbacher (Penzberg)8, W. Wei (Maastricht)9, A. A. Voors (Groningen)10, M. Huelsmann (Vienna)7, R. Latini (Milan)11, D. Müller-Wieland (Aachen)1, H.-P. Brunner-La Rocca (Maastricht)12
1Uniklinik RWTH Aachen, Med. Klinik I - Kardiologie, Angiologie und Internistische Intensivmedizin, Aachen, Germany; 2Maastricht University Medical Centre, Department of Cardiology, Maastricht, Netherlands; 3Thomas More Care and Well-Being – Research Group Mobilab & Care, Geel, Belgium; 4Novo Nordisk Digital Biology, AI & Digital Innovation, London, United Kingdom; 5Istituto di Ricerche Farmacologiche Mario Negri, Dipartimento di Ricerca Danno Cerebrale e Cardiovascolare Acuto, Milan, Italy; 6University of Glasgow, School of Cardiovascular & Metabolic Health, Glasgow, United Kingdom; 7Medizinische Universität Wien, Innere Medizin II / Kardiologie, Vienna, Austria; 8Roche Diagnostics GmbH, Penzberg, Germany; 9Maastricht University, Institute of Data Science, Maastricht, Netherlands; 10University Medical Center Groningen, Department of Echocardiography, Groningen, Netherlands; 11IRCCS Istituto Neurologico, Department of Cardiovascular Research, Milan, Italy; 12Maastricht University Medical Center, Maastricht, Netherlands
Background: Integrating diverse heart failure (HF) datasets has the potential to improve predictive modelling and personalised treatment strategies. However, heterogeneity in data structure, terminology and medication representation poses major challenges for pooled analyses and for the development of artificial intelligence (AI) models.
Purpose: To standardise distinct HF datasets by mapping clinical variables and transforming medication data, thereby creating a unified, high-quality dataset suitable for AI-driven predictive modelling.
Methods: Five HF datasets were used: Aachen-HF (DE), Biostat (NL), TIME-HF (CH), GISSI-HF (IT) and Vienna-HF (AT). Each dataset included demographic, clinical, laboratory and treatment variables. To enable AI-based modelling, a standardisation pipeline using the coding standards SNOMED CT, LOINC and ATC was implemented by developing metadata dictionaries. Custom mappings were developed for variables lacking a standard code and for legacy coding. HF medications were mapped and converted to a percentage of the daily target dose in line with the 2021 ESC HF Guidelines, ensuring consistent definitions, classification and target doses of HF therapies across datasets. These percentages were pooled into one variable per medication class.
Results: More than 80 feature variables from 5,181 patients with 25,982 records were identified for harmonisation across the HF datasets. Five medication classes, including beta blockers, RAS inhibitors, diuretics and MRAs, were expressed as a percentage of the daily target dose for each patient. Signs and symptoms of HF, such as oedema and orthopnoea, as well as NYHA class, were included. Comorbidities such as diabetes, kidney disease and COPD were also identified as feature variables.
Conclusion: Standardisation and transformation of HF medication data across multiple heterogeneous datasets are critical for developing clinically relevant AI models.
This process yielded a robust, standardised dataset with coherent definitions of demographic, laboratory and treatment variables, particularly those reflecting guideline-directed HF therapies. By bridging data variability across cohorts and successfully reusing older datasets, this work lays the foundation for AI-driven precision medicine in HF.
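The medication transformation described in the Methods (map each drug to a standard code, express its daily dose as a percentage of the guideline target dose, then pool per class) can be illustrated with a minimal Python sketch. It assumes each prescription is available as a local drug label, a single dose in mg and a number of doses per day. The label-to-ATC dictionary, the class names, the function name `percent_of_target` and the choice to sum percentages within a class are hypothetical and not taken from the study. The target doses are an illustrative subset of the evidence-based doses in the 2021 ESC HF Guidelines and should be checked against the guideline; diuretics are left out because the abstract does not state which reference dose was used for them.

```python
from collections import defaultdict

# Illustrative subset of daily target doses (mg/day), keyed by ATC code.
# ASSUMPTION: values follow the 2021 ESC HF Guidelines evidence-based doses;
# verify against the guideline before any real use.
TARGET_DOSE_MG_PER_DAY = {
    "C07AB07": ("beta_blocker", 10.0),   # bisoprolol 10 mg o.d.
    "C07AG02": ("beta_blocker", 50.0),   # carvedilol 25 mg b.i.d.
    "C07AB02": ("beta_blocker", 200.0),  # metoprolol succinate 200 mg o.d.
    "C09AA05": ("ras_inhibitor", 10.0),  # ramipril 5 mg b.i.d.
    "C09CA06": ("ras_inhibitor", 32.0),  # candesartan 32 mg o.d.
    "C03DA01": ("mra", 50.0),            # spironolactone 50 mg o.d.
    "C03DA04": ("mra", 50.0),            # eplerenone 50 mg o.d.
}

# HYPOTHETICAL per-cohort metadata dictionary: local drug label -> ATC code.
LOCAL_TO_ATC = {
    "bisoprolol": "C07AB07",
    "carvedilol": "C07AG02",
    "metoprolol succinate": "C07AB02",
    "ramipril": "C09AA05",
    "candesartan": "C09CA06",
    "spironolactone": "C03DA01",
    "eplerenone": "C03DA04",
}


def percent_of_target(records):
    """Pool one patient's prescriptions into % of target dose per class.

    records: iterable of (drug_label, dose_mg, doses_per_day) tuples.
    ASSUMPTION: percentages of drugs in the same class are summed.
    """
    pooled = defaultdict(float)
    for label, dose_mg, doses_per_day in records:
        atc = LOCAL_TO_ATC.get(label.strip().lower())
        if atc is None:
            continue  # unmapped label; would need manual curation
        drug_class, target_mg = TARGET_DOSE_MG_PER_DAY[atc]
        daily_mg = dose_mg * doses_per_day
        pooled[drug_class] += 100.0 * daily_mg / target_mg
    return dict(pooled)


patient = [("Bisoprolol", 5.0, 1), ("Ramipril", 2.5, 2), ("Spironolactone", 25.0, 1)]
print(percent_of_target(patient))
# → {'beta_blocker': 50.0, 'ras_inhibitor': 50.0, 'mra': 50.0}
```

Keying the target-dose table by ATC code rather than by local label means each cohort only needs its own label-to-ATC dictionary, while the guideline table can be shared across all datasets.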