Wu, E. et al. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat. Med. 27, 582–584 (2021).
Kakarmath, S. et al. Best practices for authors of healthcare-related artificial intelligence manuscripts. npj Digital Med. 3, 134 (2020).
Steyerberg, E. W. & Vergouwe, Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur. Heart J. 35, 1925–1931 (2014).
Van Calster, B. et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 17, 230 (2019).
Vickers, A. J., Van Calster, B. & Steyerberg, E. W. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 352, i6 (2016).
Harrell, F. Multivariable modeling strategies. In: Regression Modeling Strategies. Springer Series in Statistics. (Springer, Cham., 2015).
Steyerberg, E. W. Clinical prediction models (Springer Nature, 2009).
Efron, B. & Tibshirani, R. J. An introduction to the bootstrap (CRC press, 1994).
Futoma, J., Simons, M., Panch, T., Doshi-Velez, F. & Celi, L. A. The myth of generalisability in clinical research and machine learning in health care. Lancet Digital Health 2, e489–e492 (2020).
Wan, B., Caffo, B. & Vedula, S. S. A unified framework on generalizability of clinical prediction models. Front. Artif. Intell. 5, https://doi.org/10.3389/frai.2022.872720 (2022).
de Hond, A. A. H. et al. Predicting readmission or death after discharge from the ICU: external validation and retraining of a machine learning model. Crit. Care Med. 51, 291–300 (2023).
Austin, P. C. et al. Geographic and temporal validity of prediction models: different approaches were useful to examine model performance. J. Clin. Epidemiol. 79, 76–85 (2016).
Steyerberg, E. W., Nieboer, D., Debray, T. P. A. & van Houwelingen, H. C. Assessment of heterogeneity in an individual participant data meta-analysis of prediction models: an overview and illustration. Stat. Med 38, 4290–4309 (2019).
Debray, T. P. et al. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J. Clin. Epidemiol. 68, 279–289 (2015).
Cowley, L. E., Farewell, D. M., Maguire, S. & Kemp, A. M. Methodological standards for the development and evaluation of clinical prediction rules: a review of the literature. Diagnostic Progn. Res. 3, 16 (2019).
Wynants, L. et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 369, m1328 (2020).
Gulati, G. et al. Generalizability of cardiovascular disease clinical prediction models: 158 independent external validations of 104 unique models. Circ. Cardiovasc. Qual. Outcomes 15, e008487 (2022).
Futoma, J., Simons, M., Panch, T., Doshi-Velez, F. & Celi, L. A. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit Health 2, e489–e492 (2020).
Burns, M. L. & Kheterpal, S. Machine learning comes of age: local impact versus national generalizability. Anesthesiology 132, 939–941 (2020).
de Hond, A. A. H. et al. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. npj Digital Med. 5, 2 (2022).
Sperrin, M., Riley, R. D., Collins, G. S. & Martin, G. P. Targeted validation: validating clinical prediction models in their intended population and setting. Diagnostic Progn. Res. 6, 24 (2022).
Van Calster, B., Steyerberg, E. W., Wynants, L. & van Smeden, M. There is no such thing as a validated prediction model. BMC Med. 21, 70 (2023).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Eur. Urol. 67, 1142–1151 (2015).