Reanalyses in the Age of Machine Learning: Why Dataset Curation Matters Now More than Ever Journal Article

Overview

abstract

  • As machine learning becomes ever more prevalent within Earth and atmospheric science, clear and consistent descriptions of models, observations, and observation-based datasets, particularly reanalyses, are increasingly vital. Reanalyses remain foundational for climate and weather research, but advancements in data assimilation and model nudging methods, as well as increasingly complex physical parameterization options, mean that not all variables within reanalyses are equally constrained by observations. Because machine learning models are often trained and evaluated on such datasets, imprecise terminology and inadequate documentation can lead to a loss of information content, mislead users unfamiliar with data nuances, lead to the training of flawed machine learning models, and ultimately result in model evaluations that do not realistically describe performance relative to observations. This essay argues for more careful use of the term “reanalysis,” emphasizing that it should be reserved for datasets that explicitly blend observations with models through data assimilation. It highlights the rise of “reanalysis adjacent” datasets, as well as the growing disconnect between data producers and increasingly interdisciplinary users, particularly within the machine learning community. It offers guidance for dataset producers and users, alongside recommendations to enhance transparency, including renewed use of variable classification systems, better documentation of variable-specific uncertainties, and greater community-wide emphasis on data transparency. Without such efforts, Earth science datasets may be applied indiscriminately, regardless of fitness for purpose. Ensuring trustworthy and interpretable data is essential for maintaining the scientific integrity of Earth system modeling in the machine learning age.

publication date

  • April 1, 2026

Date in CU Experts

  • June 18, 2026 11:25 AM

Full Author List

  • Abel MR; Thompson AJ; Gutmann ED; Mahoney K; McCrary RR; Schumacher RS; Slivinski LC

author count

  • 7

Other Profiles

International Standard Serial Number (ISSN)

  • 0003-0007

Electronic International Standard Serial Number (EISSN)

  • 1520-0477

Additional Document Info

start page

  • E922

end page

  • E931

volume

  • 107

issue

  • 4