Alex Lu, Stan Hua, Lauren Erdman
Artificial Intelligence (AI) is transforming healthcare, with applications ranging from detecting cancer in medical images to interpreting electronic medical records. We ask: are children being left behind? A 2023 survey found that of 692 medical AI devices approved by the FDA, only 22 were transparently evaluated in children and approved for pediatric use. This suggests that children risk being excluded from the benefits that healthcare AI promises. To bring awareness to this issue, the American College of Radiology recently formed a Pediatric AI Working Group to advocate for equal access to safe medical AI for children.
The problem is clear, but it remains unclear why pediatric AI is so underdeveloped. In a recent preprint, "A shortage of children in public medical imaging data points to growing age bias in biomedical AI", we hypothesize that this is driven in part by the underrepresentation of children in public datasets. Modern AI relies on large amounts of data for development. If public biomedical data from children is limited, researchers trying to build and evaluate models for pediatric populations will face increased barriers, or their work may not be possible at all.
We set out to answer this question by conducting the largest review of public medical imaging datasets to date. Starting from medical machine learning papers, we identified the ways in which their authors found datasets to develop machine learning models. Using the same strategies, we collected 181 public medical imaging datasets and analyzed how each reports patient age and how ages are distributed.
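The age audit itself is simple to script once metadata is in a common form. Below is a minimal sketch, assuming each dataset's metadata has been normalized into a CSV with a hypothetical age_years column (not the actual formats encountered in our review, which vary widely and need per-dataset parsing), of how one might tally age reporting and the pediatric share of a dataset.

```python
import pandas as pd

PEDIATRIC_CUTOFF = 18  # patients younger than 18 counted as children


def audit_dataset(metadata_csv: str) -> dict:
    """Summarize age reporting and pediatric share for one dataset.

    Assumes a normalized metadata file with a hypothetical 'age_years'
    column; missing or non-numeric ages are treated as unreported.
    """
    df = pd.read_csv(metadata_csv)
    ages = pd.to_numeric(df.get("age_years", pd.Series(dtype=float)), errors="coerce")
    n_records = len(df)
    n_with_age = int(ages.notna().sum())
    n_pediatric = int((ages < PEDIATRIC_CUTOFF).sum())
    return {
        "records": n_records,
        "reports_age": n_with_age > 0,
        "fraction_with_age": n_with_age / n_records if n_records else 0.0,
        "fraction_pediatric": n_pediatric / n_records if n_records else 0.0,
    }


# Aggregate the per-dataset summaries across the whole review.
summaries = [audit_dataset(path) for path in ["dataset_001.csv", "dataset_002.csv"]]
print(pd.DataFrame(summaries))
```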
Our main finding is that, despite children making up 25% of the world's population, less than 1% of public medical imaging data comes from children. Many datasets report no age at all, suggesting that dataset creators do not consider age important patient metadata. Even among the datasets that do report age (116 of 181), few make any attempt to balance across ages.
We link this lack of data to several consequences.
First, the lack of pediatric data hinders machine learning research. Only one of 46 studies published at Medical Imaging with Deep Learning (MIDL) in 2023 and 2024 used pediatric data. Importantly, the pediatric data gap is uneven across medical imaging modalities and applications: some AI applications have practically no pediatric samples with which to build or evaluate models. For example, our review identified nearly 19,000 MRI images that could be used to build models to diagnose diseases such as cancer, yet practically none of these images come from children.

Second, in the absence of dedicated pediatric AI models, clinicians may unconsciously fall back on "off-label" use of adult AI models in children. Children have historically been overlooked in the development of medications and medical devices compared to adults, so off-label use is already common in pediatric clinical practice. Our research reinforces that off-label use can be dangerous when it comes to medical AI. We trained AI models to predict cardiac hypertrophy, a condition characterized by an unusually large heart, from chest X-ray images. These models fail increasingly often in younger, healthy patients, with error rates reaching 50% for the youngest children in our evaluation (ages 0-1).
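As a rough illustration of the kind of age-stratified evaluation behind this finding (not our exact pipeline), the sketch below bins model predictions by patient age and reports the error rate per bin; the prediction file layout and column names are assumptions for the example.

```python
import pandas as pd

# Hypothetical file of per-image model outputs with columns:
# age_years, y_true (1 = enlarged heart), y_pred (model's binary prediction).
preds = pd.read_csv("chest_xray_predictions.csv")

# Age bins; the youngest bin (0-1 years) is where the error rate peaks.
bins = [0, 1, 5, 12, 18, 40, 65, 120]
labels = ["0-1", "1-5", "5-12", "12-18", "18-40", "40-65", "65+"]
preds["age_group"] = pd.cut(preds["age_years"], bins=bins, labels=labels, right=False)

# Error rate = fraction of images where the prediction disagrees with the label.
error_by_age = (
    preds.assign(error=(preds["y_true"] != preds["y_pred"]).astype(int))
    .groupby("age_group", observed=True)["error"]
    .mean()
)
print(error_by_age)
```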
Third, our analysis suggests that unless we pay attention to the ways medical AI models may be biased against children, the issue will only grow as medical AI development continues. Recently, researchers have been training foundation models, generalist models that can handle a wide range of tasks. Training these models requires much larger datasets than previous task-specific models, so researchers often build foundation model training sets by aggregating data from multiple sources. Our review identified 16 public datasets that aggregate data from other public datasets. Of these 16, eight drew on source data that originally reported patient age, but in every case the age information was dropped during aggregation.
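This kind of metadata loss is easy to reproduce and just as easy to avoid. The toy sketch below (with invented column names, not the schemas of the 16 datasets we reviewed) shows how keeping only the fields shared by every source silently discards age, and how carrying the age column forward preserves it where it was reported.

```python
import pandas as pd

# Two hypothetical source datasets: one reports patient age, one does not.
source_a = pd.DataFrame(
    {"image_path": ["a/1.png", "a/2.png"], "label": [0, 1], "age_years": [4, 67]}
)
source_b = pd.DataFrame({"image_path": ["b/1.png"], "label": [1]})

# Common practice: keep only the columns shared by every source.
shared_cols = list(set(source_a.columns) & set(source_b.columns))
aggregated_lossy = pd.concat(
    [source_a[shared_cols], source_b[shared_cols]], ignore_index=True
)
print(aggregated_lossy.columns.tolist())  # age_years is gone

# Alternative: keep the union of columns so age survives where it was reported.
aggregated_preserving = pd.concat([source_a, source_b], ignore_index=True)
print(aggregated_preserving[["image_path", "age_years"]])  # NaN only where age was never reported
```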
Together, our research reveals a significant gap in the development of medical AI: the lack of public pediatric data puts children at risk of being left behind. We highlight initiatives to collect, curate, and publicly release AI-ready pediatric data, and we encourage the broader AI community to take action to support the development of pediatric AI applications.