According to a study led by computer science graduates at Northeastern University, artificial intelligence models used to detect depression on social media are often biased and methodologically flawed.
Yuchen Cao and Xiaorui Shen were graduate students at Northeastern University's Seattle campus when they began examining how machine learning and deep learning models are used in mental health research, particularly since the COVID-19 pandemic.
Working with several university peers, they conducted a systematic review of academic papers using AI to detect depression in social media users. Their findings were published in the Journal of Behavioral Data Science.
“We wanted to see how machine learning, AI, or deep learning models were being used in research in this field,” says Cao, now a software engineer at Meta.
Social media platforms such as Twitter, Facebook and Reddit give researchers a wealth of user-generated content that can reveal patterns in emotion, thought and mental health. These insights are increasingly being used to train AI tools to detect signs of depression. However, the Northeastern-led review found that many of the underlying models are poorly tuned and lack the rigor required for real-world applications.
The team analyzed hundreds of papers and selected 47 relevant studies published since 2010, drawn from databases such as PubMed, IEEE Xplore, and Google Scholar. They found that many of these studies were written by experts in medicine and psychology rather than computer science.
“Our goal was to investigate whether current machine learning models were reliable,” says Shen, now a software engineer at Meta. “We found that some of the models used were not properly tuned.”
Traditional models such as support vector machines, decision trees, random forests, extreme gradient boosting (XGBoost), and logistic regression were commonly used. Some studies adopted deep learning tools such as convolutional neural networks, long short-term memory (LSTM) networks, and the popular language model BERT.
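For readers who want to see what that model lineup looks like in practice, here is a minimal, illustrative sketch, assuming scikit-learn and the separate xgboost package are installed; the hyperparameter values are placeholders, not settings from any reviewed study.

```python
# Illustrative sketch of the classical models the review found in common use.
# Assumes scikit-learn and xgboost are installed; all values are placeholders.
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier  # extreme gradient boosting

models = {
    "support_vector_machine": SVC(kernel="rbf", C=1.0),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "random_forest": RandomForestClassifier(n_estimators=100),
    "xgboost": XGBClassifier(n_estimators=100, learning_rate=0.1),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
```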
The review, however, revealed several critical issues (a code sketch illustrating the missing practices follows the list):
- Only 28% of the studies properly tuned their hyperparameters, the settings that guide how a model learns from the data.
- Only about 17% split their data into separate training, validation and test sets, leaving the rest at greater risk of overfitting.
- Many relied on accuracy as their sole performance metric despite working with imbalanced datasets, where accuracy can distort results by overlooking the minority class, in this case the users actually showing signs of depression.
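As a point of reference, this is roughly what the baseline workflow the review found missing looks like. It is a minimal sketch using synthetic, hypothetical data rather than anything from the reviewed studies: a stratified train/test split, cross-validated hyperparameter tuning, and per-class metrics instead of bare accuracy.

```python
# Minimal sketch of the practices the review found lacking: a proper data
# split, hyperparameter tuning, and imbalance-aware metrics. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Simulate an imbalanced dataset: roughly 10% minority ("depressed") class.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)

# Hold out a final test set; stratify to preserve the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Tune hyperparameters with cross-validation on the training data only,
# scoring on F1 so the minority class is not ignored.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="f1", cv=5)
search.fit(X_train, y_train)

# Report per-class precision/recall/F1, not just overall accuracy.
print(classification_report(y_test, search.best_estimator_.predict(X_test)))
```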
“There are some constants or basic standards that all computer scientists know, like ‘before you do A, you need to do B,’” Cao says. “But that’s not something everyone outside this field knows, and it can lead to bad outcomes and inaccuracies.”
The study also revealed significant data bias. X (formerly Twitter) was the most common platform used (32 studies), followed by Reddit (8) and Facebook (7). Studies combining data from multiple platforms relied primarily on English-language posts from U.S. and European users.
The authors argue that these limitations reduce the generalizability of the findings and fail to reflect the global diversity of social media users.
Another major challenge is linguistic nuance: only 23% of studies clearly explained how they handled negation and irony, both of which are essential for sentiment analysis and depression detection.
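To see why this matters, consider a toy example (ours, not the paper's): a naive keyword counter scores “I am not happy” as positive, while even a crude negation-aware pass gets the polarity right. The word lists and clause rules below are invented purely for illustration.

```python
# Toy illustration of why negation matters for depression-related sentiment
# scoring. Word lists and clause rules are invented for this example only.
NEGATORS = {"not", "never", "no"}
POSITIVE = {"happy", "good", "fine"}
NEGATIVE = {"sad", "hopeless", "tired"}

def naive_score(text: str) -> int:
    """Count positive minus negative keywords, ignoring negation."""
    tokens = text.lower().split()
    return sum(w in POSITIVE for w in tokens) - sum(w in NEGATIVE for w in tokens)

def negation_aware_score(text: str) -> int:
    """Flip the polarity of sentiment words that follow a negator."""
    score, negate = 0, False
    for w in text.lower().split():
        if w in NEGATORS:
            negate = True           # negation applies until the clause ends
            continue
        polarity = (w in POSITIVE) - (w in NEGATIVE)
        score += -polarity if negate else polarity
        if w.endswith((".", ",")):  # crude clause boundary resets negation
            negate = False
    return score

print(naive_score("I am not happy"))           # 1: wrong direction
print(negation_aware_score("I am not happy"))  # -1: polarity flipped
```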
To assess reporting transparency, the team used PROBAST, a tool for evaluating prediction models. They found that many studies lacked key details about dataset partitioning and hyperparameter settings, making the results difficult to replicate or validate.
Cao and Shen plan to publish follow-up papers that use real-world data to test models and recommend improvements.
Researchers may not always have the resources or AI expertise to properly tune open-source models, Cao says.
“So creating wikis or tutorial papers is something I think is important in this field to help people collaborate,” he says. “Because resources are always limited, I think teaching people how to do it is more important than simply helping them do it.”
The team will present its findings at the annual meeting of the International Society for Data Science and Analytics in Washington, D.C.
More information: Yuchen Cao et al, Machine Learning Approach for Depression Detection on Social Media: A Systematic Review of Bias and Methodological Challenges, Journal of Behavioral Data Science (2025). DOI: 10.35566/jbds/caoyc
Provided by Northeastern University
This story has been republished courtesy of Northeastern Global News news.northeastern.edu.
Citation: Key biases in AI models used to detect depression on social media (2025, July 3) retrieved 6 July 2025 from https://techxplore.com/news/2025-07-Key-biases-ai-depression-social.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.