A new study proposes a system that determines when predictive AI is more accurate in hypothetical medical settings, and when it should defer to the judgment of human clinicians.
Artificial intelligence (AI) has great potential to improve the way people work across a variety of industries. But to integrate AI tools into the workplace in a safe and responsible manner, we need to develop more robust methods for understanding when AI tools are most useful.
So when is AI more accurate? And when are humans more accurate? This question is especially important in healthcare, where predictive AI is increasingly used in high-stakes tasks to assist clinicians.
Today, in a joint paper with Google Research published in Nature Medicine, we propose CoDoC (Complementarity-driven Deferral-to-Clinical Workflow), an AI system that learns when to rely on predictive AI tools and when to defer to a clinician for the most accurate interpretation of medical images.
CoDoC explores how human-AI collaboration in hypothetical medical settings could deliver the best outcomes. In one example scenario, compared with commonly used clinical workflows, CoDoC reduced the number of false positives by 25% on a large, anonymized UK mammography dataset, without missing any true positives.
This work is a collaboration with several healthcare organizations, including the Stop TB Partnership, hosted by the United Nations Office for Project Services. To help researchers build on our work to improve the transparency and safety of AI models for the real world, we have also open-sourced CoDoC's code on GitHub.
CoDoC: An add-on tool for human-AI collaboration
Building more reliable AI models often requires re-engineering the complex inner workings of predictive AI models. However, for many healthcare providers, redesigning a predictive AI model is simply not possible. CoDoC can potentially improve predictive AI tools for their users without requiring them to modify the underlying AI tool itself.
When developing CoDoC, we had three criteria:
- Non-machine-learning experts, such as healthcare providers, should be able to deploy the system and run it on a single computer.
- Training should require a relatively small amount of data, typically just a few hundred examples.
- The system should be compatible with any proprietary AI model, without needing access to the model's inner workings or the data it was trained on.
Determining when predictive AI or a clinician is more accurate
With CoDoC, we propose a simple and usable AI system that improves reliability by helping predictive AI systems to “know when they don’t know”. We considered a scenario in which clinicians have access to an AI tool designed to help them interpret images, for example, examining a chest x-ray to decide whether a tuberculosis test is needed.
For any theoretical clinical setting, CoDoC’s system requires only three inputs for each case in the training dataset:
- The predictive AI’s output of a confidence score between 0 (certain no disease is present) and 1 (certain disease is present).
- The clinician’s interpretation of the medical image.
- The ground truth of whether disease was present, as established, for example, by biopsy or other clinical follow-up.
Note: CoDoC does not require access to medical images.
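To make that data requirement concrete, here is a minimal sketch of how one training case could be represented. The field names and structure are hypothetical illustrations, not taken from the open-sourced CoDoC code.

```python
from dataclasses import dataclass

@dataclass
class TrainingCase:
    """One case in a hypothetical CoDoC-style training set.
    No medical image is stored; only the three inputs listed above."""
    ai_confidence: float    # predictive AI confidence score in [0, 1]
    clinician_opinion: int  # clinician's read: 1 = disease present, 0 = absent
    ground_truth: int       # established by biopsy or other clinical follow-up

# Example: the AI leans towards "disease present", the clinician disagrees,
# and clinical follow-up confirmed no disease was present.
case = TrainingCase(ai_confidence=0.72, clinician_opinion=0, ground_truth=0)
```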
CoDoC learns to establish the relative accuracy of the predictive AI model compared with clinicians’ interpretations, and how that relationship varies with the predictive AI’s confidence scores.
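The published method estimates this relationship with a more careful statistical model; the sketch below only illustrates the idea by comparing AI and clinician accuracy within coarse confidence-score bins. The function names and the binning strategy are assumptions for illustration, not the paper’s estimator.

```python
import numpy as np

def learn_deferral_rule(cases, n_bins=10, ai_threshold=0.5):
    """Estimate, for each confidence-score bin, whether the AI or the clinician
    tends to be more accurate on the training cases sketched above.
    Returns a boolean array: True means "accept the AI" for that bin.
    Simplified illustration only, not the estimator used in the paper."""
    scores = np.array([c.ai_confidence for c in cases])
    clinician = np.array([c.clinician_opinion for c in cases])
    truth = np.array([c.ground_truth for c in cases])

    ai_pred = (scores >= ai_threshold).astype(int)            # AI's binary decision
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)

    accept_ai = np.zeros(n_bins, dtype=bool)
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue  # no training data in this bin: default to the clinician
        ai_accuracy = (ai_pred[mask] == truth[mask]).mean()
        clinician_accuracy = (clinician[mask] == truth[mask]).mean()
        accept_ai[b] = ai_accuracy >= clinician_accuracy
    return accept_ai
```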
Once trained, CoDoC could be incorporated into a hypothetical future clinical workflow involving both an AI and a clinician. When a new patient image is evaluated by the predictive AI model, its associated confidence score is fed into the system. Then, CoDoC assesses whether accepting the AI’s decision or deferring to the clinician will ultimately result in the most accurate interpretation.
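Continuing the same hypothetical sketch, the deferral step at evaluation time could then look like this: the new image’s confidence score is looked up against the learned rule, and the case is either decided by the AI or routed to the clinician.

```python
def route_new_case(ai_confidence, accept_ai, n_bins=10, ai_threshold=0.5):
    """Decide whether to accept the AI's decision for a new image or defer
    to the clinician, using the rule learned by learn_deferral_rule above."""
    b = min(int(ai_confidence * n_bins), n_bins - 1)
    if accept_ai[b]:
        return "accept AI decision", int(ai_confidence >= ai_threshold)
    return "defer to clinician", None

# Example usage (hypothetical data):
# rule = learn_deferral_rule(training_cases)
# route_new_case(0.93, rule)  # -> ("accept AI decision", 1) if the AI is
#                             #    reliably accurate at high confidence scores
```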
Improved accuracy and efficiency
Our comprehensive testing of CoDoC with multiple real-world datasets, including only historic and anonymized data, showed that combining the best of human expertise and predictive AI results in greater accuracy than either alone.
As well as achieving a 25% reduction in false positives for the mammography dataset, CoDoC was able, in hypothetical simulations where an AI was allowed to act autonomously on certain occasions, to reduce the number of cases a clinician needed to read by two thirds. We also showed how CoDoC could hypothetically improve the triage of chest X-rays for onward testing for tuberculosis.
Developing AI for healthcare responsibly
While this work is theoretical, it shows our AI system’s potential to adapt: CoDoC was able to improve the performance of medical image interpretation across varied demographic populations, clinical settings, medical imaging equipment, and disease types.
CoDoC is a promising example of how we can harness the benefits of AI in combination with human strengths and expertise. We are working with external partners to rigorously evaluate our research and the system’s potential benefits. To bring technology like CoDoC safely into real-world medical settings, healthcare providers and manufacturers will also have to understand how clinicians interact differently with AI, and validate systems with the specific medical AI tools and settings involved.
To learn more about CoDoC, see our paper in Nature Medicine and the open-sourced code on GitHub.