SynGatorTron™ to speed medical research, alleviate privacy worries
from UF Health
“Dr. Chatbot will see you now.”
The next generation of super-smart computers, tablets and cell phones may come equipped with medical chatbots powered by artificial intelligence that can interact with patients using human language and medical knowledge.
According to Yonghui Wu, Ph.D., director of natural language processing at the University of Florida Clinical and Translational Science Institute, the medical chatbot you interact with online will be able to use conversational language to communicate with and educate patients, much as we now interact with voice assistants such as Apple’s Siri and Amazon’s Alexa.
The chatbot may also be culturally sensitive and matched to your age.
“It will be like having your own personal medical avatar,” Wu said.
Medical chatbots are just one of many possible applications to arise out of groundbreaking new AI tools developed by Wu and other researchers at UF and NVIDIA as part of a $100 million artificial intelligence public-private collaboration formed in 2020. Last year, they launched a clinical language AI model, GatorTron™. This AI tool enables computers to quickly access, read and interpret medical language in clinical notes and other unstructured narratives stored in real-world electronic health records. The model was trained on HiPerGator-AI, the university’s NVIDIA DGX SuperPOD system, which ranks among the world’s top 30 supercomputers.
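For readers curious what “reading and interpreting medical language” can look like in practice, the sketch below shows one common pattern: running a transformer-based named-entity recognizer over an unstructured clinical note to pull out structured concepts. It is a minimal illustration only; the model identifier is a placeholder rather than a published GatorTron™ checkpoint, and it assumes the Hugging Face transformers library is installed.

```python
# Minimal sketch: extracting clinical concepts from an unstructured note
# with a transformer-based token-classification (NER) pipeline.
# "your-org/clinical-ner-model" is a placeholder, not a real checkpoint.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="your-org/clinical-ner-model",  # hypothetical clinical NER model
    aggregation_strategy="simple",        # merge word pieces into whole entities
)

note = (
    "Patient is a 67-year-old male with type 2 diabetes, presenting with "
    "chest pain. Started metformin 500 mg twice daily."
)

for entity in ner(note):
    # Each entity carries a label (e.g., PROBLEM, TREATMENT), the matched
    # text span, and a confidence score.
    print(entity["entity_group"], "->", entity["word"], f"({entity['score']:.2f})")
```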
The GatorTron™ model is expected to accelerate research and medical decision-making by extracting information and insights from massive amounts of clinical data with unprecedented speed and clarity. It will also lead to innovative AI tools and advanced, data-driven health research methods that were unimaginable even 10 or 15 years ago.
This year, the team is rolling out another model, SynGatorTron™, with different capabilities. SynGatorTron™ can generate synthetic patient data untraceable to real patients. This synthetic data can then be used to train the next generation of medical AI systems to understand conversational language and medical terminology.
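As a rough illustration of how a generative clinical language model might be prompted to produce synthetic notes, here is a minimal sketch using the Hugging Face transformers API. The model name is a placeholder, not a released SynGatorTron™ checkpoint, and the sampling settings are illustrative.

```python
# Minimal sketch: sampling synthetic clinical text from a GPT-style
# generative language model. "your-org/synthetic-clinical-gpt" is a
# placeholder, not a released SynGatorTron checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/synthetic-clinical-gpt"  # hypothetical model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "CHIEF COMPLAINT: shortness of breath.\nHISTORY OF PRESENT ILLNESS:"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling keeps the generated notes varied; no real patient
# record is consulted, so the output cannot be traced back to anyone.
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```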
Most data-driven health research and health-related AI applications today rely on “de-identified” patient data in electronic health records, from which patients’ private information, such as name, address and birthdate, has been removed before the data are used for research and development.
Removing that identifying information is time-consuming and labor-intensive. Automated de-identification systems can produce machine de-identified data at scale, but they are not an ironclad solution.
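To make that trade-off concrete, here is a toy example of the kind of automated, rule-based de-identification the article alludes to: regular expressions strip obvious identifiers such as dates, phone numbers and record numbers, but anything the patterns miss slips through, which is why the approach is not ironclad. The patterns below are illustrative, not a production system.

```python
# Toy rule-based de-identification: replace obvious identifiers with tags.
# Real systems combine many more rules with machine-learned NER models,
# and even then some identifiers can be missed.
import re

PATTERNS = {
    "DATE": r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
    "PHONE": r"\b\d{3}[-.]\d{3}[-.]\d{4}\b",
    "MRN": r"\bMRN[:\s]*\d{6,10}\b",
}

def deidentify(text: str) -> str:
    for tag, pattern in PATTERNS.items():
        text = re.sub(pattern, f"[{tag}]", text, flags=re.IGNORECASE)
    return text

note = "Seen on 03/14/2022, MRN: 00123456. Call back at 352-555-0134."
print(deidentify(note))
# -> "Seen on [DATE], [MRN]. Call back at [PHONE]."
```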
According to Wu, even after all identifying patient information has been removed, there’s still a remote chance that someone could identify a patient by tracking data over time.
“Generating synthetic patient data is a safe way to preserve the knowledge of medical language but mitigate the risks to patient privacy,” Wu said.
Patient privacy isn’t the only barrier to training the next generation of AI models for research and other applications. The sheer volume of data required to train AI models can also stand in the way.
“There’s a finite amount of patient data available to us, and training AI computer models requires a tremendous amount of data,” said Duane Mitchell, M.D., Ph.D., director of the UF Clinical and Translational Science Institute and associate dean for clinical and translational sciences at the UF College of Medicine. “With SynGatorTron™, we can generate all the data we need.”