
A visit to the hospital generates extensive medical records covering the patient’s health history as well as individualized information about their symptoms, diagnosis, treatment, medications, and health status throughout their stay. While rich in insights, these records are complex and time-consuming to process. “We hope to use these electronic health records to significantly improve health outcomes for wide populations,” says Hengrui Cai, a statistics professor in UC Irvine’s Donald Bren School of Information and Computer Sciences (ICS).

Professor Cai was recently awarded a grant from the National Science Foundation to design cutting-edge algorithms that automate the analysis of such large-scale and heterogeneous text data. In particular, her project, titled “Causal Discovery and Individualized Policy Optimization for Human Text Data,” aims to develop novel methodologies with solid theoretical properties and efficient algorithms for impactful applications in precision medicine and personalized recommendations.

Revolutionizing Causal Inference and NLP

The surge in text data due to recent advances in natural language processing (NLP) and large language models (LLMs) has opened new avenues in precision medicine, economics, recommendation systems and social science. Despite this growth, disentangling the complex relationships within massive, heterogeneous, unstructured data remains a significant challenge. This is where causal inference, a set of methods for identifying cause-and-effect relationships, comes into play. By examining the black box inside powerful deep learning approaches through a causal lens, Cai focuses on developing a new personalized medical diagram via trustworthy machine learning.

“One of our goals is to extract the most valuable information in terms of causations from raw texts,” explains Cai. “Understanding the root causes of a patient’s health condition, we can better identify and develop more effective treatment strategies.”

Next-Generation AI for Precision Medicine

“What we really care about is building the next generation of artificial intelligence for personalized medicine,” says Cai, who is working in collaboration with ICS colleague Annie Qu and Assistant Professor Jiayi Wang from the University of Texas at Dallas. The team is focused on developing statistical theories, methods, and algorithms to unravel and understand the causal mechanisms within text data. By pioneering methods to identify causal relationships, they aim to establish a comprehensive framework for analyzing large-scale and heterogeneous text data.

Cai’s team is using the causal analysis framework, together with explainable deep learning approaches, to identify causal relationships in a patient’s medical records, process those insights efficiently, and offer optimal treatment options more accurately. “We want to automatically extract useful information to make the best possible decisions,” says Cai, “thereby ultimately maximizing patient health outcomes.”

Shani Murray
