Arkajyoti (Arka) Saha
“I combine the power of statistical modeling with the scalability and flexibility of AI/ML, bringing together the best of both worlds.”
Integrating Statistics with AI
Professor Arka Saha’s research integrates the theory and practice of artificial intelligence (AI) and machine learning (ML) with statistics. “AI/ML methods often forgo heavy model assumptions of classical statistics in favor of a model-free data-driven approach,” he says. “Though this lends scalability and flexibility to the AI/ML methods, they frequently neglect the data’s inherent structure.” Classical statistics has long been used to model these domain-specific structures, with scientists’ domain expertise as a foundation. “I combine the power of statistical modeling with the scalability and flexibility of AI/ML, bringing together the best of both worlds.”
Understanding Data Dependence
AI/ML models often ignore the data’s dependent structure, assuming independence implicitly or explicitly. However, this dependence is frequently of crucial importance: failing to account for it can degrade a model’s accuracy, and the structure itself may be of scientific significance. Professor Saha turns such challenges into assets by explicitly modeling the dependence using statistical tools. “Using the knowledge that the observations or features are dependent, I ‘borrow strength’ across the rows or columns of the data, increasing the power and accuracy of the AI/ML approaches while also providing insight into the dependence structure itself.” He applies this concept to integrate spatial and temporal models into a machine learning framework, which is a key element of his methodological research.
Collaborating for Real-World Impact
Professor Saha focuses on collaborating with scientists to solve open scientific challenges by merging AI/ML approaches with domain expertise via statistics. “My research paradigm on data dependence is of fundamental interest in environmental science, biomedical sciences, oceanography, finance, data privacy, and algorithmic fairness,” he says. “I also work with earth system scientists to evaluate the level of carbon in oceans. This allows us to better understand, forecast, and address a critical component of global environmental change, which can aid in developing policies for a more sustainable future.”
Education
Ph.D., Biostatistics, Johns Hopkins University, 2021
Master of Statistics, Indian Statistical Institute, 2016
Bachelor of Statistics, Indian Statistical Institute, 2014
Research Areas
AI, ML and Natural Language Processing
Producing machines to automate tasks requiring intelligent behavior...
Biomedical Informatics and Computational Biology
Techniques from applied mathematics, informatics, statistics and computer science to solve biological problems...
Statistics and Statistical Theory
Developing and studying methods for collecting, analyzing, interpreting and presenting empirical data...
Biostatistics
The application of statistical methods to analyze and interpret data in the fields of biology...
Genomics
An interdisciplinary field focusing on the structure, function, evolution, mapping and editing of genomes.
Sustainability and Computing
Developing innovative ways to use and develop computational technologies to address environmental and societal challenges...