
“Data scientist” and “statistician” are among the top 10 jobs in 2022, according to U.S. News & World Report. The growing popularity of data-focused careers has created a demand for students who understand the immense challenges and real-world impact of complex data analysis. This demand is the backdrop for the SoCal Data Science program, a collaborative effort between faculty from UCI, California State University Fullerton (CSUF) and Cypress College aimed at recruiting, training and dispatching a diverse workforce of STEM and data science majors.

Led by Statistics Professors Babak Shahbaba and Mine Dogucu of UCI’s Donald Bren School of Information and Computer Sciences (ICS) and Mathematics Professor Roberto Pelayo, the program is funded through a National Science Foundation grant, “Preparing a Diverse Workforce via Academic and Industrial Partnership.” With $1.5 million in funding across the three institutions over the next three years, the program aims to recruit 120 fellows from the three campuses and offer them targeted coursework, hands-on training, research opportunities and career development workshops.

The program is off to a strong start, with 32 fellows selected from a pool of more than 200 applicants. This diverse cohort is 87% women and students from other historically underrepresented backgrounds. “Having such a diverse and strong group of students is amazing,” says Shahbaba. “That was one of the goals of the program.”

Students and faculty of the SoCal Data Science program.

During the winter and spring, this first cohort of students took data science courses developed through the SoCal Data Science program and offered at UCI, CSUF or Cypress College. Then, the students put their new skills to the test with a summer research project.

Rigorous Training at Summer Bootcamp
Before starting the six-week research project, students attended a one-week bootcamp at UCI. Topics included exploratory data analysis, advanced R programming and generalized linear models.

“We had students coming from different backgrounds and fields — computer science, statistics, biological sciences — and we wanted to make sure that before they started doing research, they had the required skills,” says Shahbaba. The bootcamp brought everyone together in terms of research preparedness. “The students really helped each other and worked well together.”

The bootcamp also included a session on ethics in data science, as well as training in professional and academic development. “We focused on more than just data science, covering resume writing and interview skills,” says Shahbaba, “and we invited graduate students to talk about their experiences in grad school.”

Real-World Research Experience
Once the bootcamp ended, students broke into several groups, with each group diving into a real-world data analysis project. “It was six weeks of intense research,” says Shahbaba. The students were on campus at least three days a week from 10 a.m. to 4 p.m., working in the new Interdisciplinary Science and Engineering Building (ISEB) at UCI, which was designed for collaborative research. “It’s such a well-designed space for these kind of projects; it made our collaboration with the industrial and academic partners very smooth and productive.”

Students collaborating in an ISEB classroom.

Shahbaba lined up a variety of academic and non-academic partners for the summer projects: Children’s Hospital of Orange County (CHOC), the Center for Hydrometeorology and Remote Sensing (CHRS), the Fleischman Lab, the Fortin Lab and the Reich Lab.

“The program was an amazing experience for me because it really gave me a chance to apply skills I learned from my courses to real-world problems,” says program fellow Giles Pierre Carlos, a data science major at UCI. “I was able to apply the techniques I’ve learned [and] build significant models for an extremely interesting area of neuroscience research.”

At the Undergraduate Research Symposium on Aug. 4, 2022, the last day of the summer program, Carlos and his teammates presented their work on understanding the underlying neural mechanisms of memory. Their project, “Modeling Nonspatial Sequence Memory Task Using Neural Decoding,” leveraged data provided by Dr. Norbert Fortin and the Fortin Lab at UCI.

“The program was collaborative in nature and, because of that, I learned how to solve statistical problems in a team setting,” says Carlos. “On top of that, the program solidified my interest in obtaining a master’s in statistics and even made me open to going for a Ph.D. in the future. Overall, the program made a huge impact on where I want to take my career.”

Alyssandrei Parinas, who is studying computer science at Cypress College and worked with Carlos on the neuroscience project, agreed. “The entire experience was challenging yet very rewarding,” she says. “The program taught me a lot of things and allowed me to better understand what data science is.”

Ayah Halabi, a mathematics major at CSUF, conducted research on how diet can affect symptoms related to myeloproliferative neoplasms (MPN), a cancer located within the blood and bone marrow. Her team’s project, “Impact of Mediterranean and DASH Diets on MPN Symptoms,” leveraged data from Dr. Angela Fleischman of UCI’s School of Medicine and the Fleischman Lab.

“This program has taught me so much about the data science world, and how important it is to make our lives better,” wrote Halabi in a LinkedIn post. “I fell in love with the field [and] with all the challenges and complexities it holds. [I’m] so grateful to have learned such important skills and worked [with] amazing faculty.”

Other projects, on COVID-19 and pediatric cystic fibrosis, leveraged data from CHOC, while a project on smartphones and infant language development used data from Stephanie Reich of UCI’s School of Education. The final team analyzed rainfall over time using data provided by Phu Nguyen and CHRS.

Students James Owens (left) and Ayah Halabi presenting their work at the Research Symposium.

Building a Diverse Data Science Workforce
The SoCal Data Science team plans to build on the success of this first cohort, recruiting more highly motivated and deserving students. They also hope to strengthen ties with similar programs elsewhere in the U.S. to develop a strong community of data science learners who can meet the complex demands of increasingly data-focused careers.

To learn more about the program, visit

— Shani Murray