
High School Students Perform Real-World Analytics During Data Science for All Program

Did people enjoy the movie “Spider-Man: No Way Home”? How did fans react to delays in releasing the sequel to “Hollow Knight”? How did red states versus blue states view the 2020 election? Did the release of “Valorant” increase its popularity? And what about the popularity of high school football?

These are the questions 10 high school students, in teams of two, chose to explore during a new Data Science for All summer program held at UC Irvine from July 10–21, 2023. The students analyzed related tweets using Texera, an open-source platform for collaborative data analytics. The platform is being developed by Professor Chen Li of the Donald Bren School of Information and Computer Sciences (ICS), who led the summer program in collaboration with Professor Wei Wang of UCLA.

Thanks to funding through the National Science Foundation’s Broadening Participation in Computing (BPC) program, the two-week summer camp (which ran from 9 a.m. – 4 p.m. on weekdays and included lunch) was free to attend. Instructors included both UCI and UCLA professors and Ph.D. students with expertise in data management, data science and machine learning. The goal was to leverage the Texera platform in teaching students — particularly those with a limited background in computing — data science and machine learning techniques, including basic concepts about data wrangling, ML training, data classification, sentiment analysis and visualization.

One of the high school students, Lesley Gomez, says that participating in the program has inspired her to delve deeper into data science. “The interactive lectures and completing labs through the Texera platform helped me gain a deeper understanding of data science. Additionally, this program gave me the opportunity to enhance several essential skills, including public speaking, as we got to deliver presentations to high school students, the creators of Texera, and parents,” she says. “Overall, the DS4ALL program was a great experience for me.”

High school students (in UCI ICS shirts) with their instructors for the Data Science 4 All program.

Teaching with Texera
The first week, Li and Xiaozhen Liu, a computer science Ph.D. student at UCI, taught the students about big data preparation, discussing topics such as data modeling, databases, data cleaning, data wrangling and visualization. The second week, UCLA instructors gave lectures on machine learning, AI and natural language processing. Throughout the program, students attended lab sessions and gained hands-on experience using Texera.

For the capstone project, students selected their own topic for sentiment analysis. “We gave them the raw tweets, and they spent the first week cleaning the data,” says Liu. “Then they did the analysis during the second week, using machine learning models on these tweets to gain some insights.”

High school students Zeina Harden and Vitoria Mendez building a Texera workflow together to analyze tweets about high school football (left) and discussing their workflow during their final presentation (right).

“It’s amazing that these students, despite their various backgrounds, within one day were able to become familiar with Texera and then use the system to do data wrangling and data preparation steps,” says Li. “We were pretty happy to see the students learn how to use the system so quickly.”

Aside from its user-friendly GUI, another benefit of Texera is that you don’t have to install any compute-intensive software. “Everything happens on the server side, so you can even use an old laptop that is cheap and not very powerful,” says Li. During the program, UCI provided students with Mac laptops. “And they didn’t need to always have the same laptop. They just needed to go to the website and log in. That’s the beauty of the cloud service.”

Chris Rodas and Lesley Gomez give their presentation on their analysis of “Spider-Man: No Way Home” tweets, talking about how they used logistic regression training.

What’s Next
The program was a win-win for students and faculty alike. “A benefit to us [is] the students were using our servers, so they gave us a lot of good feedback about [Texera’s] usability, scalability and efficiency,” says Li, “[which] is also beneficial to our research.”

The NSF funding for the project covers two years, so Li plans to host the program again next summer and hopes to double the number of attendees to 20.

“We need to think about how to repeat this program in a more systematic way,” he says. “Summer is good, because most students have free time, but how else might we broaden participation?” The team is considering other ways to leverage Texera in introducing data science to high school students or to undergraduate students who aren’t majoring in a STEM field.

“We plan to organize all our materials for not only the same program next year but also possibly for some other programs,” says Li. “Our platform is especially useful for people who have a limited IT background. That’s the sweet spot for us, because of Texera’s cloud model and user-friendly interface. So we’re pretty happy with how the program turned out.”

Shani Murray