
Whether examining tweets to better understand crime levels, or monitoring comments posted on public forums to identify health trends, researchers need methods and tools specifically designed for social media analytics. That is why Computer Science Professor Chen Li decided to organize UCI’s first SoCal Social Analytics Workshop on May 11, 2018. Sponsored by the UCI Data Science Initiative and the UC Institute for Prediction Technology (UCIPT), the goal was to bring together people from different disciplines to exchange ideas regarding social media as a data source.

Different Disciplines, Same Challenges

Professor Li often confronts similar challenges in his exploration of social media data analysis, regardless of whether he’s collaborating with researchers from computer science, informatics, social science or public health, or from UCI, UCLA, UC Riverside or the San Diego Supercomputer Center. “There is a lot of commonality and common interests, and the challenges are very similar,” says Li.

Given this commonality, about six months ago he started floating the idea of hosting a workshop on the topic, and his colleagues were supportive, particularly Padhraic Smyth, director of the Data Science Initiative, and a group of colleagues from UCLA. So Li moved forward, recruiting Professor John R. Hipp from UCI's Department of Criminology, Law and Society (the researcher using tweets to analyze crime) to help him organize the event.

Leveraging Social Media

According to Li, researchers often use social media because of its unique characteristics: it’s real time, readily available, cheap and large scale. “You can get the latest information and lots of it,” explains Li. There are privacy concerns, but that’s also true of other data sources, and social media gives researchers access to “millions or billions of records that give you a lot of insights.”

Although it's challenging to deal with such data, which Li notes is "very big, very dynamic and has different attributes in terms of time, space and text," he says that the general process is similar across disciplines. "The common theme for these projects is to do data storage indexing and then machine learning," he says. This requires labeling a sample of the data and using supervised learning to train a machine learning model, which can then be applied within an existing pipeline. "So even if the data is from different disciplines," says Li, "the whole pipeline is very similar."
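The pipeline Li describes (store and index the posts, label a sample, train a supervised model, then apply it to the rest) can be illustrated with a deliberately small sketch. This is a hypothetical example, not code from Li's group or from the systems presented at the workshop: the posts, labels and helper names (`build_inverted_index`, `train_naive_bayes`, `predict`) are made up, and a multinomial naive Bayes classifier stands in for whatever model a real project would use. It relies only on the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a post and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_inverted_index(posts):
    """Indexing step: map each token to the set of post ids containing it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for token in tokenize(text):
            index[token].add(post_id)
    return index


def train_naive_bayes(labeled):
    """Supervised step: count tokens per label from hand-labeled (text, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled:
        class_counts[label] += 1
        for token in tokenize(text):
            token_counts[label][token] += 1
            vocab.add(token)
    return class_counts, token_counts, vocab


def predict(model, text):
    """Return the most probable label for a post, using add-one (Laplace) smoothing."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical posts; a real project would pull these from a social media feed.
posts = {
    1: "Police say a bike was stolen downtown",
    2: "Got my flu shots at the clinic",
    3: "Robbery near the park last night",
    4: "Staying home with fever and cough",
}
index = build_inverted_index(posts)
flu_posts = index["flu"]  # fast keyword lookup before any modeling

# Hypothetical hand-labeled sample (the labeling step).
labeled = [
    ("police report a stolen bike", "crime"),
    ("robbery near the park police arrived", "crime"),
    ("flu shots at the clinic", "health"),
    ("fever and cough from the flu", "health"),
]
model = train_naive_bayes(labeled)

# Apply the trained model to every stored post.
results = {post_id: predict(model, text) for post_id, text in posts.items()}
print(flu_posts, results)
```

In a real deployment the in-memory dictionary would likely be replaced by a big data management system and the toy classifier by a properly evaluated model; the sketch only shows the shape of the pipeline, which is the part Li says stays the same across disciplines.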

Fostering Collaboration

During the first part of the all-day workshop, researchers from non-IT domains talked about their research, focusing on how they use social media data in their analyses. Topics ranged from protest dynamics, censorship and public health to real-world predictions, war experiences and neighborhood crime.

After the domain experts spoke, the second half of the workshop featured talks and demos from computer scientists about the kinds of tools in development, including:

  • Apache AsterixDB for open source big data management,
  • Cloudberry for big data visualization,
  • Kite for microblog data management,
  • Social Post Analyzer for collaboratively labeling social posts and
  • Texera for text analytics.
Professor Li demonstrated several systems currently being developed by his team and colleagues to support social media analysis as big data services.

At the end of the day, there was a panel discussion for questions and feedback from the audience. Video clips of the entire workshop are available online.

Li was pleased that much of the audience was not from computer science. “We wanted people from non-CS domains to attend to get some new ideas and to see how they might benefit from our work.”

Learning about real-world studies helps Li and his colleagues further their own research into big data, machine learning, text analysis and visualization. “We want to use social media as a very good use case to put all the pieces together to finish the whole pipeline to help people solve problems.” The goal is to develop tools to make social media analysis more accessible and efficient by supporting big data analysis as a service. Li also wants to make the tools general purpose, so the underlying software and services are applicable to other domains — not just social media.

Li says that the workshop helped people make new connections, and he hopes innovative ideas can grow organically from this. “We want to be the platform guys,” says Li, referring to computer scientists. “We want to help others focus on their work by making it so they don’t need to worry about what’s happening under the hood. They can just enjoy the framework or infrastructure we build.”

Shani Murray