In today’s digital age, we expect a wealth of information to appear instantaneously in response to any search query, whether we’re comparison shopping online, conducting global research or searching for a colleague without knowing the correct spelling of their name. Computer Science Professor Chen Li of UC Irvine’s Donald Bren School of Information and Computer Sciences (ICS) has made significant contributions in this area through his research into powerful and efficient approximate query processing. Now, in recognition of these contributions, Li has been elevated to IEEE Fellow, a distinction reserved for select IEEE members whose extraordinary accomplishments in any of the IEEE fields of interest are deemed fitting of this prestigious grade elevation.
“In many data-intensive applications, we need to find results that satisfy a query predicate not exactly, but with minor differences,” explains Li. Example use cases include record linkage (finding records that represent the same real-world entity but in slightly different formats) and fuzzy search (finding documents with keywords similar to the query keywords). “A key challenge in these domains is achieving high efficiency, even with a large data set,” he says. “In applications such as search, a request needs to be answered within milliseconds to enable a responsive user experience.”
To address this challenge, Li, working with ICS students and collaborators, developed a large body of techniques for answering such queries efficiently. A 2008 paper on approximate string search, co-authored by Li for the International Conference on Data Engineering (ICDE), was widely referenced in the community. The Flamingo open source package on approximate string matching, which he helped develop, is also widely used by researchers in the field. A 2010 paper for the ACM Special Interest Group on Management of Data (SIGMOD) conference, covering parallel set-similarity joins using MapReduce, prompted many follow-up studies, and Li led the effort to integrate these techniques into the Apache AsterixDB parallel database system. AsterixDB was the first database to support powerful similarity query processing using both Levenshtein and Jaccard functions.
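For readers unfamiliar with the two similarity functions named above, the following is a minimal Python sketch of what each one measures. It is a textbook illustration only, not Li's techniques and not how AsterixDB or Flamingo implement them; the function names `levenshtein`, `jaccard` and `fuzzy_lookup` are hypothetical and chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def jaccard(x, y) -> float:
    """Set similarity: size of the intersection divided by size of the union."""
    set_x, set_y = set(x), set(y)
    if not set_x and not set_y:
        return 1.0  # convention used here: two empty sets are identical
    return len(set_x & set_y) / len(set_x | set_y)


def fuzzy_lookup(query: str, names: list[str], max_edits: int = 2) -> list[str]:
    """Naive fuzzy search (hypothetical helper): scan every name and keep
    those within max_edits of the query, ignoring case."""
    q = query.lower()
    return [name for name in names if levenshtein(q, name.lower()) <= max_edits]
```

With these definitions, `fuzzy_lookup("Chen Lee", ["Chen Li", "Sharad Mehrotra"])` returns `["Chen Li"]`, the misspelled-colleague case from the opening paragraph. A linear scan like this compares the query against every record, which is exactly what becomes too slow on large data sets and is the efficiency problem the research described here addresses.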
“Professor Li has made several seminal contributions to enable databases to be more robust and user-friendly by addressing challenges that arise when queries and data may contain errors,” wrote ICS Computer Science Professor Sharad Mehrotra in the IEEE Fellow nomination letter. “His work has explored novel data structures, query-processing mechanisms and optimization methods to support fuzzy matches efficiently in databases.”
Mehrotra’s nomination also noted Li’s more than 40 published papers on the topic of query processing over databases when queries or data may be erroneous. “Many of his papers have opened new directions of research with significant follow up as evidenced by high levels of citations,” he said. “Professor Li’s work has also had an impact through technology transition — several of his software artifacts have been extensively deployed and used.” For example, Li’s team developed the iPubmed system to support instant search on more than 21 million MEDLINE publication records used by scientists.
More recently, Li led the Cloudberry project to develop middleware-based solutions to support in-situ data analytics and visualization. “The impact of Professor Li’s work is even more pronounced through his contributions to the open source community and software code release,” wrote Mehrotra. “Professor Li [is] the primary contributor to similarity queries in AsterixDB, which is now an Apache incubated project.” AsterixDB has been in use for instructional and research purposes at a variety of institutions around the world. “As the first open source parallel database, the Apache AsterixDB project made significant contributions to the community of data-intensive computing,” wrote Mehrotra. “Professor Li’s work on massive scale visualization led to a web portal called TwitterMap that supports interactive visualization and analytics on more than two billion tweets collected over seven years. … The system was deployed during the outbreak of COVID-19 to display social media reactions to the virus and received wide publicity.”
Now, Li is collaborating on an early warning system for future pandemics. By the end of December 2022, Li says, the group hopes to have a pilot system ready, a broader collection of data to analyze and additional results.
— Shani Murray