The Best Student Paper at the 2019 IEEE International Conference on Big Data (BigData 2019) was “AFrame: Extending DataFrames for Large-Scale Modern Data Analysis” by computer science Ph.D. student Phanwadee Sinthong and Computer Science Professor Michael J. Carey.
“I was truly honored and excited,” says Sinthong, whose research focus is in the area of databases. “I have been working on incorporating database management capabilities with existing data science technologies to help support and enhance big data analysis.”
In the conference paper, Sinthong and Carey introduce AFrame, a new scalable data analysis package. “We noticed technical challenges and difficulties that data scientists face when moving from small to big data analysis,” explains Sinthong. To make data management capabilities for large-scale modern data available to the data science community, they integrated the DataFrame, a development paradigm familiar to data scientists, with Apache AsterixDB, a big data management system.
“AFrame can be used in various data analysis projects that need to interact with and manipulate local or distributed data,” says Sinthong. “We have already had students conduct end-to-end projects using AFrame in place of Python’s popular Pandas DataFrame package for data that cannot fit in memory and achieve satisfactory results.”
The work was supported by a yearlong exploration grant from the Donald Bren School of Information and Computer Sciences (ICS). Sinthong and Carey welcome feedback and would like more students to try AFrame. For more information, contact Professor Carey at firstname.lastname@example.org.
— Shani Murray