
Although Stephen McAleer first became interested in artificial intelligence after reading the book Gödel, Escher, Bach: An Eternal Golden Braid, it wasn’t until Google’s AI program AlphaGo beat the world champion of the game Go that he decided to switch careers from finance to AI research. Now, as a Ph.D. student working with Chancellor’s Professor Pierre Baldi in the Donald Bren School of Information and Computer Sciences (ICS), he has helped tackle a new deep-learning challenge.

Along with Ph.D. student Forest Agostinelli and computer science major Alex Shmakov, McAleer and Baldi have developed Autodidactic Iteration, a novel reinforcement learning algorithm that can teach itself how to solve the Rubik’s Cube with no human assistance.

Why the Rubik’s Cube?
The Rubik’s Cube is all about mathematics, in particular group theory. “We are studying the Rubik’s Cube,” explains Baldi, “because we are very interested in how machines can learn symbolic languages and manipulations, and mathematics is the highest form of symbolic processing.”

Shmakov first came up with the idea of trying to solve the Rubik’s Cube without human data. Unlike games such as chess or Go, which reward players for each “winning” step, the Rubik’s Cube doesn’t lend itself to such step-by-step rewards. “At first, I thought it was impossible,” admits McAleer, “since there are so many states and only one reward state.” However, after talking with one of the developers of AlphaGo, he realized that the team could try an approach similar to the one used by AlphaGo, combining neural networks with search.
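To put a number on “so many states,” the short calculation below counts the positions reachable on a standard 3x3x3 cube using the well-known counting argument from the puzzle’s group structure. The article does not give this figure, and the function name is only illustrative.

```python
from math import factorial


def reachable_states() -> int:
    """Count the positions reachable by legal turns on a 3x3x3 Rubik's Cube."""
    # 8 corner pieces can be permuted freely; each has 3 orientations,
    # but the orientation of the last corner is fixed by the other seven.
    corner_arrangements = factorial(8) * 3**7
    # 12 edge pieces can be permuted; each has 2 orientations,
    # but the orientation of the last edge is fixed by the other eleven.
    edge_arrangements = factorial(12) * 2**11
    # Corner and edge permutations must have the same parity,
    # which halves the total.
    return corner_arrangements * edge_arrangements // 2


if __name__ == "__main__":
    print(f"{reachable_states():,}")  # prints 43,252,003,274,489,856,000
```

Only one of those roughly 43 quintillion positions is the solved cube, which is why a learner that is rewarded only on reaching it would almost never see a reward by random play.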

Their Autodidactic Iteration solution uses curriculum learning to let the algorithm “teach itself” which cubes are closer to being solved by developing a curriculum of cubes, starting from the solved cube. Once the network has been trained, Monte Carlo Tree Search is used to solve the puzzle. “We are very excited about this because it uses pure reinforcement learning to solve a combinatorial optimization problem,” says McAleer. “By getting rid of the need for human data, we hope that pure reinforcement learning approaches will be able to solve domains that are too complex for humans to think about.”
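The idea of building a curriculum outward from the solved state can be illustrated with a much smaller puzzle. The sketch below is not the team’s implementation. It assumes a hypothetical six-tile puzzle with three made-up moves, swaps the neural network for a plain lookup table of estimated moves-to-solve, and swaps Monte Carlo Tree Search for a simple greedy descent on that table. It shows only the general shape of the approach: scramble from the solved state, then repeatedly update each visited state’s estimate from the estimates of its neighbors.

```python
import random

SOLVED = (0, 1, 2, 3, 4, 5)

# Hypothetical toy moves (not Rubik's Cube moves): each is a permutation
# saying which old position fills each new position.
MOVES = {
    "rotate_left": (1, 2, 3, 4, 5, 0),
    "rotate_right": (5, 0, 1, 2, 3, 4),
    "swap_first_two": (1, 0, 2, 3, 4, 5),
}


def apply_move(state, move):
    """Return the state reached by applying one named move."""
    return tuple(state[i] for i in MOVES[move])


def scramble_from_solved(max_depth, rng):
    """Walk away from the solved state, recording each state along the way."""
    state, visited = SOLVED, []
    for _ in range(max_depth):
        state = apply_move(state, rng.choice(list(MOVES)))
        visited.append(state)
    return visited


def train_values(iterations, max_depth, seed=0):
    """Estimate moves-to-solve for scrambled states (a table stands in for a network)."""
    rng = random.Random(seed)
    values = {SOLVED: 0.0}
    for _ in range(iterations):
        for state in scramble_from_solved(max_depth, rng):
            if state == SOLVED:
                continue
            # Bootstrapped target: one move plus the best neighbor's estimate.
            # Unseen neighbors count as infinitely far away.
            best = min(values.get(apply_move(state, m), float("inf")) for m in MOVES)
            values[state] = 1.0 + best
    return values


def solve_greedily(state, values, max_steps=100):
    """Follow the lowest-estimate neighbor until solved (a stand-in for tree search)."""
    path = []
    for _ in range(max_steps):
        if state == SOLVED:
            return path
        move = min(MOVES, key=lambda m: values.get(apply_move(state, m), float("inf")))
        state = apply_move(state, move)
        path.append(move)
    return path if state == SOLVED else None


if __name__ == "__main__":
    table = train_values(iterations=200, max_depth=8, seed=1)
    hardest = max(table, key=table.get)  # the state the table rates furthest from solved
    print(hardest, solve_greedily(hardest, table))
```

Because every scramble starts at the solved state, states near the goal receive estimates first and states further out build on them, which is the “curriculum” in miniature. On the real cube the state space is far too large for a table, which is where a trained network and a stronger search come in.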

The training and solving process is split up into Autodidactic Iteration and Monte Carlo Tree Search.

The Excitement Goes Viral
McAleer and his colleagues aren’t the only ones excited by their findings. Interest in the work has been widespread, with articles appearing in everything from MIT Technology Review and Popular Mechanics to CNET and Gizmodo. Researchers hope that eventually a general-purpose reinforcement learning agent will be able to solve a given task when supplied with nothing more than a reward function.

“Combinatorial optimization problems pop up all the time in science and in industry,” explains McAleer. “In science, molecular design, the prediction of protein tertiary structure, and drug discovery are all combinatorial optimization problems. In industry,” he continues, “developing the best airline network of spokes and destinations, deciding how to route taxis, and optimally delivering packages are examples of combinatorial optimization.”

The team is thus extending their research to larger Rubik’s Cubes as well as to other scientific applications. In particular, McAleer is working on predicting the tertiary structure of proteins. “I am also researching ways to combine unsupervised learning with deep reinforcement learning to allow agents to create mental models of new environments and transfer those models to other tasks,” he says. He expects this to be useful in areas such as robotics, where agents don’t have a model of the environment as they do in Go or with the Rubik’s Cube.

Larger Implications for Society
While today’s work focuses on solving games and puzzles, McAleer recognizes that tomorrow’s applications might not be as playful. He has concerns about autonomous weapons, autonomous cyber-attacks and job loss due to automation. “Internationally, we need to develop norms and rules for how these new technologies can be used,” he says. “Domestically, we need to focus on educating children for lifelong learning, removing barriers to higher education and providing training for people who suddenly lose jobs to automation.”

Even as his work continues to raise the stakes, McAleer is excited about AI’s potential to accelerate progress. “I am extremely optimistic about applications of AI in science and technology,” he says, noting that advancements in these areas are, after all, “how society moves forward.”

Shani Murray