Specifying Goals to Deep Neural Networks with Answer Set Programming
Forest Agostinelli
Assistant Professor, University of South Carolina
Abstract: Methods such as DeepCubeA have used deep reinforcement learning to learn domain-specific heuristic functions in a largely domain-independent fashion to solve planning problems. However, such methods either assume a predetermined goal or assume that goals will be given as fully specified states. Therefore, specifying goals to these learned heuristic functions with only high-level knowledge of what properties a goal state should or should not have is either impractical or impossible. In this talk, I will introduce our approach for training heuristic functions that estimate the cost-to-go to a set of goal states represented as a partial assignment. I will then discuss how we build on this goal representation with answer set programming to allow for expressive, high-level goal specification. Finally, I will show how we take inspiration from conflict-driven SAT solving and exploit properties of non-monotonic reasoning to efficiently find paths to goals specified using negation as failure. In our experiments with the Rubik’s cube, sliding tile puzzles, and Sokoban, we show that we can specify and reach goals without any need to retrain the heuristic function. Our code is publicly available at https://github.com/forestagostinelli/SpecGoal.
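As a rough illustration of what a goal represented as a partial assignment can mean, the following minimal Python sketch treats a state as a full assignment of values to variables and a goal as an assignment to only some of them, so that one goal stands for every state that agrees with it. The variable names, the toy state, and the `satisfies` helper are hypothetical and are not taken from the SpecGoal code.

```python
from typing import Dict, Hashable

# Assumption for illustration: a state assigns a value to every variable,
# while a goal assigns values to only the variables it constrains.
State = Dict[str, Hashable]
Goal = Dict[str, Hashable]


def satisfies(state: State, goal: Goal) -> bool:
    """Return True if the state agrees with every assignment in the goal."""
    return all(var in state and state[var] == val for var, val in goal.items())


# Hypothetical toy domain with three positions holding tiles 0, 1 and 2.
state = {"pos_1": 0, "pos_2": 1, "pos_3": 2}

# A goal that only requires tile 0 at pos_1; any state meeting this
# requirement is a goal state, whatever the other positions hold.
goal = {"pos_1": 0}

print(satisfies(state, goal))
```

Under this reading, an empty goal is satisfied by every state, and each added assignment shrinks the set of goal states.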
Bio: Forest Agostinelli is an assistant professor at the University of South Carolina. His research aims to use artificial intelligence to automate the discovery of new knowledge. He looks to apply his research to fields such as puzzle solving, chemical synthesis, robotics, quantum computing, theorem proving, program synthesis, and education. He led the creation of DeepCubeA, an artificial intelligence algorithm capable of solving puzzles such as the Rubik’s cube without human guidance. DeepCubeA has since been applied to problems in quantum computing, chemical reactions, cryptography, and parking lot optimization. He earned his Ph.D. from the University of California, Irvine, under the supervision of Professor Pierre Baldi.