Publications & Technical Reports | |
R280 | |
Advancing AND/OR Abstraction Sampling and AND/OR Search-Based Computational Protein Design
Bobak Pezeshki
|
Abstract
Graphical models are a powerful framework for efficiently representing and reasoning about complex systems. They can be used to answer probabilistic queries, facilitate planning, and enable automated design across various fields including in business, medicine, and physics. Reasoning within graphical models typically involves optimization tasks, like finding the most probable configuration, or summation tasks, like computing beliefs over variables, or a combination of both, such as Marginal MAP where we maximize over a subset of variables while summing over the rest. These tasks are computationally challenging, requiring the use of approximation algorithms typically implemented through search, sampling, or variational inference techniques.
This dissertation presents advances in graphical model schemes in two main directions:
Several advancements to a recently developed Monte Carlo sampling method called Abstraction Sampling are presented. Abstraction Sampling is an unbiased stratified importance sampling-like scheme that leverages abstractions (similar to stratification) to solve summation queries such as determining beliefs about random variables or calculating the probability of evidence. This dissertation presents AOAS, an Abstraction Sampling algorithm tailored for compact AND/OR search spaces, explores a diverse range of abstraction functions, provides a theoretical analysis of the properties of Abstraction Sampling, and conducts an extensive empirical evaluation demonstrating Abstraction Sampling's superior performance compared to well-known methods including Importance Sampling, Weighted Mini-Bucket Importance Sampling, IJGP-SampleSearch, and Dynamic Importance Sampling.
Also presented is more applied research adapting graphical model frameworks for Computational Protein Design, specifically focusing on the automated redesign of proteins. This redesign is formulated as an optimization problem to maximize K*, an approximation of binding affinity. Included are two novel formulations of this task within a graphical model framework, as well as introduction of wMBE-K*, a message-passing scheme based on weighted Mini-Bucket Elimination for estimating and bounding K*. Additionally, a range of algorithms derived from adapting Marginal MAP algorithms on AND/OR search spaces to address the protein redesign problem is presented. Results, demonstrated on real protein benchmarks, show superior performance compared to the state-of-the-art algorithm BBK* for small and medium-sized problems. Finally, a technique for infusing determinism into graphical models that emerged from this work, which significantly speeds up inference, is presented.
[pdf] |