Dr. Rina Dechter - University of California at Irvine
Revised on Dec. 11, 2019


CompSci 295 Reinforcement Learning, Fall 2019


  • Classroom: DBH 1423
  • Day: Friday
  • Time: 12:00 - 2:40 pm
  • Instructor: Rina Dechter - dechter@ics.uci.edu

The class will cover topics in Reinforcement Learning and in Planning Under Uncertainty, and will run as a seminar. I will give the first few introductory classes. Students will then be required to read and present papers from the literature, or chapters from books, and to do a project, which can be based on their selected papers. There may also be some homework assignments. The class is intended for PhD students in AI and Machine Learning, with 271 and 273 as prerequisite courses. If you are a second-year master's student who has already taken 271 and 273, please talk to me to obtain approval.

Project Spreadsheet

Relevant sources (books or classes):

Background Papers:

  • Learning to Predict by the Methods of Temporal Differences [pdf]
    Richard S. Sutton
    Machine Learning, volume 3, pp 9-44, 1988.

  • An Upper Bound on the Loss from Approximate Optimal-Value Functions [pdf]
    Satinder P. Singh and Richard C. Yee
    Machine Learning, volume 16, pp 227-233, 1994.

  • Algorithms for Sequential Decision Making [pdf]
    Michael L. Littman
    Ph.D. Dissertation, Brown University, Providence, RI, USA, March 1996.

  • Reinforcement Learning: A Survey [pdf]
    Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore
    Journal of Artificial Intelligence Research, volume 4, pp 237-285, 1996.

  • Decision-Theoretic Planning: Structural Assumptions and Computational Leverage [pdf]
    Craig Boutilier, Thomas Dean and Steve Hanks
    Journal of Artificial Intelligence Research, volume 11, pp 1-94, 1999.

  • SPUDD: Stochastic Planning using Decision Diagrams [pdf]
    Jesse Hoey, Robert St-Aubin, Alan Hu and Craig Boutilier
    UAI-99. 15th Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 1999.

  • Policy gradient methods for reinforcement learning with function approximation [pdf]
    Richard S. Sutton, David McAllester, Satinder Singh and Yishay Mansour
    NIPS-99. 12th International Conference on Neural Information Processing Systems, Denver, Colorado, USA, December 1999.

  • Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms [pdf]
    Satinder Singh, Tommi Jaakkola, Michael L. Littman and Csaba Szepesvári
    Machine Learning, volume 39, pp 287–308, 2000.

  • Near-Optimal Reinforcement Learning in Polynomial Time [pdf]
    Michael Kearns and Satinder Singh
    Machine Learning, volume 49, pp 209-232, 2002.

  • R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning [pdf]
    Ronen I. Brafman and Moshe Tennenholtz
    Journal of Machine Learning Research, volume 3, pp 213-231, 2002.

  • Equivalence notions and model minimization in Markov decision processes [pdf]
    Robert Givan, Thomas Dean and Matthew Greig
    Artificial Intelligence, volume 147, pp 163-223, 2003.

  • Least-Squares Policy Iteration [pdf]
    Michail G. Lagoudakis and Ronald Parr
    Journal of Machine Learning Research, volume 4, pp 1107-1149, 2003.

  • Efficient Solution Algorithms for Factored MDPs [pdf]
    Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman
    Journal of Artificial Intelligence Research, volume 19, pp 399-468, 2003.

  • Tree-Based Batch Mode Reinforcement Learning [pdf]
    Damien Ernst, Pierre Geurts and Louis Wehenkel
    Journal of Machine Learning Research, volume 6, pp 503-556, 2005.

  • An Analytic Solution to Discrete Bayesian Reinforcement Learning [pdf]
    Pascal Poupart, Nikos Vlassis, Jesse Hoey, Kevin Regan
    ICML-06. 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, June 2006.

  • Bandit based Monte-Carlo Planning [pdf]
    Levente Kocsis, Csaba Szepesvári
    ECML-06. 17th European Conference on Machine Learning, Berlin, Germany, September 2006.

  • Knows What It Knows: A Framework For Self-Aware Learning [pdf]
    Lihong Li, Michael L. Littman, Thomas J. Walsh
    ICML-08. 25th International Conference on Machine Learning, Helsinki, Finland, July 2008.

  • An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning [pdf]
    Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman
    ICML-08. 25th International Conference on Machine Learning, Helsinki, Finland, July 2008.

  • An analysis of model-based Interval Estimation for Markov Decision Processes [pdf]
    Alexander L. Strehl and Michael L. Littman
    Journal of Computer and System Sciences, volume 74, pp 1309-1331, 2008.

  • A Bayesian sampling approach to exploration in reinforcement learning [pdf]
    John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate
    UAI-09. 25th Conference on Uncertainty in Artificial Intelligence, Montreal, Quebec, Canada, June 2009.

  • Fast gradient-descent methods for temporal-difference learning with linear function approximation [pdf]
    Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora
    ICML-09. 26th International Conference on Machine Learning, Montreal, Quebec, Canada, June 2009.

  • Reinforcement Learning and Simulation-Based Search in Computer Go [pdf]
    David Silver
    Ph.D. Dissertation, University of Alberta, Edmonton, Alberta, Canada, 2009.

  • Transfer Learning for Reinforcement Learning Domains: A Survey [pdf]
    Matthew E. Taylor and Peter Stone
    Journal of Machine Learning Research, volume 10, pp 1633-1685, 2009.

  • Toward Off-Policy Learning Control with Function Approximation [pdf]
    Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard S. Sutton
    ICML-10. 27th International Conference on Machine Learning, Haifa, Israel, June 2010.

  • Monte Carlo tree search in Kriegspiel [pdf]
    Paolo Ciancarini and Gian Piero Favini
    Artificial Intelligence, volume 174, pp 670-684, 2010.

  • Monte-Carlo tree search and rapid action value estimation in computer Go [pdf]
    Sylvain Gelly and David Silver
    Artificial Intelligence, volume 175, pp 1856-1875, 2011.

  • Greedy Algorithms for Sparse Reinforcement Learning [pdf]
    Christopher Painter-Wakefield, Ronald Parr
    ICML-12. 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, July 2012.

  • A Survey of Monte Carlo Tree Search Methods [pdf]
    Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton
    IEEE Transactions on Computational Intelligence and AI in Games, volume 4, pp 1-43, 2012.

  • Batch-iFDD for representation expansion in large MDPs [pdf]
    Alborz Geramifard, Thomas J. Walsh, Nicholas Roy, Jonathan P. How
    UAI-13. 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, Washington, USA, August 2013.

  • Offline policy evaluation across representations with applications to educational games [pdf]
    Travis Mandel, Yun-En Liu, Sergey Levine, Emma Brunskill, Zoran Popovic
    AAMAS-14. 2014 International Conference on Autonomous Agents and Multi-agent Systems, Paris, France, May 2014.

  • High-Confidence Off-Policy Evaluation [pdf]
    Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh
    AAAI-15. 29th AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 2015.

  • Policy evaluation using the Ω-return [pdf]
    Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris
    NIPS-15. 28th International Conference on Neural Information Processing Systems, Montreal, Canada, December 2015.

  • Mastering the game of Go without human knowledge [pdf]
    David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel and Demis Hassabis
    Nature, volume 550, pp 354–359, 2017.

Papers from IJCAI-2019:

  • Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning [pdf]
    Wenjie Shi, Shiji Song, Cheng Wu
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Incremental Learning of Planning Actions in Model-Based Reinforcement Learning [pdf]
    Jun Hao Alvin Ng, Ronald P. A. Petrick
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Autoregressive Policies for Continuous Control Deep Reinforcement Learning [pdf]
    Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Sharing Experience in Multitask Reinforcement Learning [pdf]
    Tung-Long Vuong, Do-Van Nguyen, Tai-Long Nguyen, Cong-Minh Bui, Hai-Dang Kieu, Viet-Cuong Ta, Quoc-Long Tran, Thanh-Ha Le
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Adversarial Imitation Learning from Incomplete Demonstrations [pdf]
    Mingfei Sun, Xiaojuan Ma
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • A Restart-based Rank-1 Evolution Strategy for Reinforcement Learning [pdf]
    Zefeng Chen, Yuren Zhou, Xiao-yu He, Siyu Jiang
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control [pdf]
    Kenny Young, Baoxiang Wang, Matthew E. Taylor
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Successor Options: An Option Discovery Framework for Reinforcement Learning [pdf]
    Rahul Ramesh, Manan Tomar, Balaraman Ravindran
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents [pdf]
    Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Jiale Zhi, Ludwig Schubert, Marc G. Bellemare, Jeff Clune, Joel Lehman
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Unobserved Is Not Equal to Non-existent: Using Gaussian Processes to Infer Immediate Rewards Across Contexts [pdf]
    Hamoon Azizsoltani, Yeo Jin Kim, Markel Sanz Ausin, Tiffany Barnes, Min Chi
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Experience Replay Optimization [pdf]
    Daochen Zha, Kwei-Herng Lai, Kaixiong Zhou, Xia Hu
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Interactive Teaching Algorithms for Inverse Reinforcement Learning [pdf]
    Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human and Agent Demonstrations [pdf]
    Zhaodong Wang, Matthew E. Taylor
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Meta Reinforcement Learning with Task Embedding and Shared Policy [pdf]
    Lin Lan, Zhenguo Li, Xiaohong Guan, Pinghui Wang
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Planning with Expectation Models [pdf]
    Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Dynamic Electronic Toll Collection via Multi-Agent Deep Reinforcement Learning with Edge-Based Graph Convolutional Networks [pdf]
    Wei Qiu, Haipeng Chen, Bo An
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Randomized Adversarial Imitation Learning for Autonomous Driving [pdf]
    MyungJae Shin, Joongheon Kim
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration [pdf]
    Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, Kush R. Varshney, Murray Campbell, Moninder Singh, Francesca Rossi
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Building Personalized Simulator for Interactive Search [pdf]
    Qianlong Liu, Baoliang Cui, Zhongyu Wei, Baolin Peng, Haikuan Huang, Hongbo Deng, Jianye Hao, Xuanjing Huang, Kam-Fai Wong
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces [pdf]
    Haotian Fu, Hongyao Tang, Jianye Hao, Zihan Lei, Yingfeng Chen, Changjie Fan
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Imitation Learning from Video by Leveraging Proprioception [pdf]
    Faraz Torabi, Garrett Warnell, Peter Stone
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning [pdf]
    Shihong Song, Jiayi Weng, Hang Su, Dong Yan, Haosheng Zou, Jun Zhu
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • DeepMellow: Removing the Need for a Target Network in Deep Q-Learning [pdf]
    Seungchan Kim, Kavosh Asadi, Michael Littman, George Konidaris
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • On Principled Entropy Exploration in Policy Optimization [pdf]
    Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Automatic Successive Reinforcement Learning with Multiple Auxiliary Rewards [pdf]
    Zhao-Yang Fu, De-Chuan Zhan, Xin-Chun Li, Yi-Xing Lu
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Approximability of Constant-horizon Constrained POMDP [pdf]
    Majid Khonji, Ashkan Jasour, Brian Williams
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning [pdf]
    Alberto Castellini, Georgios Chalkiadakis, Alessandro Farinelli
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks [pdf]
    Steven Carr, Nils Jansen, Ralf Wimmer, Alexandru Serban, Bernd Becker, Ufuk Topcu
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Regular Decision Processes: A Model for Non-Markovian Domains [pdf]
    Ronen I. Brafman, Giuseppe De Giacomo
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Using Natural Language for Reward Shaping in Reinforcement Learning [pdf]
    Prasoon Goyal, Scott Niekum, Raymond J. Mooney
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Monte Carlo Tree Search for Policy Optimization [pdf]
    Xiaobai Ma, Katherine Driggs-Campbell, Zongzhang Zhang, Mykel J. Kochenderfer
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Hill Climbing on Value Estimates for Search-control in Dyna [pdf]
    Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Recurrent Existence Determination Through Policy Optimization [pdf]
    Baoxiang Wang
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space [pdf]
    Zhou Fan, Rui Su, Weinan Zhang, Yong Yu
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Transfer of Temporal Logic Formulas in Reinforcement Learning [pdf]
    Zhe Xu, Ufuk Topcu
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Measuring Structural Similarities in Finite MDPs [pdf]
    Hao Wang, Shaokang Dong, Ling Shao
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Solving Continual Combinatorial Selection via Deep Reinforcement Learning [pdf]
    Hyungseok Song, Hyeryung Jang, Hai H. Tran, Se-eun Yoon, Kyunghwan Son, Donggyu Yun, Hyoju Chung, Yung Yi
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Reinforcement Learning Experience Reuse with Policy Residual Representation [pdf]
    WenJi Zhou, Yang Yu, Yingfeng Chen, Kai Guan, Tangjie Lv, Changjie Fan, Zhi-Hua Zhou
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains [pdf]
    Matthieu Zimmer, Paul Weng
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Assumed Density Filtering Q-learning [pdf]
    Heejin Jeong, Clark Zhang, George J. Pappas, Daniel D. Lee
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Curriculum Learning for Cumulative Return Maximization [pdf]
    Francesco Foglino, Christiano Coletto Christakou, Ricardo Luna Gutierrez, Matteo Leonetti
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Leveraging Human Guidance for Deep Reinforcement Learning Tasks [pdf]
    Ruohan Zhang, Faraz Torabi, Lin Guan, Dana H. Ballard, Peter Stone
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Learning and Inference for Structured Prediction: A Unifying Perspective [pdf]
    Aryan Deshwal, Janardhan Rao Doppa, Dan Roth
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Deep Learning for Video Captioning: A Review [pdf]
    Shaoxiang Chen, Ting Yao, Yu-Gang Jiang
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Recent Advances in Imitation Learning from Observation [pdf]
    Faraz Torabi, Garrett Warnell, Peter Stone
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • A Survey of Reinforcement Learning Informed by Natural Language [pdf]
    Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Sequential Recommender Systems: Challenges, Progress and Prospects [pdf]
    Shoujin Wang, Liang Hu, Yan Wang, Longbing Cao, Quan Z. Sheng, Mehmet Orgun
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • A Strongly Asymptotically Optimal Agent in General Environments [pdf]
    Michael K. Cohen, Elliot Catt, Marcus Hutter
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Structure Learning for Safe Policy Improvement [pdf]
    Thiago D. Simão, Matthijs T. J. Spaan
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments [pdf]
    Elaheh Barati, Xuewen Chen
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • Advantage Amplification in Slowly Evolving Latent-State Environments [pdf]
    Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets [pdf]
    Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Tushar Chandra, Craig Boutilier
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

  • MineRL: A Large-Scale Dataset of Minecraft Demonstrations [pdf]
    William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, Ruslan Salakhutdinov
    IJCAI-19. 28th International Joint Conference on Artificial Intelligence, Macao, China, August 2019.

Conferences, Symposia, Workshops:

  • DRLW-18. Deep Reinforcement Learning Workshop, NIPS 2018, Montréal, Canada, December 2018.

  • DRLS-17. Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, USA, December 2017.

  • NIPS-17. Advances in Neural Information Processing Systems, NIPS 2017, Long Beach, USA, December 2017.

  • DRLW-16. Deep Reinforcement Learning Workshop, NIPS 2016, Barcelona, Spain, December 2016.

  • EWRL-16. The 13th European Workshop on Reinforcement Learning, Barcelona, Spain, December 2016.

  • DRLW-15. Deep Reinforcement Learning Workshop, NIPS 2015, Montreal, Canada, December 2015.

Tools for RL:

Schedule

Week Date Topic Readings and Links
Week 0 9/27 No class!

Before the first class, I recommend (as optional homework) watching the first three lectures by David Silver. They correspond to Chapters 1, 3, and 4 of the Sutton and Barto text. I will cover this material in the first class, but possibly at a relatively fast pace.
 

Week 1 10/4 Chapters 1,3,4 in S&B Class 1 Slides

HW 1

Week 2 10/11 Chapters 5,6,7 in S&B Class 2 Slides

HW 2

Week 3 10/18 Chapter 9 in S&B

Presentations:
     Bobak Pezeshki: A Survey of Monte Carlo Tree Search Methods

Class 3 Slides

Pezeshki Slides | Paper

Week 4 10/25 Chapter 10 in S&B

Presentations:
     Dr. Kalev Kask: Multi-Armed Bandits
 

Class 4 Slides (Dechter)


Class 4 Slides (Kask)

HW 3

Week 5 11/1 Presentations:
     Yasaman Razeghi: Efficient Solution Algorithms for Factored MDPs
     Michael (Tao-Yi) Lee: Greedy Algorithms for Sparse Reinforcement Learning
     Hieu Le: Incremental Learning of Planning Actions in Model-Based Reinforcement Learning
     Priya Dhulipala: R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
 
 
Razeghi Slides | Paper
M. Lee Slides | Paper
H. Le Slides | Paper

Dhulipala Slides | Paper

HW 4

Week 6 11/8 No class!
Week 7 11/15 Presentations:
     Momoko Kono: An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning
     Yanqi Gu: An Analytic Solution to Discrete Bayesian Reinforcement Learning
     Bahareh Harandizadeh: Policy Gradient Methods for Reinforcement Learning with Function Approximation
     Heeyun (Anna) Schwarz: An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments
 
 
Kono Slides | Paper

Gu Slides | Paper
Harandizadeh Slides | Paper

Schwarz Slides | Paper

HW 5

Week 8 11/22 Presentations:
     Navid Salehnamadi: An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
     Tootia Giyahchi: Knows What It Knows: A Framework For Self-Aware Learning
     Andy Thai: Monte-Carlo tree search and rapid action value estimation in computer Go
     Hsiang-Shun Shih: Using Natural Language for Reward Shaping in Reinforcement Learning
 
 
Salehnamadi Slides | Paper

Giyahchi Slides | Paper
Thai Slides | Paper
Shih Slides | Paper

HW 6

Week 9 11/29 Thanksgiving Holiday!
Week 10 12/6 Presentations:
     James Quintero: Automated Machine Learning with Monte-Carlo Tree Search
     Stelios Stavroulakis: Solving the Rubik’s Cube Without Human Knowledge
     Saman Porhemmat: Deep Learning for Video Captioning: A Review
     Paul Jazayeri: Recent Advances in Imitation Learning from Observation
 
 
Quintero Slides | Paper
Stavroulakis Slides | Paper
Porhemmat Slides | Paper
Jazayeri Slides | Paper

HW 7 (described in class by the professor)

Week 11 12/13