Workshop on Reinforcement Learning Theory

Overview

Over many years we have witnessed numerous impressive demonstrations of the power of various reinforcement learning (RL) algorithms, and much progress has been made on the theoretical side as well; nevertheless, the theoretical understanding of the challenges that underlie RL is still rather limited. The best-studied problem settings, such as learning and acting in finite state-action Markov decision processes or in simple linear control systems, fail to capture the essential characteristics of seemingly more practically relevant problem classes, where the size of the state-action space is often astronomical, the planning horizon is huge, the dynamics are complex, interaction with the controlled system is not permitted, or learning has to happen based on heterogeneous offline data. To tackle these diverse issues, more and more theoreticians with a wide range of backgrounds have come to study RL and have proposed numerous new models along with exciting novel developments in both algorithm design and analysis. The workshop's goal is to highlight advances in theoretical RL and to bring together researchers from different backgrounds to discuss RL theory from different perspectives: modeling, algorithms, analysis, and more.

This workshop will feature seven keynote speakers from computer science, operations research, control, and statistics to highlight recent progress, identify key challenges, and discuss future directions. The invited keynotes will be complemented by contributed talks, poster presentations, panel discussions, and virtual social events.

Schedule

UTC 16:00 - 16:25 Emilie Kaufmann (Invited Talk)
UTC 16:25 - 16:50 Christian Kroer (Invited Talk)
UTC 17:00 - 17:50 Short Contributed Talks:
Sparsity in the Partially Controllable LQR
On the Theory of Reinforcement Learning with Once-per-Episode Feedback
Implicit Finite-Horizon Approximation for Stochastic Shortest Path
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
UTC 18:00 - 18:25 Animashree Anandkumar (Invited Talk)
UTC 18:25 - 18:50 Shie Mannor (Invited Talk)
UTC 19:00 - 19:30 Social Session
UTC 19:30 - 21:00 Poster Session
UTC 21:00 - 21:25 Bo Dai (Invited Talk)
UTC 21:25 - 21:50 Qiaomin Xie (Invited Talk)
UTC 22:00 - 22:50 Short Contributed Talks:
Bad-Policy Density: A Measure of Reinforcement-Learning Hardness
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
Solving Multi-Arm Bandit Using a Few Bits of Communication
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
UTC 23:00 - 23:25 Art Owen (Invited Talk)
UTC 23:30 - 00:00 Panel Discussion
UTC 00:00 - 00:30 Social Session
UTC 00:30 - 02:00 Poster Session

Keynote Speakers

Anima Anandkumar

Professor
California Institute of Technology

Bo Dai

Senior Research Scientist
Google Brain

Emilie Kaufmann

CNRS Junior Researcher
Université de Lille

Christian Kroer

Assistant Professor
Columbia University

Shie Mannor

Professor
Technion

Art Owen

Professor
Stanford University

Qiaomin Xie

Visiting Assistant Professor
Cornell University

Papers

  • Bad-Policy Density: A Measure of Reinforcement-Learning Hardness
    David Abel (DeepMind); Cameron S Allen (Brown University); Dilip Arumugam (Stanford University); D Ellis Hershkowitz (Carnegie Mellon University); Michael L. Littman (Brown University); Lawson L.S. Wong (Northeastern University)
    [Paper]
  • Finding the Near Optimal Policy via Reductive Regularization in MDPs
    Wenhao Yang (Peking University); Xiang Li (Peking University); Guangzeng Xie (Peking University); Zhihua Zhang (Peking University)
    [Paper]
  • Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning
    Sheng Zhang (Georgia Institute of Technology); Zhe Zhang (Georgia Institute of Technology); Siva Theja Maguluri (Georgia Tech)
    [Paper]
  • Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
    Thanh Nguyen-Tang (Deakin University); Sunil Gupta (Deakin University, Australia); Hung Tran-The (Deakin University); Svetha Venkatesh (Deakin University)
    [Paper]
  • Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation
    Honghao Wei (University of Michigan); Xin Liu (University of Michigan); Lei Ying (University of Michigan)
    [Paper]
  • Subgaussian Importance Sampling for Off-Policy Evaluation and Learning
    Alberto Maria Metelli (Politecnico di Milano); Alessio Russo (Politecnico di Milano); Marcello Restelli (Politecnico di Milano)
    [Paper]
  • Minimax Regret for Stochastic Shortest Path
    Alon Cohen (Technion and Google Inc.); Yonathan Efroni (Microsoft Research); Yishay Mansour (Tel Aviv University and Google Research); Aviv Rosenberg (Tel Aviv University)
    [Paper]
  • Collision Resolution in Multi-player Bandits Without Observing Collision Information
    Eleni Nisioti (Inria); Nikolaos Thomos (U of Essex); Boris Bellalta (Pompeu Fabra University); Anders Jonsson (UPF)
    [Paper]
  • Marginalized Operators for Off-Policy Reinforcement Learning
    Yunhao Tang (Columbia University); Mark Rowland (DeepMind); Remi Munos (DeepMind); Michal Valko (DeepMind)
    [Paper]
  • Nonstationary Reinforcement Learning with Linear Function Approximation
    Huozhi Zhou (UIUC); Jinglin Chen (University of Illinois at Urbana-Champaign); Lav Varshney (UIUC: ECE); Ashish Jagmohan (IBM Research)
    [Paper]
  • CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
    Tengyu Xu (The Ohio State University); Yingbin Liang (The Ohio State University); Guanghui Lan (Georgia Tech)
    [Paper]
  • Sparsity in the Partially Controllable LQR
    Yonathan Efroni (Microsoft Research); Sham Kakade (University of Washington); Akshay Krishnamurthy (Microsoft); Cyril Zhang (Microsoft Research)
    [Paper]
  • Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
    Zaiwei Chen (Georgia Institute of Technology); Siva Theja Maguluri (Georgia Tech); Sanjay Shakkottai (University of Texas at Austin); Karthikeyan Shanmugam (IBM Research NY)
    [Paper]
  • Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity
    Kaiqing Zhang (University of Illinois at Urbana-Champaign (UIUC)/MIT); Xiangyuan Zhang (University of Illinois at Urbana-Champaign); Bin Hu (University of Illinois at Urbana-Champaign); Tamer Basar (University of Illinois at Urbana-Champaign)
    [Paper]
  • When Is Generalizable Reinforcement Learning Tractable?
    Dhruv Malik (Carnegie Mellon University); Yuanzhi Li (CMU); Pradeep Ravikumar (Carnegie Mellon University)
    [Paper]
  • Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation
    Zaiwei Chen (Georgia Institute of Technology); Sajad khodadadian (Georgia Tech); Siva Theja Maguluri (Georgia Tech)
    [Paper]
  • The Importance of Non-Markovianity in Maximum State Entropy Exploration
    Mirco Mutti (Politecnico di Milano, Università di Bologna); Riccardo De Santi (ETH Zurich ); Marcello Restelli (Politecnico di Milano)
    [Paper]
  • Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games
    Stefanos Leonardos (Singapore University of Technology and Design); Will Overman (University of California, Irvine); Ioannis Panageas (UC Irvine); Georgios Piliouras (Singapore University of Technology and Design)
    [Paper]
  • Efficient Inverse Reinforcement Learning of Transferable Rewards
    Giorgia Ramponi (Politecnico di Milano); Alberto Maria Metelli (Politecnico di Milano); Marcello Restelli (Politecnico di Milano)
    [Paper]
  • Learning to Observe with Reinforcement Learning
    Mehmet Koseoglu (Hacettepe University); Ece Kunduracioglu (Hacettepe University); Ayca Ozcelikkale (Uppsala University)
    [Paper]
  • Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
    Dhruv Malik (Carnegie Mellon University); Aldo Pacchiano (UC Berkeley); Vishwak Srinivasan (Carnegie Mellon University); Yuanzhi Li (CMU)
    [Paper]
  • Bagged Critic for Continuous Control
    Payal Bawa (University of Sydney)
    [Paper]
  • Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
    Matteo Papini (Politecnico di Milano); Andrea Tirinzoni (Inria); Aldo Pacchiano (UC Berkeley); Marcello Restelli (Politecnico di Milano); Alessandro Lazaric (FAIR); Matteo Pirotta (Facebook AI Research)
    [Paper]
  • A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs
    Andrea Tirinzoni (Inria); Matteo Pirotta (Facebook AI Research); Alessandro Lazaric (FAIR)
    [Paper]
  • Optimal and instance-dependent oracle inequalities for policy evaluation
    Wenlong Mou (UC Berkeley); Ashwin Pananjady (Georgia Institute of Technology); Martin Wainwright (UC Berkeley)
    [Paper]
  • Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning
    Chenjia Bai (Harbin Institute of Technology); Lingxiao Wang (Northwestern University); Lei Han (Tencent AI Lab); Jianye Hao (Tianjin University); Animesh Garg (University of Toronto, Vector Institute, Nvidia); Peng Liu (Harbin Institute of Technology); Zhaoran Wang (Northwestern U)
    [Paper]
  • Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
    Andrea Zanette (Stanford University); Martin Wainwright (UC Berkeley); Emma Brunskill (Stanford University)
    [Paper]
  • Reward-Weighted Regression Converges to a Global Optimum
    Miroslav Strupl (IDSIA); Francesco Faccio (The Swiss AI Lab IDSIA); Dylan Ashley (IDSIA); Rupesh Kumar Srivastava (NNAISENSE); Jürgen Schmidhuber (IDSIA - Lugano)
    [Paper]
  • Solving Multi-Arm Bandit Using a Few Bits of Communication
    Osama A Hanna (UCLA); Lin Yang (UCLA); Christina Fragouli (UCLA)
    [Paper]
  • Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
    Sarah Rathnam (Harvard University); Susan Murphy (Harvard University); Finale Doshi-Velez (Harvard)
    [Paper]
  • Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
    Aviv Rosenberg (Tel Aviv University); Yishay Mansour (Tel Aviv University and Google Research)
    [Paper]
  • Learning Adversarial Markov Decision Processes with Delayed Feedback
    Tal Lancewicki (Tel-Aviv University); Aviv Rosenberg (Tel Aviv University); Yishay Mansour (Tel Aviv University and Google Research)
    [Paper]
  • Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
    Dibya Ghosh (UC Berkeley); Jad Rahme (Princeton University); Aviral Kumar (UC Berkeley); Amy Zhang (McGill University); Ryan P Adams (Princeton University); Sergey Levine (UC Berkeley)
    [Paper]
  • Statistical Inference with M-Estimators on Adaptively Collected Data
    Kelly W Zhang (Harvard University); Lucas Janson (Harvard University); Susan Murphy (Harvard University)
    [Paper]
  • Randomized Least Squares Policy Optimization
    Haque Ishfaq (Mila, McGill University); Zhuoran Yang (Princeton University); Andrei Lupu (Mila, McGill University); Viet Nguyen (Mila, McGill University); Lewis Liu (Mila & DIRO); Riashat Islam (MILA, McGill University); Zhaoran Wang (Northwestern); Doina Precup (McGill University)
    [Paper]
  • Gap-Dependent Unsupervised Exploration for Reinforcement Learning
    Jingfeng Wu (Johns Hopkins University); Vladimir Braverman (Johns Hopkins University); Lin Yang (UCLA)
    [Paper]
  • Online Learning for Stochastic Shortest Path Model via Posterior Sampling
    Mehdi Jafarnia Jahromi (University of Southern California); Liyu Chen (USC); Rahul Jain (University of Southern California); Haipeng Luo (USC)
    [Paper]
  • Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
    Kefan Dong (Stanford University); Jiaqi Yang (Tsinghua University); Tengyu Ma (Stanford University)
    [Paper]
  • Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
    Semih Cayci (University of Illinois at Urbana-Champaign); Niao He (ETH Zurich); R Srikant (UIUC)
    [Paper]
  • Decentralized Q-Learning in Zero-sum Markov Games
    Muhammed Sayin (MIT); Kaiqing Zhang (University of Illinois at Urbana-Champaign (UIUC)/MIT); David S Leslie (Lancaster University); Tamer Basar (University of Illinois at Urbana-Champaign); Asuman Ozdaglar (MIT)
    [Paper]
  • Implicit Finite-Horizon Approximation for Stochastic Shortest Path
    Liyu Chen (USC); Mehdi Jafarnia Jahromi (University of Southern California); Rahul Jain (University of Southern California); Haipeng Luo (USC)
    [Paper]
  • On the Theory of Reinforcement Learning with Once-per-Episode Feedback
    Niladri S Chatterji (UC Berkeley); Aldo Pacchiano (UC Berkeley); Peter Bartlett (UC Berkeley); Michael Jordan (UC Berkeley)
    [Paper]
  • Model-based Offline Reinforcement Learning with Local Misspecification
    Kefan Dong (Stanford University); Ramtin Keramati (Stanford University); Emma Brunskill (Stanford University)
    [Paper]
  • Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
    Yue Wu (University of California, Los Angeles); Dongruo Zhou (UCLA); Quanquan Gu (University of California, Los Angeles)
    [Paper]
  • Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
    Wenshuo Guo (UC Berkeley); Kumar Krishna Agrawal (UC Berkeley); Aditya Grover (Facebook AI Research); Vidya Muthukumar (UC Berkeley); Ashwin Pananjady (UC Berkeley)
    [Paper]
  • Model-Free Approach to Evaluate Reinforcement Learning Algorithms
    Denis Belomestny (Universitaet Duisburg-Essen); Ilya Levin (National Research University "Higher School of Economics"); Eric Moulines (Ecole Polytechnique); Alexey Naumov (National Research University Higher School of Economics); Sergey Samsonov (National Research University Higher School of Economics); Veronika Zorina (National Research University Higher School of Economics)
    [Paper]
  • Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
    Yonathan Efroni (Microsoft Research); Dipendra Misra (Microsoft); Akshay Krishnamurthy (Microsoft); Alekh Agarwal (Microsoft); John Langford (Microsoft)
    [Paper]
  • Learning Pareto-Optimal Policies in Low-Rank Cooperative Markov Games
    Abhimanyu Dubey (Massachusetts Institute of Technology); Alex "Sandy" Pentland (MIT)
    [Paper]
  • Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
    Ming Yin (UC Santa Barbara); Yu-Xiang Wang (UC Santa Barbara)
    [Paper]
  • Bridging The Gap between Local and Joint Differential Privacy in RL
    Evrard Garcelon (Facebook AI Research ); Vianney Perchet (ENS Paris-Saclay & Criteo AI Lab); Ciara Pike-Burke (Imperial College London); Matteo Pirotta (Facebook AI Research)
    [Paper]
  • Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
    Yu Bai (Salesforce Research); Chi Jin (Princeton University); Huan Wang (Salesforce Research); Caiming Xiong (Salesforce Research)
    [Paper]
  • Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
    Ming Yin (UC Santa Barbara); Yu Bai (Salesforce Research); Yu-Xiang Wang (UC Santa Barbara)
    [Paper]
  • Mixture of Step Returns in Bootstrapped DQN
    Po-Han Chiang (National Tsing Hua University); Hsuan-Kung Yang (National Tsing Hua University); Zhang-Wei Hong (Preferred Networks); Chun-Yi Lee (National Tsing Hua University)
    [Paper]
  • Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation
    Jiafan He (UCLA); Dongruo Zhou (UCLA); Quanquan Gu (University of California, Los Angeles)
    [Paper]
  • Provably efficient exploration-free transfer RL for near-deterministic latent dynamics
    Yao Liu (Stanford University); Dipendra Misra (Microsoft); Miroslav Dudik (Microsoft); Robert Schapire (Microsoft)
    [Paper]
  • Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
    Jean Tarbouriech (FAIR & Inria); Runlong Zhou (Tsinghua University); Simon Du (University of Washington); Matteo Pirotta (Facebook AI Research); Michal Valko (DeepMind); Alessandro Lazaric (FAIR)
    [Paper]
  • A Spectral Approach to Off-Policy Evaluation for POMDPs
    Yash Nair (Harvard College); Nan Jiang (University of Illinois at Urbana-Champaign)
    [Paper]
  • Mind the Gap: Safely Bridging Offline and Online Reinforcement Learning
    Wanqiao Xu (University of Michigan); Kan Xu (University of Pennsylvania); Hamsa Bastani (Wharton); Osbert Bastani (University of Pennsylvania)
    [Paper]
  • Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation
    Yue Guan (Georgia Institute of Technology); Qifan Zhang (Georgia Institute of Technology); Panagiotis Tsiotras (Georgia Institute of Technology)
    [Paper]
  • Invariant Policy Learning: A Causal Perspective
    Sorawit Saengkyongam (University of Copenhagen); Nikolaj Thams (University of Copenhagen); Jonas Peters (University of Copenhagen); Niklas Pfister (University of Copenhagen)
    [Paper]
  • A functional mirror ascent view of policy gradient methods with function approximation
    Sharan Vaswani (Amii, University of Alberta); Olivier Bachem (Google Brain); Simone Totaro (Mila, Université de Montréal); Robert Mueller (TU Munich); Matthieu Geist (Google Brain); Marlos C. Machado (Amii, University of Alberta, and DeepMind); Pablo Samuel Castro (Google Brain); Nicolas Le Roux (MILA, Université de Montréal and McGill University)
    [Paper]
  • Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
    Tengyang Xie (University of Illinois at Urbana-Champaign); Nan Jiang (University of Illinois at Urbana-Champaign); Huan Wang (Salesforce Research); Caiming Xiong (Salesforce Research); Yu Bai (Salesforce Research)
    [Paper]
  • Robust online control with model misspecification
    Xinyi Chen (Google); Udaya Ghai (Princeton University); Elad Hazan (Princeton University); Alexandre Megretski (Massachusetts Institute of Technology)
    [Paper]
  • Online Sub-Sampling for Reinforcement Learning with General Function Approximation
    Dingwen Kong (Peking University); Ruslan Salakhutdinov (Carnegie Mellon University); Ruosong Wang (Carnegie Mellon University); Lin Yang (UCLA)
    [Paper]
  • Is Pessimism Provably Efficient for Offline RL?
    Ying Jin (Stanford University); Zhuoran Yang (Princeton University); Zhaoran Wang (Northwestern U)
    [Paper]
  • Topological Experience Replay for Fast Q-Learning
    Zhang-Wei Hong (Massachusetts Institute of Technology); Tao Chen (MIT); Yen-Chen Lin (MIT); Joni Pajarinen (Aalto University); Pulkit Agrawal (MIT)
    [Paper]
  • Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
    Jiafan He (UCLA); Dongruo Zhou (UCLA); Quanquan Gu (University of California, Los Angeles)
    [Paper]
  • A general sample complexity analysis of vanilla policy gradient
    Rui Yuan (Facebook AI Research); Robert M Gower (Telecom Paris Tech); Alessandro Lazaric (FAIR)
    [Paper]
  • The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
    Chi Jin (Princeton University); Qinghua Liu (Princeton University); Tiancheng Yu (MIT)
    [Paper]
  • Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
    Chi Jin (Princeton University); Qinghua Liu (Princeton University); Sobhan Miryoosefi (Princeton University)
    [Paper]
  • Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity
    Jonathan Lee (Stanford University); Weihao Kong (University of Washington); Aldo Pacchiano (UC Berkeley); Vidya K Muthukumar (Georgia Institute of Technology); Emma Brunskill (Stanford University)
    [Paper]
  • A Short Note on the Relationship of Information Gain and Eluder Dimension
    Kaixuan Huang (Princeton University); Sham Kakade (University of Washington); Jason Lee (Princeton); Qi Lei (Princeton University)
    [Paper]
  • Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
    Shunshi Zhang (University of Toronto, Vector Institute); Murat A Erdogdu (University of Toronto, Vector Institute); Animesh Garg (University of Toronto, Vector Institute, Nvidia)
    [Paper]
  • Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation
    Zixiang Chen (UCLA); Dongruo Zhou (UCLA); Quanquan Gu (University of California, Los Angeles)
    [Paper]
  • Improved Estimator Selection for Off-Policy Evaluation
    George Tucker (Google Brain); Jonathan Lee (Stanford)
    [Paper]
  • A Boosting Approach to Reinforcement Learning
    Nataly Brukhim (Princeton University); Elad Hazan (Princeton University); Karan Singh (Microsoft Research)
    [Paper]
  • Learning Stackelberg Equilibria in Sequential Price Mechanisms
    Gianluca Brero (Harvard University); Darshan Chakrabarti (Harvard University); Alon Eden (Harvard University); Matthias Gerstgrasser (Harvard University); Vincent Li (Harvard University); David Parkes (Harvard University)
    [Paper]
  • Refined Policy Improvement Bounds for MDPs
    Jim Dai (Cornell University); Mark Gluzman (Cornell University)
    [Paper]
  • Meta Learning MDPs with linear transition models
    Robert Müller (Technical University of Munich); Aldo Pacchiano (UC Berkeley); Jack Parker-Holder (University of Oxford)
    [Paper]
  • The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
    Tiancheng Jin (University of Southern California); Longbo Huang (IIIS, Tsinghua University); Haipeng Luo (USC)
    [Paper]
  • Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds
    Yahya Sattar (University of California Riverside); Zhe Du (University of Michigan); Davoud Ataee Tarzanagh (Michigan); Necmiye Ozay (University of Michigan); Laura Balzano (University of Michigan); Samet Oymak (University of California, Riverside)
    [Paper]
  • Non-Stationary Representation Learning in Sequential Multi-Armed Bandits
    Yuzhen Qin (University of California, Riverside); Tommaso Menara (University of California Riverside); Samet Oymak (University of California, Riverside); ShiNung Ching (Washington University in St. Louis); Fabio Pasqualetti (University of California, Riverside)
    [Paper]
  • Value-Based Deep Reinforcement Learning Requires Explicit Regularization
    Aviral Kumar (UC Berkeley); Rishabh Agarwal (Google Research, Brain Team); Aaron Courville (University of Montreal); Tengyu Ma (Stanford); George Tucker (Google Brain); Sergey Levine (UC Berkeley)
    [Paper]
  • Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
    Haipeng Luo (USC); Chen-Yu Wei (University of Southern California); Chung-Wei Lee (University of Southern California)
    [Paper]
  • On the Sample Complexity of Average-reward MDPs
    Yujia Jin (Stanford University); Aaron Sidford (Stanford)
    [Paper]
  • Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case
    Gandharv Patil (McGill University); Prashanth L.A. (IIT Madras); Doina Precup (McGill University)
    [Paper]
  • Multi-Task Offline Reinforcement Learning with Conservative Data Sharing
    Tianhe Yu (Stanford University); Aviral Kumar (UC Berkeley); Yevgen Chebotar (Google); Karol Hausman (Google Brain); Sergey Levine (UC Berkeley); Chelsea Finn (Stanford)
    [Paper]
  • Provably Efficient Multi-Task Reinforcement Learning with Model Transfer
    Chicheng Zhang (University of Arizona); Zhi Wang (University of California, San Diego)
    [Paper]

Important Dates

 

Paper Submission Deadline: June 7th, 2021, 11:59 PM UTC ([CMT])

Author Notification: July 7th, 2021

Final Version: July 14th, 2021

Workshop: July 24th, 4:00 PM UTC - July 25th, 2:00 AM UTC

Program Committee

  • David Abel (DeepMind)
  • Sanae Amani (UCLA)
  • Zaiwei Chen (Georgia Tech)
  • Yifang Chen (University of Washington)
  • Xinyi Chen (Princeton)
  • Qiwen Cui (Peking University)
  • Yaqi Duan (Princeton)
  • Vikranth Dwaracherla (Stanford)
  • Fei Feng (UCLA)
  • Dylan Foster (MIT)
  • Botao Hao (DeepMind)
  • Ying Jin (Stanford)
  • Sajad Khodadadian (Georgia Tech)
  • Tor Lattimore (DeepMind)
  • Qinghua Liu (Princeton)
  • Thodoris Lykouris (MSR)
  • Gaurav Mahajan (UCSD)
  • Sobhan Miryoosefi (Princeton)
  • Aditya Modi (UMich)
  • Vidya Muthukumar (Georgia Tech)
  • Gergely Neu (Pompeu Fabra University)
  • Nived Rajaraman (UC Berkeley)
  • Max Simchowitz (UC Berkeley)
  • Yi Su (Cornell)
  • Jean Tarbouriech (Inria Lille)
  • Masatoshi Uehara (Cornell)
  • Ruosong Wang (CMU)
  • Jingfeng Wu (JHU)
  • Tengyang Xie (UIUC)
  • Jiaqi Yang (Tsinghua University)
  • Ming Yin (UCSB)
  • Andrea Zanette (Stanford University)
  • Zihan Zhang (Tsinghua University)
  • Kaiqing Zhang (UIUC)
  • Angela Zhou (Cornell)

Workshop Organizers

Shipra Agrawal

Columbia University

Simon S. Du

University of Washington

Niao He

ETH Zürich

Csaba Szepesvári

University of Alberta / DeepMind

Lin F. Yang

University of California, Los Angeles


We thank Hoang M. Le for providing the website template.