
L706077/Deep-Reinforcement-Learning-Papers


Deep-Reinforcement-Learning-Papers

awesome deep learning papers for reinforcement learning


1. Getting Started: DQN

  • [1. Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.]

  • [2. Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.]

2. Improved Versions of DQN (Focus on Algorithmic Improvements)

  • [1. Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.]

  • [2. Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.]

  • [3. Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.]

  • [4. Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.]

  • [5. Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.]

  • [6. Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.]

  • [7. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.]

  • [8. Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, and D. Silver]

  • [9. Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.]

  • [10. State of the Art Control of Atari Games Using Shallow Reinforcement Learning]

  • [11. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening]

  • [12. Deep Reinforcement Learning with Averaged Target DQN]

  • [13. Safe and Efficient Off-Policy Reinforcement Learning]

  • [14. The Predictron: End-To-End Learning and Planning]

3. Improved Versions of DQN (Focus on Model Improvements)

  • [1. Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.]

  • [2. Deep Attention Recurrent Q-Network]

  • [3. Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.]

  • [4. Progressive Neural Networks]

  • [5. Language Understanding for Text-based Games Using Deep Reinforcement Learning]

  • [6. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks]

  • [7. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation]

  • [8. Recurrent Reinforcement Learning: A Hybrid Approach]

  • [9. Value Iteration Networks, NIPS, 2016]

  • [10. MazeBase: A Sandbox for Learning from Games]

  • [11. Strategic Attentive Writer for Learning Macro-Actions]

4. Policy-Gradient-Based Deep Reinforcement Learning

Deep policy gradients:

  • [1. End-to-End Training of Deep Visuomotor Policies]

  • [2. Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search]

  • [3. Trust Region Policy Optimization]

Deep actor-critic algorithms:

  • [1. Deterministic Policy Gradient Algorithms]

  • [2. Continuous control with deep reinforcement learning]

  • [3. High-Dimensional Continuous Control Using Generalized Advantage Estimation]

  • [4. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies]

  • [5. Deep Reinforcement Learning in Parameterized Action Space]

  • [6. Memory-based control with recurrent neural networks]

  • [7. Terrain-adaptive locomotion skills using deep reinforcement learning]

  • [8. Sample Efficient Actor-Critic with Experience Replay]

Search and supervision:

  • [1. End-to-End Training of Deep Visuomotor Policies]

  • [2. Interactive Control of Diverse Complex Characters with Neural Networks]

Improved exploration in continuous action spaces:

  • [1. Curiosity-driven Exploration in DRL via Bayesian Neural Networks]

Combining policy gradients and Q-learning:

  • [1. Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic]

  • [2. PGQ: Combining Policy Gradient and Q-Learning]

Other policy-gradient papers:

  • [1. Gradient Estimation Using Stochastic Computation Graphs]

  • [2. Continuous Deep Q-Learning with Model-based Acceleration]

  • [3. Benchmarking Deep Reinforcement Learning for Continuous Control]

  • [4. Learning Continuous Control Policies by Stochastic Value Gradients]

  • [5. Generalizing Skills with Semi-Supervised Reinforcement Learning]

5. Hierarchical DRL

  • [1. Deep Successor Reinforcement Learning]

  • [2. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation]

  • [3. Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks]

  • [4. Stochastic Neural Networks for Hierarchical Reinforcement Learning, C. Florensa, Y. Duan, and P. Abbeel]

6. Multi-Task and Transfer Learning in DRL

  • [1. ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources]

  • [2. A Deep Hierarchical Approach to Lifelong Learning in Minecraft]

  • [3. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning]

  • [4. Policy Distillation]

  • [5. Progressive Neural Networks]

  • [6. Universal Value Function Approximators]

  • [7. Multi-task learning with deep model based reinforcement learning]

  • [8. Modular Multitask Reinforcement Learning with Policy Sketches]

7. DRL Models with External Memory Modules

  • [1. Control of Memory, Active Perception, and Action in Minecraft]

  • [2. Model-Free Episodic Control]

8. Exploration vs. Exploitation in DRL

  • [1. Action-Conditional Video Prediction using Deep Networks in Atari Games]

  • [2. Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks]

  • [3. Deep Exploration via Bootstrapped DQN]

  • [4. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation]

  • [5. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models]

  • [6. Unifying Count-Based Exploration and Intrinsic Motivation]

  • [7. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning]

  • [8. Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning]

  • [9. VIME: Variational Information Maximizing Exploration]

9. Multi-Agent DRL

  • [1. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks]

  • [2. Multiagent Cooperation and Competition with Deep Reinforcement Learning]

10. Inverse DRL

  • [1. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization]

  • [2. Maximum Entropy Deep Inverse Reinforcement Learning]

  • [3. Generalizing Skills with Semi-Supervised Reinforcement Learning]

11. Exploration + Supervised Learning

  • [1. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning]

  • [2. Better Computer Go Player with Neural Network and Long-term Prediction]

  • [3. Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.]

12. Asynchronous DRL

  • [1. Asynchronous Methods for Deep Reinforcement Learning]

  • [2. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU]

13. Harder Game Scenarios

  • [1. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.]

  • [2. Strategic Attentive Writer for Learning Macro-Actions]

  • [3. Unifying Count-Based Exploration and Intrinsic Motivation]

14. One Network Playing Multiple Games

  • [1. Policy Distillation]

  • [2. Universal Value Function Approximators]

  • [3. Learning values across many orders of magnitude]

15. Texas Hold'em Poker

  • [1. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games]

  • [2. Fictitious Self-Play in Extensive-Form Games]

  • [3. Smooth UCT search in computer poker]

16. Doom

  • [1. ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning]

  • [2. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning]

  • [3. Playing FPS Games with Deep Reinforcement Learning]

  • [4. Learning to Act by Predicting the Future]

  • [5. Deep Reinforcement Learning From Raw Pixels in Doom]

17. Large-Scale Action Spaces

  • [1. Deep Reinforcement Learning in Large Discrete Action Spaces]

18. Parameterized Continuous Action Spaces

  • [1. Deep Reinforcement Learning in Parameterized Action Space]

19. Deep Models

  • [1. Learning Visual Predictive Models of Physics for Playing Billiards]

  • [2. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, J. Schmidhuber, arXiv, 2015.]

  • [3. Learning Continuous Control Policies by Stochastic Value Gradients]

  • [4. Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models]

  • [5. Action-Conditional Video Prediction using Deep Networks in Atari Games]

  • [6. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models]

20. DRL Applications

Robotics:

  • [1. Trust Region Policy Optimization]

  • [2. Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control]

  • [3. Path Integral Guided Policy Search]

  • [4. Memory-based control with recurrent neural networks]

  • [5. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection]

  • [6. Learning Deep Neural Network Policies with Continuous Memory States]

  • [7. High-Dimensional Continuous Control Using Generalized Advantage Estimation]

  • [8. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization]

  • [9. End-to-End Training of Deep Visuomotor Policies]

  • [10. DeepMPC: Learning Deep Latent Features for Model Predictive Control]

  • [11. Deep Visual Foresight for Planning Robot Motion]

  • [12. Deep Reinforcement Learning for Robotic Manipulation]

  • [13. Continuous Deep Q-Learning with Model-based Acceleration]

  • [14. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search]

  • [15. Asynchronous Methods for Deep Reinforcement Learning]

  • [16. Learning Continuous Control Policies by Stochastic Value Gradients]

Machine translation:

  • [1. Simultaneous Machine Translation using Deep Reinforcement Learning]

Object localization:

  • [1. Active Object Localization with Deep Reinforcement Learning]

Target-driven visual navigation:

  • [1. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning]

Automatic hyperparameter tuning:

  • [1. Using Deep Q-Learning to Control Optimization Hyperparameters]

Human-computer dialogue:

  • [1. Deep Reinforcement Learning for Dialogue Generation]

  • [2. SimpleDS: A Simple Deep Reinforcement Learning Dialogue System]

  • [3. Strategic Dialogue Management via Deep Reinforcement Learning]

  • [4. Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning]

Video prediction:

  • [1. Action-Conditional Video Prediction using Deep Networks in Atari Games]

Text-to-speech:

  • [1. WaveNet: A Generative Model for Raw Audio]

Text generation:

  • [1. Generating Text with Deep Reinforcement Learning]

Text-based games:

  • [1. Language Understanding for Text-based Games Using Deep Reinforcement Learning]

Accelerating convergence with DRL:

  • [1. Deep Reinforcement Learning for Accelerating the Convergence Rate]

Using DRL to design neural networks:

  • [1. Designing Neural Network Architectures using Reinforcement Learning]

  • [2. Tuning Recurrent Neural Networks with Reinforcement Learning]

  • [3. Neural Architecture Search with Reinforcement Learning]

Traffic signal control:

  • [1. Using a Deep Reinforcement Learning Agent for Traffic Signal Control]

Autonomous driving:

  • [1. CARMA: A Deep Reinforcement Learning Approach to Autonomous Driving]

  • [2. Deep Reinforcement Learning for Simulated Autonomous Vehicle Control]

  • [3. Deep Reinforcement Learning Framework for Autonomous Driving]

21. Other Directions

Avoiding dangerous states:

  • [1. Combating Deep Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear]

Comparing on-policy and off-policy updates in DRL:

  • [1. On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning]
