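Reinforcement learning is currently a popular research direction. Categorizing reinforcement learning methods and their papers helps clarify which method suits which application scenario. This article categorizes reinforcement learning methods and lists the corresponding papers.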
5. Memory Series
Algorithm: MFEC
Paper title: Model-Free Episodic Control
Venue: arXiv
Paper link: https://arxiv.org/abs/1606.04460
Current Google Scholar citations: 138
Algorithm: NEC
Paper title: Neural Episodic Control
Venue: ICML, 2017
Paper link: https://arxiv.org/abs/1703.01988
Current Google Scholar citations: 171
Algorithm: Neural Map
Paper title: Neural Map: Structured Memory for Deep Reinforcement Learning
Venue: ICLR, 2018
Paper link: https://arxiv.org/abs/1702.08360
Current Google Scholar citations: 173
Algorithm: MERLIN
Paper title: Unsupervised Predictive Memory in a Goal-Directed Agent
Venue: arXiv
Paper link: https://arxiv.org/abs/1803.10760
Current Google Scholar citations: 108
Algorithm: RMC
Paper title: Relational Recurrent Neural Networks
Venue: NIPS, 2018
Paper link: https://arxiv.org/abs/1806.01822
Current Google Scholar citations: 121
6. Model-Based RL Series
a. Model is Learned
Algorithm: I2A
Paper title: Imagination-Augmented Agents for Deep Reinforcement Learning
Venue: NIPS, 2017
Paper link: https://arxiv.org/abs/1707.06203
Current Google Scholar citations: 182
Algorithm: MBMF
Paper title: Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Venue: ICRA, 2018
Paper link: https://arxiv.org/abs/1708.02596
Current Google Scholar citations: 503
Algorithm: MVE
Paper title: Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning
Venue: arXiv
Paper link: https://arxiv.org/abs/1803.00101
Current Google Scholar citations: 109
Algorithm: STEVE
Paper title: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Venue: NIPS, 2018
Paper link: https://arxiv.org/abs/1807.01675
Current Google Scholar citations: 127
Algorithm: ME-TRPO
Paper title: Model-Ensemble Trust-Region Policy Optimization
Venue: ICLR, 2018
Paper link: https://openreview.net/forum?id=SJJinbWRZ&noteId=SJJinbWRZ
Current Google Scholar citations: 195
Algorithm: MB-MPO
Paper title: Model-Based Reinforcement Learning via Meta-Policy Optimization
Venue: Conference on Robot Learning (CoRL), 2018
Paper link: https://arxiv.org/abs/1809.05214
Current Google Scholar citations: 108
Algorithm: World Models
Paper title: Recurrent World Models Facilitate Policy Evolution
Venue: NIPS, 2018
Paper link: https://arxiv.org/abs/1809.01999
Current Google Scholar citations: 316
b. Model is Given
Algorithm: AlphaZero
Paper title: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
Venue: Science, 2018
Paper link: https://arxiv.org/abs/1712.01815
Current Google Scholar citations: 971
Algorithm: ExIt
Paper title: Thinking Fast and Slow with Deep Learning and Tree Search
Venue: NIPS, 2017
Paper link: https://arxiv.org/abs/1705.08439
Current Google Scholar citations: 174