Philip Thomas
Title
Cited by
Year
Data-efficient off-policy policy evaluation for reinforcement learning
P Thomas, E Brunskill
International Conference on Machine Learning, 2139-2148, 2016
Cited by 724 · 2016
Value function approximation in reinforcement learning using the Fourier basis
G Konidaris, S Osentoski, P Thomas
Proceedings of the AAAI Conference on Artificial Intelligence 25 (1), 380-385, 2011
Cited by 546 · 2011
High-confidence off-policy evaluation
P Thomas, G Theocharous, M Ghavamzadeh
Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015
Cited by 317 · 2015
High confidence policy improvement
P Thomas, G Theocharous, M Ghavamzadeh
International Conference on Machine Learning, 2380-2388, 2015
Cited by 220 · 2015
Ad recommendation systems for life-time value optimization
G Theocharous, PS Thomas, M Ghavamzadeh
Proceedings of the 24th International Conference on World Wide Web, 1305-1310, 2015
Cited by 199 · 2015
Preventing undesirable behavior of intelligent machines
P Thomas, B Castro da Silva, A Barto, S Giguere, Y Brun, E Brunskill
Science 366 (6468), 999-1004, 2019
Cited by 196 · 2019
Learning action representations for reinforcement learning
Y Chandak, G Theocharous, J Kostas, S Jordan, P Thomas
International Conference on Machine Learning, 941-950, 2019
Cited by 188 · 2019
Increasing the action gap: New operators for reinforcement learning
MG Bellemare, G Ostrovski, A Guez, P Thomas, R Munos
Proceedings of the AAAI Conference on Artificial Intelligence 30 (1), 2016
Cited by 170 · 2016
Bias in natural actor-critic algorithms
P Thomas
International Conference on Machine Learning, 441-448, 2014
Cited by 158 · 2014
Safe reinforcement learning
PS Thomas
Cited by 119 · 2015
Optimizing for the future in non-stationary MDPs
Y Chandak, G Theocharous, S Shankar, M White, S Mahadevan, ...
International Conference on Machine Learning, 1414-1425, 2020
Cited by 71 · 2020
Is the policy gradient a gradient?
C Nota, PS Thomas
arXiv preprint arXiv:1906.07073, 2019
Cited by 70 · 2019
Proximal reinforcement learning: A new theory of sequential decision making in primal-dual spaces
S Mahadevan, B Liu, P Thomas, W Dabney, S Giguere, N Jacek, I Gemp, ...
arXiv preprint arXiv:1405.6757, 2014
Cited by 69 · 2014
Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards
KM Jagodnik, PS Thomas, AJ van den Bogert, MS Branicky, RF Kirsch
IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (10 …, 2017
Cited by 67 · 2017
Evaluating the performance of reinforcement learning algorithms
S Jordan, Y Chandak, D Cohen, M Zhang, P Thomas
International Conference on Machine Learning, 4962-4973, 2020
Cited by 66 · 2020
Predictive off-policy policy evaluation for nonstationary decision problems, with applications to digital marketing
P Thomas, G Theocharous, M Ghavamzadeh, I Durugkar, E Brunskill
Proceedings of the AAAI Conference on Artificial Intelligence 31 (2), 4740-4745, 2017
Cited by 64 · 2017
Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines
PS Thomas, E Brunskill
arXiv preprint arXiv:1706.06643, 2017
Cited by 62 · 2017
Importance sampling for fair policy selection
S Doroudi, PS Thomas, E Brunskill
Grantee Submission, 2017
Cited by 58 · 2017
Risk Quantification for Policy Deployment
PS Thomas, G Theocharous, M Ghavamzadeh
US Patent App. 14/552,047, 2016
Cited by 58 · 2016
Offline contextual bandits with high probability fairness guarantees
B Metevier, S Giguere, S Brockman, A Kobren, Y Brun, E Brunskill, ...
Advances in Neural Information Processing Systems 32, 2019
Cited by 55 · 2019
Articles 1–20