Reinforcement Learning
Roth and Erev (1995) demonstrated that simple reinforcement learning models
capture the relationship between human behavior and dynamic paths in extensive-form games.
The basic model considered by Roth and Erev assumes that the propensity
to select an alternative is a discounted sum of the reinforcements
obtained from previous selections of this alternative. Under this model
the players ignore information concerning the behavior of their opponent
and/or concerning the “forgone payoffs” (payoffs that they could have
obtained had they chosen different strategies).
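The propensity rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact specification from Roth and Erev (1995): the names update_propensities, choice_probabilities, choose, and the discount parameter are hypothetical, and the proportional choice rule (probability proportional to propensity) is assumed here because the text above does not state how propensities map to choices.

```python
import random

def update_propensities(propensities, chosen, payoff, discount=0.9):
    """Discount every propensity, then reinforce only the chosen alternative.

    Unrolling this update gives a discounted sum of the reinforcements
    received from past selections of each alternative. Forgone payoffs and
    the opponent's behavior are never used. `discount` is an assumed name.
    """
    updated = [discount * q for q in propensities]
    updated[chosen] += payoff  # payoffs are assumed to be non-negative
    return updated

def choice_probabilities(propensities):
    """Assumed proportional rule: probability = propensity / sum of propensities."""
    total = sum(propensities)
    return [q / total for q in propensities]

def choose(propensities, rng):
    """Draw one alternative with probability proportional to its propensity."""
    return rng.choices(range(len(propensities)), weights=propensities, k=1)[0]

# One hand-checkable step: two alternatives, the first is chosen and pays 2.0.
q = update_propensities([1.0, 1.0], chosen=0, payoff=2.0, discount=0.5)
p = choice_probabilities(q)
```

After this single step the propensities are [2.5, 0.5], so the reinforced alternative is now selected with probability 2.5 / 3. Because the update uses only the player's own realized payoff, the sketch needs no model of the opponent.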
Erev and Roth (1998) extend this analysis in a study of twelve
constant-sum games and two learning models: Roth and Erev’s reinforcement
learning model, and a variant of this model that allows sensitivity to
forgone payoffs. The most important contribution of their analysis is
the demonstration of the value of using learning models under the
assumption that the same parameters apply across games.
Properties one should seek in a reinforcement learning model include:
1. Capture the effect of incentives on dynamics
2. Allow for parsimony and generality
3. Converge to equilibrium as time goes to infinity
4. Capture intermediate-term behavior based on game incentives
Things to watch out for:
1. Consider using simulation analysis if that's what the
models are intended for (you can generally tell from the original
article).
2. If you insist on period-by-period analysis
with reinforcement learning models, use a recent variant without inertia effects.
3. Keep up with new models. The field is rapidly evolving. RELACS is one model which has been promoted recently.
4. No, reinforcement learning is not already a special case of EWA.
5. Watch out for multiple equilibria and other distribution
concerns. If you are using simulation analysis (and you should)
you are probably fitting average paths. This could be uninformative if
you are interested in the split between paths. Devise your own measure
for fitting if this is what you are interested in, or email me for
additional advice.
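To illustrate the last point, here is a small Python sketch of how an average path can hide a split between paths. The paths below are made-up illustrative numbers, not simulation results, and average_path and split_summary are hypothetical helper names; the split measure shown is just one possible choice, not a recommended standard.

```python
from statistics import mean

def average_path(paths):
    """Period-by-period mean across runs (what an average-path fit targets)."""
    return [mean(period) for period in zip(*paths)]

def split_summary(paths, threshold=0.5):
    """Share of runs whose final-period value ends above, or at/below, the threshold."""
    finals = [path[-1] for path in paths]
    above = sum(1 for x in finals if x > threshold) / len(finals)
    return {"above": above, "below": 1.0 - above}

# Made-up paths: each list is one run's probability of choosing
# alternative A over three periods. Two runs drift toward A, two away.
paths = [
    [0.5, 0.7, 0.9],
    [0.5, 0.8, 1.0],
    [0.5, 0.3, 0.1],
    [0.5, 0.2, 0.0],
]

avg = average_path(paths)      # stays at roughly 0.5 in every period
split = split_summary(paths)   # half of the runs end above 0.5, half below
```

In this toy data the average path is flat at about 0.5 even though no individual run ends anywhere near 0.5, so a model fitted only to the average could match it while missing the split entirely.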