Is reinforcement learning a special case of EWA?
While EWA clearly nests, econometrically, one particular variant of
reinforcement learning along with other models (newer and improved
reinforcement learning models are not, and cannot be, nested in EWA),
it does so by having lots of parameters. Adding parameters in a way
that encompasses other models inevitably improves the fit; it
certainly cannot make the in-sample likelihood worse (as long as you
are trying to fit from period t to t+1). However, it also interferes
with generality and with prediction of new games, or even of future
periods or new subjects in the same game (unless one has reached the
flat part of the learning curve). This is well known about all models,
not just in economics or learning (does Occam's razor ring a bell?).
The more parameters you add, the better the fit but the worse the
parsimony. In my own experience, when I want to predict a new game
based on findings from another game, reinforcement learning (even the
variant nested in EWA) does a better job than the model that
encompasses it as a "special case". A more parsimonious version of EWA
is EWA Lite, which unfortunately never caught on. This model has been
shown to predict in more general settings.

In addition, EWA has this imagination parameter, the delta, which I
call "weight on foregone", and which is powerful and yet problematic.
In the reinforcement learning component of EWA, the reinforcement on
an unplayed action is 0, so that component puts high probability on
the action just played. Hence, when the delta parameter is zero, the
model basically makes a super-strong prediction that the individual
will likely play in period t+1 whatever he did in period t. So when
one has any misspecification regarding heterogeneity (or any other
deviation of reality from the model), the delta will jump in to fix
the problem by moving close to zero and thereby imposing inertia or
state dependence, which essentially takes care of the heterogeneity by
adding something like an individual effect (not a fixed effect or
random effect, but rather a state-dependent individual effect). In all
fairness, this problem came straight out of the reinforcement learning
variant it nests (which was never meant for a t+1 implementation). But
this problem was overcome in later generations of reinforcement
learning models, and EWA is still stuck with it. This is not at all a
problem, however, if you don't interpret delta as the weight on
"belief learning" vs "reinforcement learning." If you interpret delta
as an inertia parameter, you will find life to be happier and more
fulfilling.
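
To make the inertia point concrete, here is a minimal sketch. It
assumes the commonly cited form of the EWA attraction update with a
logit choice rule; the function names, the two-action payoffs, and the
values of phi, rho and lambda are all made up for illustration and are
not taken from any paper or dataset.

```python
import math


def ewa_update(attractions, n_prev, chosen, payoffs, delta, phi, rho):
    """One EWA attraction update (assumed standard form, for illustration).

    The chosen action is reinforced by its full payoff; every unchosen
    action is reinforced by delta times its foregone payoff.
    """
    n_new = rho * n_prev + 1.0
    updated = []
    for j, (a_prev, pay) in enumerate(zip(attractions, payoffs)):
        weight = 1.0 if j == chosen else delta
        updated.append((phi * n_prev * a_prev + weight * pay) / n_new)
    return updated, n_new


def logit_probs(attractions, lam):
    """Logit choice probabilities with sensitivity lam."""
    top = max(attractions)  # subtract the max for numerical stability
    exps = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]


def prob_repeat_after_one_round(delta):
    """Probability of repeating action 0 after playing it once.

    Hypothetical setup: the player chose action 0 and earned 1, while
    the unchosen action 1 would have earned 2.
    """
    new_attractions, _ = ewa_update(
        attractions=[0.0, 0.0],
        n_prev=1.0,
        chosen=0,
        payoffs=[1.0, 2.0],
        delta=delta,
        phi=0.9,
        rho=0.9,
    )
    return logit_probs(new_attractions, lam=2.0)[0]


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0):
        print(delta, round(prob_repeat_after_one_round(delta), 3))
```

With delta at 0 the sketch predicts that the player repeats the action
just played with probability above one half, even though the other
action would have paid more; with delta at 1 the prediction flips
toward the better foregone action. That is the sense in which a delta
near zero behaves like an inertia parameter.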

Nat Wilcox's Econometrica paper actually identified exactly this
feature I just mentioned, but in a nicer and more elegant way: he
showed that more uncaptured heterogeneity results in a delta estimate
closer to zero. We showed a similar effect in our papers, where we
looked at many results from past work in light of this feature and
tried to see to what extent it is responsible for differences of
opinion in the literature.