Is reinforcement learning a special case of EWA?
Econometrically, EWA clearly nests one particular variant of reinforcement learning, along with other models (newer, improved reinforcement learning models are not, and cannot be, nested in EWA). But it does so by having lots of parameters. Adding parameters in a way that encompasses other models improves the in-sample fit, or at the very least cannot make the likelihood worse (as long as you are fitting from period t to t+1). However, it also hurts generality and the prediction of new games, or even of future periods or new subjects in the same game (unless play has reached the flat part of the learning curve). This is well known for all models, not just in economics or learning (does Occam's razor ring a bell?). The more parameters you add, the better the fit but the worse the parsimony. In my own experience, when I want to predict a new game based on findings from another game, reinforcement learning (even the variant nested in EWA) does a better job than the model that encompasses it as a "special case". A more parsimonious version of EWA is EWA Lite, which unfortunately never caught on. That model has been shown to predict in more general settings.
 
In addition, EWA has this imagination parameter, delta, which I call the "weight on foregone" payoffs, and which is powerful yet problematic. In the reinforcement learning component of EWA, the reinforcement on the unplayed action is 0, so that component puts high probability on the action just played. Hence, when delta is zero, the model basically makes a super-strong prediction that the individual will likely play in period t+1 whatever he did in period t. So whenever there is any misspecification regarding heterogeneity (or any other deviation of reality from the model), delta will jump in to fix the problem by moving close to zero. That imposes inertia, or state dependence, which essentially takes care of the heterogeneity by adding something like an individual effect (not a fixed effect or a random effect, but rather a state-dependent individual effect). In all fairness, this problem came straight out of the reinforcement learning variant that EWA nests (which was never meant for a t+1 implementation). But the problem was overcome in later generations of reinforcement learning models, while EWA is still stuck with it. This is not a problem at all, however, if you don't interpret delta as the weight on "belief learning" versus "reinforcement learning." If you interpret delta as an inertia parameter, you will find life to be happier and more fulfilling.
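
To make the inertia point concrete, here is a minimal sketch of one EWA update in Python. It assumes the standard Camerer-Ho attraction update with a logit choice rule; the function names and the parameters phi, rho, and lam are not mentioned above and are only there for illustration, and the payoffs are made-up positive numbers. With delta = 0 only the chosen action is reinforced, so the model predicts a repeat of that action even though the other action would have paid more.

```python
import math

def ewa_update(attractions, n_prev, chosen, payoffs, delta, phi=1.0, rho=0.0):
    """One EWA attraction update (assumed Camerer-Ho form).

    payoffs[j] is what action j earned, or would have earned, this period.
    The chosen action gets weight 1; unchosen actions get weight delta.
    """
    n_new = rho * n_prev + 1.0
    updated = []
    for j, (attraction, payoff) in enumerate(zip(attractions, payoffs)):
        weight = 1.0 if j == chosen else delta
        updated.append((phi * n_prev * attraction + weight * payoff) / n_new)
    return updated, n_new

def logit_probs(attractions, lam=1.0):
    """Logit choice probabilities for the next period."""
    top = max(attractions)  # subtract the max for numerical stability
    exps = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical period: the player chose action 0 and earned 1,
# while the foregone action 1 would have earned 3.
start, n0, chosen, payoffs = [0.0, 0.0], 1.0, 0, [1.0, 3.0]

for delta in (0.0, 1.0):
    attractions, _ = ewa_update(start, n0, chosen, payoffs, delta)
    p_repeat = logit_probs(attractions)[chosen]
    print(f"delta={delta}: attractions={attractions}, P(repeat)={p_repeat:.3f}")
```

In this toy case delta = 0 pushes the probability of repeating the last action well above one half, while delta = 1 pushes it well below. That is why a fitted delta near zero can be soaking up persistence in individual behavior rather than telling you anything about "reinforcement" versus "belief" learning.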
 
Nat Wilcox's Econometrica paper identified exactly the feature I just mentioned, but in a nicer and more elegant way: he showed that more uncaptured heterogeneity results in a delta estimate closer to zero. We showed a similar effect in our papers, where we re-examined many results from past work in light of this feature and tried to see to what extent it is responsible for differences of opinion in the literature.