There is always a tradeoff in statistical work between using only the most recent data to capture trends and using a longer time period to get more precise estimates. Now, in principle, since WASP is calibrated to a par score set by the broadcast commentators, any trend in scoring that has occurred within the period of the data used to estimate the model could be adjusted for in the par score. The setting of a par score is both a strength and a weakness of WASP. The strength is that it allows game-specific information to be factored into the projections, such as using local knowledge to assess how the pitch is likely to play. The weakness is that the commentators might suffer from the common human bias of seeing patterns in essentially random data, and I wonder if the view that batting power is increasing is an example of that.
So I was interested to see if John's perception of a recent increase in scoring rates due to teams having more "lower-order hitters", better bats, etc. is borne out in the data. There is no doubt that there has been an increase in scoring over time. For example, all of the 16 ODI matches (all involving top-8 countries) where the team batting second has scored 330 or more have occurred this century. Only 5 of those 16, however, occurred this decade, suggesting that maybe the changes are not so recent.
Extreme scores like these are not necessarily indicative of a general trend, so some regression analysis is called for. John's hypothesis seems to be mainly based on increased rates of scoring by lower-order power hitters near the end of the innings. I don't have the full ball-by-ball database to hand, just a record of scores and results, but if the theory is correct, it should show up in total scores. Now WASP is currently based on ODI data from 2006 involving the top-8 teams, so I had a look at all non-rain-shortened games involving those teams from May 1 2006, using a dummy variable for each year starting May 1.
First, I looked at the evolution of first-innings scores over that time. To control for different abilities across countries, I ran an OLS regression of first-innings score on dummy variables for the team batting first and for the team bowling first, as well as a dummy variable for each of the 8 years in the database. To further control for differences across grounds, I restricted the data set to games played at grounds where at least 10 matches were played in this period, and included a dummy variable for each ground. This left me with 245 games. The results are shown by the blue line in the graph below (left axis), which gives the average first-innings score for the average team against the average team at the average ground. There clearly has been very little change over these 8 years.
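For readers who want to see the mechanics, here is a minimal sketch of a dummy-variable OLS regression of the kind described above. It is not the code or the data behind the graph: the teams, years, and scores are made up, the scores are constructed with no noise so the known effects are recovered exactly, and for brevity it uses only batting-team and year dummies (the bowling-team and ground dummies would be added in the same way). The helper names `dummy_row`, `solve`, and `ols` are my own.

```python
# Illustrative only: toy data, not the ODI dataset used in the post.

def dummy_row(value, levels):
    """One-hot encode value, dropping the first level as the baseline."""
    return [1.0 if value == lvl else 0.0 for lvl in levels[1:]]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

teams = ["A", "B", "C"]                              # hypothetical teams
years = [2006, 2007, 2008]                           # hypothetical seasons
team_effect = {"A": 0.0, "B": 15.0, "C": -10.0}      # made-up batting strength
year_effect = {2006: 0.0, 2007: 5.0, 2008: -3.0}     # made-up year shifts

# One game per (team, year) pair; scores are exact, with no random noise.
games = [(t, yr, 250.0 + team_effect[t] + year_effect[yr])
         for t in teams for yr in years]

X = [[1.0] + dummy_row(t, teams) + dummy_row(yr, years) for t, yr, _ in games]
y = [score for _, _, score in games]

names = (["const"] + [f"bat_{t}" for t in teams[1:]]
         + [f"year_{yr}" for yr in years[1:]])
estimates = dict(zip(names, ols(X, y)))

for name, value in estimates.items():
    print(f"{name:>10}: {value:8.2f}")
```

Each year coefficient is the shift in first-innings score relative to the baseline year, holding the batting team fixed, which is the quantity the blue line tracks.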
John's blog post, however, seemed to refer specifically to the ability of teams to chase down large scores, so I separately looked at whether there has been a change in the probability of the team batting second winning, using a probit regression. Because differences in grounds largely affect ease of scoring in both innings, and because probabilistic models require more data to get precise estimates, I used the full dataset without dummy variables for the ground, but again controlled for team ability and included dummy variables for each year. The results are shown in red on the same graph (right axis). Given how much data these models need, I wouldn't put too much faith in the estimate for any one year. But there doesn't seem to be a clear recent trend towards it being easier to chase down scores than in previous years, although there was a strange dip in the period 2007-2009 that has since been reversed.
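As a rough illustration of the probit step, the sketch below fits a probit model of "team batting second wins" on year dummies only, using Fisher scoring written out by hand with the Python standard library. Again, this is not the code or data behind the red line: the win counts are invented, the team-ability controls are left out for brevity, and `solve` and `fit_probit` are names I have made up. With only year dummies the model is saturated, so the fitted probability for each year should simply equal that year's observed share of successful chases, which makes the result easy to check.

```python
from statistics import NormalDist

# Illustrative only: toy outcomes, not the ODI results analysed in the post.
STD_NORMAL = NormalDist()  # supplies the normal cdf and pdf used by probit

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_probit(X, y, iterations=25):
    """Maximum-likelihood probit coefficients via Fisher scoring."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        info = [[0.0] * k for _ in range(k)]   # expected information matrix
        score = [0.0] * k                      # gradient of the log-likelihood
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            p, d = STD_NORMAL.cdf(eta), STD_NORMAL.pdf(eta)
            var = p * (1.0 - p)
            for i in range(k):
                score[i] += row[i] * d * (yi - p) / var
                for j in range(k):
                    info[i][j] += row[i] * row[j] * d * d / var
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

years = [2006, 2007, 2008]                                # hypothetical seasons
record = {2006: (4, 10), 2007: (6, 10), 2008: (5, 10)}    # made-up (chasing wins, games)

def design_row(year):
    """Intercept plus year dummies, with the first year as the baseline."""
    return [1.0] + [1.0 if year == other else 0.0 for other in years[1:]]

X, y = [], []
for yr in years:
    wins, n_games = record[yr]
    for g in range(n_games):
        X.append(design_row(yr))
        y.append(1.0 if g < wins else 0.0)

beta = fit_probit(X, y)
fitted = {yr: STD_NORMAL.cdf(sum(b * x for b, x in zip(beta, design_row(yr))))
          for yr in years}

for yr in years:
    print(yr, round(fitted[yr], 3))
```

In a real application the team dummies would enter `design_row` alongside the year dummies, and the standard errors (from the inverse of the information matrix) would show how imprecise any single year's estimate is.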
Data (even historical data that may become out of date) is a good antidote to these perception biases.