Note: there are a couple of earlier articles that I didn't tag as "haskell" so they didn't appear in Planet Haskell. They don't contain any Haskell code, but they cover some background material that's useful to know (#3 talks about reanalysis data and what it is, and #4 displays some of the characteristics of the data we're going to be using). If you find terms here that are unfamiliar, they might be explained in one of these earlier articles.
The code for this post is available in a Gist.
Update: I missed a bit out of the pre-processing calculation here first time round. I've updated this post to reflect this now. Specifically, I forgot to do the running mean smoothing of the mean annual cycle in the anomaly calculation -- doesn't make much difference to the final results, but it's worth doing just for the data manipulation practice...
Before we can get into the "main analysis", we need to do some pre-processing of the data. In particular, we are interested in large-scale spatial structures, so we want to subsample the data spatially. We are also going to look only at the Northern Hemisphere winter, so we need to extract temporal subsets for each winter season. (The reason for this is that winter is the season where we see the most interesting changes between persistent flow regimes. And we look at the Northern Hemisphere because it's where more people live, so it's more familiar to more people.) Finally, we want to look at variability about the seasonal cycle, so we are going to calculate "anomalies" around the seasonal cycle.
We'll do the spatial and temporal subsetting as one pre-processing step and then do the anomaly calculation separately, just for simplicity.
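To make the idea of "anomalies around the seasonal cycle" concrete, here is a minimal sketch of the calculation for a single grid point. It is not the code from the Gist: it assumes the data for one grid point is held as a plain list of years, each a list of daily values of the same length, and the names `meanCycle`, `runningMean` and `anomalies`, the half-width parameter `h`, and the choice to truncate the smoothing window at the ends of the season are all assumptions made for illustration.

```haskell
import Data.List (transpose)

-- | Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- | Mean annual cycle: average each day of the season across all years.
-- Input is one list of daily values per year, all of the same length.
meanCycle :: [[Double]] -> [Double]
meanCycle = map mean . transpose

-- | Centred running mean with half-width h (a window of up to 2h+1 points).
-- The window is truncated at the ends of the series; this edge handling
-- is an assumption of the sketch, not something taken from the post.
runningMean :: Int -> [Double] -> [Double]
runningMean h xs = [ mean (window i) | i <- [0 .. n - 1] ]
  where
    n = length xs
    window i = take (hi - lo + 1) (drop lo xs)
      where
        lo = max 0 (i - h)
        hi = min (n - 1) (i + h)

-- | Anomalies: subtract the smoothed mean cycle from every year's values.
anomalies :: Int -> [[Double]] -> [[Double]]
anomalies h years = map (zipWith subtract cyc) years
  where
    cyc = runningMean h (meanCycle years)
```

For two toy "years" `[[1,2,3],[3,4,5]]`, the mean cycle is `[2,3,4]`, smoothing with `h = 1` gives `[2.5,3.0,3.5]`, and `anomalies 1` returns `[[-1.5,-1.0,-0.5],[0.5,1.0,1.5]]`. Lists keep the sketch short; real gridded data would more likely live in arrays or vectors.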