A problem found throughout the world of data is how to distinguish signal from noise. When dealing with data that comes in a sequence, such as time-series data (the most familiar example but by no means the only example), a frequent method of dealing with the problem is to apply a smoother. Smoothers can range from the very simple to very sophisticated, but we will concentrate on the simpler end of the spectrum.
[Related Article: How to Get Started with SQL]
All these smoothers, no matter how simple or how complicated have in common that they rely on the data appearing in a particular order. To calculate the smoother, you need to be able to refer to multiple rows around the row in focus, something SQL was not always well set up to do.
In more recent years, major implementations of SQL including MySQL, SQLite, PostgreSQL and SQL Server have added Lead() and Lag() functions allowing calculations to be performed on rows appearing before and after a row of interest. We can make use of these functions to implement simple smoothers in SQL, such as a weighted moving average.
Like all the windowing functions, SQL gives you flexibility on how to specify the data that the function applies to using the ‘PARTITION’ and ‘OVER’ key words. In semantic terms
SELECT Index, Classifier, Value, LAG(Value) OVER (Partition By Classifier ORDER BY Index) FROM YourTable
Note that the default offset is one place but you can specify the offset within the Lag function e.g. Lag(Value,2) is two rows behind the original row.
However, we can’t return multiple window function variables in the same column, so we can’t do the arithmetic we need to do to calculate the weighted moving average itself. Therefore we need to create a Common Table Expression with to provide the lagged columns, and run a subsequent query to calculate the weighted moving average itself. Again, in purely semantic terms:
WITH Base_Table AS (SELECT Index, Classifier, Value ,LAG(Value) OVER (Partition By Classifier ORDER BY Index) as LagVal1 ,LAG(Value,2) OVER (Partition By Classifier ORDER BY Index) as LagVal2FROM YourTable)SELECT Index, Classifier , (Value + LagVal1 + LagVal2)/3 as MovingAverage From Base_Table
However, to illustrate properly we need a good example. For better or worse, I’m an Australian from Victoria, so the football code I follow is Australian Rules (less likely to be true if I was an Australian from NSW or Queensland). My team is the Essendon Bombers who recently recorded one of the biggest losses of their history against the Western Bulldogs, only managing to score 33 points to the Western Bulldogs’ 137 points.
By way of self-flagellation, I thought it would instructive to calculate the moving average of the scores of both teams in recent matches in order to better see how each of them are trending.
Firstly, we need to create a table containing the scores from the last several matches.
CREATE TABLE AFL_SCORES( Team VarChar(30) , RoundNum Int , Score Int ) ;INSERT INTO AFL_SCORESVALUES ('Essendon', 14, 71) , ('Essendon', 15, 77) , ('Essendon', 16, 76) , ('Essendon', 17, 86) , ('Essendon', 18, 96) , ('Essendon', 19, 106) , ('Essendon', 20, 67) , ('Essendon', 21, 33) , ('Western Bulldogs', 14, 73) , ('Western Bulldogs', 15, 66) , ('Western Bulldogs', 16, 71) , ('Western Bulldogs', 17, 74) , ('Western Bulldogs', 18, 89) , ('Western Bulldogs', 19, 113) , ('Western Bulldogs', 20, 80) , ('Western Bulldogs', 21, 137);
Then we have the query itself, following the pattern introduced above of a CTE to create the lagged columns and a subsequently query to calculate the moving average itself. Note that this is the simple moving average, but in many cases a weighted moving average is used, with a variety of different weighting schemes to suit differing tastes.
WITH LagsTable AS (SELECT Team , RoundNum , Score , Lag(Score) OVER (Partition By Team Order By RoundNum) AS ScoreLag1 , Lag(Score,2) OVER (Partition By Team Order By RoundNum) AS ScoreLag2FROM AFL_SCORES)SELECT Team , RoundNum , (Score+ScoreLag1+ScoreLag2)/3 As MovingAverageFROM LagsTable
Giving the result:
Possibly a little bit of comfort for Essendon fans there in the sense that it appears the Western Bulldogs have been improving for several rounds in a row, a stronger trend than Essendon’s more recent collapse in scoring.
[Related Article: All The Cool Things You Can Do With PostgreSQL for Data Analysis]
In any case, a useful, if personally painful way to illustrate how SQL can be used to calculate a moving average with window functions.
Originally Posted Here