Sun Spot Cycle 24
Preamble
On the NASA website, which was updated on 1 March 2013, it is reported that the current sun spot cycle seems to have already peaked at with the sun spot count of 67 in February 2012 (last year):-NASA sun spot prediction
The sun spot cycle 24 is not behaving as predicted a couple of years ago. For the past few years the size (or sun spot count) of the sun spot cycle 24 had been constantly downgraded: -
Older NASA sun spot predictions
From the earlier predictions made by the consensus group of scientists (working under the auspices of NASA?) we can conclude that there were two issues that did not work out as forecasted:
- The size (or the peak) of the sun spot count.
- The date on which sun spot cycle peaked.
Now whereas I most definitely do not feel qualified or even energetic enough to comment on the first issue, I want to focus on the second issue, as I believe that the wrong statistics were used to calculate the previous sun spot cycle peaks with a consequence that the forecast for this cycle was also wrong, because it started on the wrong footing.
For a good primer about why the sun spot cycle is not as high as predicted I can suggest the interest reader to study the book:
'Grand Phases on the Sun' written by Steven Haywood Yaskell ISBN 978-1-4669-6300-9
Moving average and the sun spot count
The current and past sun spot cycle counts are readily available on the web and hereto is a link to an official website in Belgium from where the interested user can download his own set of data to analyse: -
If you are not interested in data manipulation exercises, you can download graphs of the various sun spot cycles, ready prepared from the sites that host the raw sun spot data.
On the official graphs prepared by the various governmental agencies they state that the graphs have been prepared with a moving average. A standard of one year and one month seems to be the norm.
How do you calculate a moving average?
This seems like a silly question as just about everybody knows what a moving average is and how to calculate it; the method is used extensively in everyday business by accountants, people responsible for budgets, and guys in charge of production processes. The fact that it is used in the field of astrophysics should therefore be comforting, or should it?
To illustrate the workings of a moving average calculation I prepared the following table. Feel free to skip the following discussion-
![]() |
| Table 1 - Moving Average calculation illustration |
Using a very simple data series, named Series as a the first column on table 1, the working of a moving average calculation is shown. For this example I used a span of 3, 5 and 10 cells. The cells represent days, weeks, months, years, or whatever series of data you work with.
In the second column coloured blue, the results of a 3 period or span moving average is shown. The arrows shows the values that were used to calculate the value in the given cell. In the example I show that cells 5, 6 and 7 were added together and divided by 3, giving the answer 6. Once the formula is entered in one cell it is very easy to copy the same formula to all the cell in the series under investigation.
The third column coloured in bright green shows the results of the moving average for 5 periods, and the last column coloured in orange shows the results of the 10 period moving average calculation.
Strictly speaking, if you are calculating the moving average with a 3 period span you cannot calculate the answer for the first 2 cells in the series, when calculating a 5 period moving average the first 4 cells contains no answer, and for 10 period moving average, the first 9 cells returns a blank answer.
I used the information shown on Table 1 to prepare graph 1.
Notice the different starting positions for the moving average lines generated by the varying moving average periods.
![]() |
| Graph 1 - Moving Average |
Simple local regression, using adjacent neighbours to calculate the estimated value of the cell
![]() |
| Table 2 - Simple local regression |
Notice how the answer of the simple equal weighted local regression can keep pace with the ever increasing value of the series. This is quite in contrast with the answer obtained with a moving average calculation.
With equal weighting local regression calculation we find that the result lags the series at the start and end of the results column. The lag is dependent on the span selected to do the calculation.
Graph 2 is a representation of the information on Table 2. The lag at the start and end of the calculated series is the reason the straight line appears to have magic coloured handles. It is not a fairy or wizards wand, just the result of the lag that you get when using the local regression calculation.
![]() |
| Graph 2 - Local Regression |
(I do know that purists do not consider the Excel Random function as random enough and wouldn't dream of using it in a serious paper. This, however, is just a blog and for illustrative purposes the Excel Random function suffice.)
![]() |
| Graph 3 Moving Average with wavy data series |
On Graph 3 the series to be modelled is shown with red diamonds. The moving average calculations with a span of 3, 5 and 10 periods are plotted in solid lines. On inspecting this graph we can draw two conclusions. We can confirm that the moving average calculation cannot reach to the same peak as the series being modelled, and the greater the span, the greater the shortfall in reaching the peak. And secondly, there is an offset between the peak modelled by the moving average calculation and the actual peak of the series being modelled. The greater the span of the moving average calculation, the greater the lag before the moving average shows either the peak or valley.
The moving average calculation is backward looking and only past value are used to calculate the present cell values and as a consequence there is a lag in showing when the peak or valley of a series is actually reached. It is a major short coming of the moving average calculation when trying to analyse any wavy data, and is one of there reasons why I went to such great lengths to show this problem as clearly as possible.
There must be a better way to analyse a series that contains information that are wavy, undulating and repeating. A basic local regression fits the bill. See Graph 4. However, just like the moving average calculation the local regression cannot reach the same peaks and valley values, though the difference is far less striking than that of the moving average calculation.
![]() |
| Graph 4 - Local regression and wavy data series |
![]() |
| Table 3 - Local regression with weighing factors for distance |
I assume that the nearest neighbours have a greater weighting than the values further away from the estimation.
Table 4 is a close-up of the smaller weighing factor listing shown of Table 3.
Now lets look at the detail on Table 4. The weighing factors add up to the span considered. For the purposes of constructing Table 4 I used arbitrary numbers. The following rules applied: the weighing factors must add up to the span, and the values must decrease from the estimation towards the flanks. Because I used arbitrary numbers I show the same centre value of 2 for the 5 and 7 span local regression. A graphical representation of the weighing factors used in Table 4 is given in Graph 5. Just having the discipline of creating a tapering sequence of numbers creates a bell-shaped curve. Bell-shaped curve, that rings a bell! Lets investigate further as this exercise seems to be leading somewhere!
![]() |
| Table 4 - Weighing Factors before normalizing |
![]() |
| Graph 5 - Weighing factors before normalizing |
By amusing a weighting factor of 50%, implying that values double the distance from the estimation point have half the influence on the estimation point than the immediate neighbours, Table 5 can be prepared. Table 5 is showing normalised values with the totals of the factors adding to 1. Using the Solver function in Excel, it is very easy to find the normalised factors once a table with weighing factors is constructed. In the case of Table 5 the values on the side of the central column is calculated by using a 50% value to its immediate neighbour.
![]() |
| Table 5 - Weighted mean - half distance - normalized |
![]() |
| Graph 6 - Weighing factors after normalizing |
The weighing factors calculated according to strict recipe: all factors added together equals 1; the influence of the values decreases by 50% for each step away from the estimation point.
Graph 6 shows the weighing factors for spans of 3 5 and 7 cells. Notice how the normalising changed the value of the central point. And how the shapes of the weighing factors curves are resembling a bell-curve even better than previously.
In Table 6 the results of the local regression using a normalised weighing factor table is shown. As expected, the peaks and valleys are neatly followed, yet the highs and lows of the undulating series are not match.
A cynic would conclude that effects of local regression is the same as that as Socialism, it steals from the rich (highs) and gives it to the poor (lows).
![]() |
| Table 6 - Local regression with normalized factors |
To here I used very basic and straight forward arithmetic to demonstrate how a moving average is calculated, the short comings of using that technique, the advantages of using local regression with a very simple illustration of why it is far superior to the classical method. I also reached the end of what can be achieved by simple and logical arithmetic.
Enter the mathematicians
Whilst still working for Bell Laboratories William Cleveland, who has been working on the presentation of data to get information wrote a ground-breaking paper "Robust Locally Weighted Regression and Smoothing Scatterplots", that was published in 1979 in the Journal of American Statistical Association, Vol. 74, pp. 829-836. (This was just a few years after Cleveland published papers on Trellis Plots, a way to represent graphical information that is now in common use.) The LOESS paper by Cleveland became one of the most quoted statistical papers of the late 20th century.
The work of Professor Cleveland is also mentioned on Wikipedia, find it yourself.
Enter Jon Peltier, a software engineer that runs his own consulting business, who perused the paper of Cleveland and in 2009 wrote a Visual Basic application that can be used as an Add-in in Excel. Jon published full details on his blog and invited those interested to download the VB application and try it out. This I did.
Interlude
Allow me to indulge you in story that I think has very striking parallels.
In 1974 Lerch and Grossman, two mathematicians working for IBM wrote a paper explaining how graph theory can be used to solve an age old mining problem: pit optimisation. At the time the paper was considered a breakthrough, but given the amount of time required to do an LG pit optimization run on the computers of the day, the mining industry abandoned the method and adopted a far more cruder method, called Floating Cone. In the mid 1980s, whilst working for BHP in Australia, Dr Jeff Whittle, a British born mathematician (from Yorkshire) read the LG paper and realised that he could write a software programme that can run on the new workstations, which were the state-of-the-art small computers of the time. He left BHP and started his own consulting company that marketed the Whittle software products. The rest they say is history. In the mining industry today very few people refer to the Pit Optimization process as the LG logarithm; the universal term is Whittling!
Sunspot Cycle 19
I tested the Loess application prepared by Jon Peltier on the raw sun spot data I down loaded from the Royal Observatory of Belgium - the SIDC site: -
In Graph 7 I show the sun spot number count for cycle 19. Using the basic Excel built in software to calculate the moving average for 390 days gives a very similar answer to the graph published on the SIDC website with the credit: -
The graphs have been prepared by Jan Alvestad based on data from SIDC, Brussels.
![]() |
| Graph 7 - Local regression v moving average |
Now compare the LOESS 390 day plot with one year and one month moving average. A similar but displaced outline.
I went to great lengths to show and explain the short comings of the moving average calculation and here we can see the difference.
It is my intention to post more revised graphs and a discussions on the sun spot count in future.
FIN














No comments:
Post a Comment