Step 3 of 9
Collecting Your Data
The specific datasets, where to download them, and what format to expect
Start with sunspot data
This is the step where you actually sit down and build the datasets. I'll give you the exact file names and what to expect from each one. No point being vague about it.
Go to https://www.sidc.be/SILSO/datafiles and download SN_d_tot_V2.0. It's listed under "Sunspot Number" as the daily total international sunspot number.
The file is space-separated plain text. Eight columns:
- Year (integer)
- Month (integer)
- Day (integer)
- Fractional year (decimal)
- Daily total sunspot number (integer, this is the one you want)
- Daily standard deviation (decimal)
- Number of observatories contributing (integer)
- Definitive/provisional indicator (1 = definitive, 0 = provisional)
Column 5 is your data. A value of -1 means no observation that day. Don't interpolate those values, just leave them blank in your spreadsheet and note them. They tend to cluster in the early 19th century, less so after about 1870.
Download the full file. It starts in 1818 and runs to the present. That's 207 years. You'll actively use maybe 60 to 100 years for pattern matching, but having the full series lets you verify the cycle positions we documented in Step 1. Load it into a spreadsheet, then add a column for a 12-month centred moving average (on daily rows, that is a window of about 365 days). That removes the short-term noise and shows the actual cycle shape clearly enough to count peaks and troughs.
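If you'd rather check the spreadsheet column against a script, here is a minimal sketch in Python. It assumes the whitespace-separated column layout listed above and a 365-day window standing in for "12 months". The function names are mine, not part of any SILSO tooling.

```python
from typing import List, Optional


def parse_silso_line(line: str) -> Optional[dict]:
    """Parse one whitespace-separated row of the daily sunspot file."""
    parts = line.split()
    if len(parts) < 5:
        return None  # blank or malformed line
    value = int(parts[4])
    return {
        "year": int(parts[0]),
        "month": int(parts[1]),
        "day": int(parts[2]),
        # -1 marks a missing observation, so store None rather than a number
        "sunspots": None if value == -1 else value,
    }


def centred_mean(values: List[Optional[float]], window: int = 365) -> List[Optional[float]]:
    """Centred moving average over an odd window, skipping gaps.

    Returns None where the window would run off either end of the series,
    or where every value inside the window is missing.
    """
    half = window // 2
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)  # not enough rows on one side
            continue
        present = [v for v in values[i - half : i + half + 1] if v is not None]
        out.append(sum(present) / len(present) if present else None)
    return out
```

Gaps are skipped, not filled, so a window containing missing days averages only the days that were observed. That matches the "don't interpolate" rule above.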
Planetary positions
Go to NASA JPL Horizons at https://ssd.jpl.nasa.gov/horizons/. You need the ecliptic longitude (in degrees) for Jupiter, Saturn, Uranus, Neptune, and the Sun, at monthly intervals.
Settings to use
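- Ephemeris type: Observer Table (the table type that lists observer ecliptic longitude and latitude among its selectable quantities; the Vector Table type returns Cartesian X, Y, Z components instead)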
- Target body: select each planet individually. Search by name.
- Observer location: set to your location or leave as geocentric (the difference is negligible for outer planets)
- Time specification: start date, stop date, step size "1 MO" (one month)
- Table settings: request ecliptic longitude (lambda) in degrees
Run a separate query for each planet. Download the output as plain text. The longitude values are what you paste into your planetary position tab.
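If the output you end up with holds Cartesian X and Y components in the ecliptic plane and no ready-made longitude column, the longitude is just the angle of that vector. A minimal sketch, assuming X and Y are already in an ecliptic reference frame centred on the observer (the function name is hypothetical):

```python
import math


def ecliptic_longitude_deg(x: float, y: float) -> float:
    """Angle of the (x, y) vector in degrees, wrapped into the 0 to 360 range."""
    # atan2 returns -180..180 degrees; the modulo shifts negatives up by 360
    return math.degrees(math.atan2(y, x)) % 360.0
```

The units of X and Y don't matter here (km or au), since only their ratio sets the angle.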
What date range to cover
For the current forecast and the next 12 months: get positions from Jan 2025 to Dec 2027. That gives you context and room ahead.
For pattern matching: you want positions going back at least 60 years (to match the 60-year combined cycle). Ideally 120 years. Horizons will give you dates as far back as you need. A 120-year monthly series is 120 x 12 = 1,440 rows per planet. Not huge.
The Sun's position
Also get the Sun's ecliptic longitude at monthly intervals. The cardinal points matter. The Sun at 0 degrees marks the March equinox (around 20 March, the autumnal equinox in Australia), at 90 degrees the June solstice (around 21 June, Australia's winter solstice), at 180 degrees the September equinox (around 23 September, the start of Australia's astronomical spring), and at 270 degrees the December solstice (around 21 December, Australia's summer solstice). These define the quarter boundaries in the forecast grid.
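Assigning each row to a quarter is a one-line calculation once you have the Sun's longitude. A small sketch, assuming quarters are numbered 0 to 3 starting from 0 degrees (the numbering is my choice for illustration, not something the forecast grid prescribes):

```python
def solar_quarter(sun_longitude_deg: float) -> int:
    """Return 0, 1, 2 or 3 for the 90-degree sector containing the longitude.

    0 covers 0 up to 90 degrees, 1 covers 90 up to 180, and so on.
    """
    return int((sun_longitude_deg % 360.0) // 90.0)
```

A row changes quarter at the cardinal points, so a longitude of 89.9 is still quarter 0 and 90.0 is quarter 1.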
Mercury
Get Mercury's ecliptic longitude at weekly intervals for the current year and the year you're forecasting. Mercury moves fast, about 1 degree per day, so monthly resolution loses too much information. Its zodiacal sign at any given week defines the air movement period in the Jones framework. At weekly resolution you'll have about 52 rows per year, which is manageable.
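Turning a longitude into a sign can be done in the spreadsheet or with a few lines of code. This sketch assumes the usual tropical convention of twelve 30-degree sectors starting with Aries at 0 degrees. If your reading of the framework uses a different zero point, adjust accordingly.

```python
SIGNS = [
    "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
    "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces",
]


def zodiac_sign(longitude_deg: float) -> str:
    """Name of the 30-degree sector that contains the given ecliptic longitude."""
    return SIGNS[int((longitude_deg % 360.0) // 30.0)]
```

Apply it to each weekly row. A change of sign between two consecutive rows tells you the boundary fell somewhere in that week.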
Moon phases
Go to the USNO moon phase tables at https://aa.usno.navy.mil/data/MoonPhases. Select your year and set the timezone to UTC+10 (Australia/Brisbane, AEST).
You need four events per month: New Moon, First Quarter, Full Moon, Last Quarter. Download the dates and times for the year you're forecasting, plus the preceding year (for the January of your target year, the previous December's events affect the early-month moisture pattern).
Record the times in AEST, not UTC. A Full Moon at 22:30 UTC is 08:30 the following morning in AEST. That's a different calendar date. Getting this wrong shifts your moisture period boundaries by a full day, which matters when you're working with 7-day to 14-day moisture windows. I've made this error. It eventually becomes obvious when the pattern doesn't line up with the observed rainfall, but fixing it mid-cycle is annoying.
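If you ever end up with a table in UTC, the conversion is worth scripting so you don't do it by hand. A minimal sketch, assuming a fixed UTC+10 offset, which holds for Queensland because it does not observe daylight saving. The timestamp in the usage note is made up to mirror the 22:30 example, not a real lunar event.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset: valid for Queensland, which has no daylight saving
AEST = timezone(timedelta(hours=10), "AEST")


def utc_to_aest(utc_dt: datetime) -> datetime:
    """Convert a UTC datetime to AEST; naive inputs are assumed to be UTC."""
    if utc_dt.tzinfo is None:
        utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(AEST)
```

For example, `utc_to_aest(datetime(2025, 3, 14, 22, 30))` lands on 15 March at 08:30, one calendar day later than the UTC date.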
Historical weather for your location
Go to the BoM climate data portal at http://www.bom.gov.au/climate/data/. Select Daily Data, then your nearest long run station.
For Dayboro
Start with station 040056 (Dayboro). Some elements have records going back to the 1890s. Daily maximum temperature starts in the 1890s. Daily rainfall coverage is good from about 1910 onwards.
Download: daily minimum temperature, daily maximum temperature, daily rainfall. BoM provides these as separate CSV files. Save all of them. You'll join them by date in your spreadsheet.
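The join by date can also be scripted. The sketch below assumes each CSV has `Year`, `Month` and `Day` columns plus one value column. Check the header row of your own downloads and pass the exact value column name, since I'm not asserting what BoM calls it. The function names are hypothetical.

```python
import csv
from datetime import date
from typing import Dict, Iterable


def load_daily(lines: Iterable[str], value_column: str) -> Dict[date, str]:
    """Map each date to the raw text in value_column ("" if the cell is blank)."""
    out: Dict[date, str] = {}
    for row in csv.DictReader(lines):
        day = date(int(row["Year"]), int(row["Month"]), int(row["Day"]))
        out[day] = (row.get(value_column) or "").strip()
    return out


def join_by_date(**series: Dict[date, str]) -> Dict[date, Dict[str, str]]:
    """Outer join: every date found in any series, "" where a series has no row."""
    all_days = sorted(set().union(*series.values()))
    return {d: {name: s.get(d, "") for name, s in series.items()} for d in all_days}
```

With real files you would pass an open file object, for example `load_daily(open(path, newline=""), "your column name")`. Values are kept as text on purpose, so blanks and flag values stay visible until you've run the quality checks.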
For other locations
Find your nearest long run station using the station selector on the BoM data portal. Prefer stations with records going back at least to 1950, and ideally 1900 or earlier. Stations that moved significantly or changed their instrument setup will have a step change in the record, which the ACORN-SAT homogenisation process tries to correct. For temperature, use ACORN-SAT data where you can. For rainfall, use the raw station data.
Severe weather events to flag
In your spreadsheet, add a column for "notable event" and flag these years at minimum:
- 1893: major southeast Queensland floods
- 1902: Federation Drought peak
- 1950: widespread Queensland flooding
- 1974: Brisbane floods (Dayboro very wet that wet season)
- 1982: severe drought across Queensland
- 2011: Lockyer Valley and Brisbane Valley floods, 47mm in one day at Dayboro on 10 January
- 2019: severe drought conditions in southeast Queensland
- 2022: February and March flooding, widespread across southeast Queensland
These are the years that do the most work when you're selecting and weighting analogues. If your current cycle configuration matches a year that produced one of these events, that matters.
Checking your data quality
After downloading everything, before you do any analysis, run these checks:
- Missing values: count them. A gap of more than a week in temperature or rainfall during a significant event is a problem. BoM flags some missing values as 999.9 or similar, depending on the download format. Find and label them before the spreadsheet processes them as real readings.
- Outlier values: a maximum temperature of 61°C or a daily rainfall of 2,000mm is an error, not a record. These get through occasionally. Scan the extremes.
- Station changes: if the station moved, a step change usually shows up in the long term record. Temperature goes up or down by a consistent amount after a certain year. BoM's station history files document moves. Check them if you see something odd.
- Sunspot value of -1: as noted above, these are missing observations, not zero counts. Don't average them in. Treat them as gaps.
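The first two checks are easy to automate. This is a rough sketch, not a full quality-control routine. The flag value 999.9 comes from the note above, but the plausibility bounds and the 7-day threshold are parameters you should set for your own station and element. All names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Sequence, Tuple


def clean(value: float, low: float, high: float,
          sentinels: Sequence[float] = (999.9,)) -> Optional[float]:
    """Return None for flag values and for readings outside [low, high]."""
    if value in sentinels or not (low <= value <= high):
        return None
    return value


def long_gaps(series: Dict[date, Optional[float]],
              max_days: int = 7) -> List[Tuple[date, date]]:
    """List (first_missing, last_missing) for runs longer than max_days.

    A day counts as missing if it has no row at all or its value is None.
    """
    gaps: List[Tuple[date, date]] = []
    if not series:
        return gaps
    day, last = min(series), max(series)
    run_start: Optional[date] = None
    while day <= last:
        missing = series.get(day) is None
        if missing and run_start is None:
            run_start = day  # a new run of missing days begins
        elif not missing and run_start is not None:
            if (day - run_start).days > max_days:
                gaps.append((run_start, day - timedelta(days=1)))
            run_start = None
        day += timedelta(days=1)
    # A run that is still open at the end of the record
    if run_start is not None and (last - run_start).days + 1 > max_days:
        gaps.append((run_start, last))
    return gaps
```

Run `clean` first so flag values become gaps, then pass the cleaned series to `long_gaps` and compare the results against your notable event years.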
Solar cycle prediction data
NOAA's Space Weather Prediction Center publishes an updated solar cycle prediction monthly at https://www.swpc.noaa.gov/products/solar-cycle-progression.
The current cycle, Solar Cycle 25, has tracked significantly above the consensus prediction. The original forecast from 2019 called for a relatively weak cycle. It has instead been one of the stronger cycles in recent decades. Revised peak estimates have been issued several times. The current (2026) picture is that the cycle peaked around late 2024 to early 2025 and is now in the declining phase.
For the forecast model, the phase matters more than the absolute number. Ascending phase, near peak, descending phase, near minimum. Each has a different set of historical analogues in the Queensland record. We're in the descending phase now, and that plus the other cycle positions all feed into the Step 6 analogue selection.
Download the monthly values from the SWPC dataset. They provide a downloadable CSV of the observed smoothed sunspot numbers alongside the official prediction. Add this to your sunspot tab alongside the SILSO daily data. They track closely; any difference is a useful cross check.
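One way to run that cross check is to collapse the daily series to monthly means and subtract the other source month by month. A sketch under my own assumptions about data shape: daily rows as `(year, month, value)` tuples with `None` for gaps, and the second series as a dict keyed by `(year, month)`. A smoothed series will sit closer to your moving-average column than to raw monthly means, so compare like with like.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Month = Tuple[int, int]  # (year, month)


def monthly_means(daily: Iterable[Tuple[int, int, Optional[float]]]) -> Dict[Month, float]:
    """Average daily values within each month, ignoring None gaps."""
    buckets = defaultdict(list)
    for year, month, value in daily:
        if value is not None:
            buckets[(year, month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def differences(a: Dict[Month, float], b: Dict[Month, float]) -> Dict[Month, float]:
    """a minus b for every month present in both series, in date order."""
    return {key: a[key] - b[key] for key in sorted(a.keys() & b.keys())}
```

Months where the difference jumps well outside its usual range are the ones to inspect by hand.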