Modelling the behaviour and evolution of the physical phenomena that surround us remains a major challenge for the data science community. Modern advances in data acquisition, storage, processing and transmission highlight the need for more accurate and reliable tools, techniques and skills for extracting knowledge from large, highly dynamic volumes of data.
Typically, the modelling of natural phenomena relies on mathematical models that are often built on stringent assumptions. In many applications some of these assumptions are violated and the models fail to yield closed-form or unique solutions. We propose a generic approach to modelling sunspot numbers using integrated adaptive unsupervised and supervised models.
We adopt the data’s natural Gaussian distributional properties and use the early patterns of each cycle as the basis for unsupervised and supervised modelling. Comparing multiple early patterns, extracted at different time periods from each recorded cycle, with the corresponding full cycles reveals that the first 3 years provide a sufficient basis for predicting the cycle’s peak.
Based on multiple simulations, we develop a binary cut-off point separating low and high solar activity, which we use to label the data, and apply Support Vector Machines (SVM) to predict new cycles. Repeated SVM runs with iteratively improved data parameters show that the approach yields greater accuracy and reliability than conventional approaches, mainly because it simultaneously traces anomalies and provides a robust basis for model selection.
Finally, we describe how the approach can be adapted to other unsupervised and supervised methods and to other applications.
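The labelling and classification steps summarised above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the cut-off value `CUTOFF = 120.0`, the Gaussian-shaped toy cycles produced by `synthetic_cycle` (no real sunspot data is used), the choice of the mean of the first 36 months as the only early-pattern feature, and a small hand-written linear SVM trained by subgradient descent, standing in for a library SVM.

```python
import math

EARLY_MONTHS = 36   # first 3 years of a cycle, as in the text
CUTOFF = 120.0      # hypothetical low/high cut-off on the cycle peak


def synthetic_cycle(peak, months=132, centre=48.0, width=18.0):
    """Toy Gaussian-shaped cycle of monthly values (not real sunspot data)."""
    return [peak * math.exp(-((t - centre) ** 2) / (2 * width ** 2))
            for t in range(months)]


def early_feature(cycle):
    """Summarise the early pattern by its mean over the first 36 months."""
    early = cycle[:EARLY_MONTHS]
    return sum(early) / len(early)


def label(cycle):
    """+1 for high activity (peak at or above the cut-off), -1 for low."""
    return 1 if max(cycle) >= CUTOFF else -1


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=500):
    """Fit w, b by full-batch subgradient descent on the regularised hinge loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Only points with margin below 1 contribute to the hinge subgradient.
        active = [(x, y) for x, y in zip(xs, ys) if y * (w * x + b) < 1.0]
        grad_w = lam * w - sum(y * x for x, y in active) / n
        grad_b = -sum(y for _, y in active) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Label complete toy cycles, but train only on their early-window feature.
train_peaks = [60.0, 80.0, 100.0, 140.0, 160.0, 180.0]
cycles = [synthetic_cycle(p) for p in train_peaks]
raw = [early_feature(c) for c in cycles]
labels = [label(c) for c in cycles]

# Standardise the feature so the learning rate behaves sensibly.
mean = sum(raw) / len(raw)
std = math.sqrt(sum((r - mean) ** 2 for r in raw) / len(raw))
xs = [(r - mean) / std for r in raw]

w, b = train_linear_svm(xs, labels)


def predict(cycle_so_far):
    """Classify a new cycle from its first 36 months only."""
    z = (early_feature(cycle_so_far) - mean) / std
    return "high" if w * z + b >= 0 else "low"


# A new cycle is classified before its peak has been observed.
new_cycle_start = synthetic_cycle(170.0)[:EARLY_MONTHS]
print(predict(new_cycle_start))
```

In this toy setting the early-window mean is proportional to the peak, so the two classes are trivially separable; with real cycles the early pattern would need richer features and the cut-off would come from the simulations described above.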