The impact of ensemble diversity on learning big data in dynamic environments

14 pages • Published: October 25, 2019

Abstract

For many classification tasks, data is collected over an extended period of time and the predictive model learns continuously, adapting to changes in the underlying distribution of the data when necessary. The margin distribution is considered an important factor in optimizing generalization performance. Major concerns posed by nonstationary learning for any algorithm are the rate of adaptation to new concepts and the volume of the data. When ensembles of classifiers are used to tackle learning in nonstationary environments with drifting concepts, diversity becomes of paramount significance in optimizing the rate of adaptation to new concepts. In this paper, we investigate the impact of ensemble diversity on the rate of adaptation to new concepts in nonstationary learning. The rate of adaptation is analyzed by exploiting the correspondence between voting margins and the double-fault measure, a popular diversity measure strongly linked to the margin. We use the Adaptive Classifier Ensemble Boost (AceBoost) algorithm to generate diverse base classifiers and optimize the margin distribution, exploiting different amounts of diversity to build an ensemble capable of handling different kinds of drift. The experimental results confirm that AceBoost outperforms other state-of-the-art algorithms that exploit ensemble diversity to handle concept drift.

Keyphrases: concept drift, diversity, ensemble, margin, support vector machine

In: Kennedy Njenga (editor). Proceedings of the 4th International Conference on the Internet, Cyber Security and Information Systems 2019, vol. 12, pages 227-240.
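The abstract's link between voting margins and the double-fault diversity measure can be made concrete with the standard definitions of the two quantities. The sketch below is illustrative only and is not taken from the paper: it assumes integer class labels and an ensemble represented as a (T, n_samples) array of predictions, and it computes the pairwise double-fault measure (the fraction of samples that two classifiers both misclassify) alongside the per-sample voting margin (votes for the true class minus the most votes for any other class, normalized by ensemble size).

import numpy as np

def double_fault(pred_a, pred_b, y):
    """Pairwise double-fault measure: fraction of samples that BOTH
    classifiers misclassify. Lower values indicate greater diversity."""
    return float(np.mean((pred_a != y) & (pred_b != y)))

def voting_margins(preds, y):
    """Per-sample voting margin of an ensemble of T classifiers:
    (votes for the true class - most votes for any other class) / T."""
    preds = np.asarray(preds)              # shape (T, n_samples), integer labels
    T, n = preds.shape
    margins = np.empty(n)
    for i in range(n):
        votes = np.bincount(preds[:, i], minlength=int(y[i]) + 1)
        true_votes = votes[y[i]]
        others = np.delete(votes, y[i])
        best_other = others.max() if others.size else 0
        margins[i] = (true_votes - best_other) / T
    return margins

# Tiny usage example: three classifiers, four samples, binary labels.
y = np.array([0, 1, 1, 0])
preds = np.array([[0, 1, 0, 0],
                  [0, 1, 1, 1],
                  [1, 1, 1, 0]])
print(double_fault(preds[0], preds[1], y))   # 0.0: these two never err on the same sample
print(voting_margins(preds, y))              # approx [0.33, 1.0, 0.33, 0.33]

In this toy example the two classifiers compared never fail on the same sample (double fault of 0, i.e. high diversity), and the voting margins stay positive, which is the kind of margin/diversity interplay the paper analyzes.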