Optimization of Ensemble ML Models for Cybersecurity Analytics Using CICIDS 2017
DOI: https://doi.org/10.5281/zenodo.17863306
Keywords: Intrusion Detection System (IDS), CICIDS2017 dataset, Machine Learning, Ensemble Learning, SMOTETomek, Cybersecurity Analytics
Abstract
Intrusion detection is one of the most important components of modern cybersecurity systems, given the growing complexity of cyberattacks. Conventional detection approaches often struggle with the scale, class imbalance, and complexity of network traffic, which lowers accuracy and increases false positives. This paper proposes a hybrid ensemble model for intrusion detection on the CICIDS 2017 dataset that combines Random Forest (RF) and Light Gradient Boosting Machine (LGBM). The methodology includes extensive data preprocessing: stratified random sampling, removal of irrelevant features, aggregation of attack classes, and class balancing with the SMOTETomek technique (SMOTE oversampling combined with Tomek-link undersampling). The most informative predictors are selected using Pearson correlation and mutual information. The hybrid ensemble exploits the complementary strengths of RF and LGBM, which helps it generalize better, reduce variance, and remain robust to noisy and imbalanced data. The experimental results show that the proposed method attains 99.90% accuracy (ACC), 99.80% precision (PRE), 99.92% recall (REC), and a 99.90% F1-score (F1), and it is benchmarked against standard ML models such as SVM and Naive Bayes. The results confirm the effectiveness of ensemble learning for practical intrusion detection and provide a scalable, precise, and dependable approach to cloud-based cybersecurity detection.
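The abstract does not state how the RF and LGBM outputs are combined, so the following is only a minimal sketch under one common assumption: soft voting, i.e. a weighted average of each model's predicted probability for the "attack" class, followed by the ACC/PRE/REC/F1 metrics the abstract reports. It assumes a binary benign (0) vs. attack (1) label; with aggregated multi-class labels the metrics would need macro or weighted averaging. The function names, the weight `w_rf`, the 0.5 threshold, and the toy probabilities are hypothetical and are not CICIDS2017 results. In practice the probabilities could come from `predict_proba` of a fitted scikit-learn `RandomForestClassifier` and a LightGBM `LGBMClassifier`.

```python
from typing import Dict, List, Sequence


def soft_vote(p_rf: Sequence[float], p_lgbm: Sequence[float],
              w_rf: float = 0.5, threshold: float = 0.5) -> List[int]:
    """Hypothetical soft-voting rule: weighted average of two models' attack probabilities.

    p_rf / p_lgbm: per-sample probability of the attack class (label 1).
    w_rf = 0.5 gives a plain average; w_rf = 1.0 reproduces RF alone.
    """
    if len(p_rf) != len(p_lgbm):
        raise ValueError("probability lists must have the same length")
    w_lgbm = 1.0 - w_rf
    return [int(w_rf * a + w_lgbm * b >= threshold) for a, b in zip(p_rf, p_lgbm)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with 'attack' (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"ACC": acc, "PRE": pre, "REC": rec, "F1": f1}


if __name__ == "__main__":
    # Toy, made-up probabilities for six flows (illustration only).
    y_true = [0, 0, 1, 1, 1, 0]
    p_rf = [0.10, 0.60, 0.80, 0.45, 0.90, 0.20]
    p_lgbm = [0.05, 0.30, 0.95, 0.70, 0.85, 0.55]

    rf_only = soft_vote(p_rf, p_lgbm, w_rf=1.0)    # [0, 1, 1, 0, 1, 0]
    ensemble = soft_vote(p_rf, p_lgbm)             # [0, 0, 1, 1, 1, 0]
    print("RF only :", binary_metrics(y_true, rf_only))
    print("Ensemble:", binary_metrics(y_true, ensemble))
```

In this toy case RF alone misclassifies two flows and LGBM alone (threshold 0.5) misclassifies a different one, while the averaged probabilities classify all six correctly, which is the kind of complementarity the abstract attributes to the hybrid ensemble. scikit-learn's `VotingClassifier(voting="soft")` implements the same averaging rule for fitted estimators.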
License
Copyright (c) 2025 Dr Chintal Kumar Patel (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
