Enhancing Intrusion Detection System Evaluation: A Framework for Generating Comprehensive and Scalable Datasets
International Journal of Electronics and Communication Engineering
© 2024 by SSRG - IJECE Journal
Volume 11 Issue 10
Year of Publication: 2024
Authors: Faeiz M. Alserhani
How to Cite?
Faeiz M. Alserhani, "Enhancing Intrusion Detection System Evaluation: A Framework for Generating Comprehensive and Scalable Datasets," SSRG International Journal of Electronics and Communication Engineering, vol. 11, no. 10, pp. 91-101, 2024. Crossref, https://doi.org/10.14445/23488549/IJECE-V11I10P107
Abstract:
Intrusion Detection System (IDS) evaluation relies principally on the quality and breadth of the datasets used, which commonly suffer from limitations including relative scarcity, inadequate coverage of real-world attacks, class imbalance, and difficulty of reproduction under specific requirements. The advancement of IDS algorithms has exposed a significant gap in the availability of comprehensive and scalable datasets. Developing holistic, well-documented, and trustworthy real-world traffic data is not a simple task; it requires considerable effort and cost. To tackle these challenges, we propose a dataset generation framework that constructs a reliable dataset by aggregating real-world and synthetic traffic data, providing a diverse range of attack methods across multiple network settings. The collected traffic records are processed and normalized to produce a consistent dataset. A wide range of multi-step intrusion instances is injected into the constructed dataset to expand its attack coverage. Several tools have been implemented to perform the required data processing steps, automate class labeling, and build ground truth data. The proposed framework helps overcome the limitations of IDS evaluation under real-world conditions by offering scalable, reproducible, and comprehensive datasets. An experimental dataset has been generated to evaluate different IDSs, such as Snort and Zeek, as well as machine learning models. The study concludes that benchmark datasets are fundamental to advancing IDS research and to accurate IDS evaluation for safeguarding digital ecosystems against evolving threats.
Keywords:
Intrusion Detection Systems (IDS), Machine Learning (ML), Attack traffic, IDS evaluation, Benchmarking dataset.