CNI-VIF: Enhanced Feature Selection for Graph Databases by Integrating Composite Node Information in VIF
International Journal of Electrical and Electronics Engineering
© 2024 by SSRG - IJEEE Journal
Volume 11 Issue 11
Year of Publication: 2024
Authors: Anagha Patil, Arti Deshpande
How to Cite?
Anagha Patil, Arti Deshpande, "CNI-VIF: Enhanced Feature Selection for Graph Databases by Integrating Composite Node Information in VIF," SSRG International Journal of Electrical and Electronics Engineering, vol. 11, no. 11, pp. 100-113, 2024. Crossref, https://doi.org/10.14445/23488379/IJEEE-V11I11P111
Abstract:
Feature selection and dimensionality reduction are critical techniques in today's data-centric world, where vast and complex datasets demand efficient and effective methods for analysis and decision-making. This research proposes Composite Node Information - Variance Inflation Factor (CNI-VIF), an enhanced feature selection technique tailored for graph databases, with a particular focus on network traffic datasets. Traditional feature selection methods often fail to capture the complex interrelationships in graph data. To address this limitation, the proposed method incorporates Composite Node Information (CNI), an aggregate of Betweenness, Closeness, and Degree centrality, into the VIF framework. Integrating CNI improves the selection of graph-based features and also reduces dimensionality and computation time, making the feature selection process more efficient. Experiments on the CTU-13, IoT-23, and NCC-2 datasets demonstrate that CNI-VIF significantly outperforms traditional methods at selecting graph-based features, which in turn enhances the performance of machine learning models. Specifically, the Random Forest algorithm performs exceptionally well under every feature selection technique tested, with CNI-VIF yielding the best performance overall. The results indicate that CNI-VIF is particularly effective for graph databases, offering a robust and efficient feature selection mechanism that improves model computation and predictive accuracy.
Keywords:
CNI, CNI-VIF, Graph database, Feature selection, VIF.
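As a rough illustration of the two ingredients named in the abstract, the following Python sketch computes the three centralities for a small graph, combines them into a composite score, and computes VIF for a set of feature columns. The abstract does not say how the centralities are aggregated or how CNI enters the VIF framework, so the equal-weight mean in `composite_node_information`, the `select_features` loop, and its threshold of 5 are assumptions made here for illustration, not the authors' method. All function names are hypothetical. The graph is assumed to be connected, undirected, and unweighted, with at least three nodes.

```python
from collections import deque


def centralities(adj):
    """Degree, closeness and betweenness centrality for a graph given as
    {node: set_of_neighbours}. Assumes connected, undirected, unweighted, n >= 3."""
    n = len(adj)
    degree = {u: len(adj[u]) / (n - 1) for u in adj}
    closeness = {}
    between = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (Brandes' algorithm).
        dist = {s: 0}
        sigma = dict.fromkeys(adj, 0.0)
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        closeness[s] = (n - 1) / sum(dist.values())
        # Back-propagate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                between[w] += delta[w]
    # Each unordered pair is counted twice, so divide by (n-1)(n-2).
    between = {u: b / ((n - 1) * (n - 2)) for u, b in between.items()}
    return degree, closeness, between


def composite_node_information(adj):
    """ASSUMPTION: CNI taken as the unweighted mean of the three centralities."""
    degree, closeness, between = centralities(adj)
    return {u: (degree[u] + closeness[u] + between[u]) / 3 for u in adj}


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), computed as the j-th diagonal entry of the
    inverse correlation matrix of the feature columns."""
    k, n = len(columns), len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norm = [sum(x * x for x in c) ** 0.5 for c in centered]
    corr = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norm[i] * norm[j])
             for j in range(k)] for i in range(k)]
    # Gauss-Jordan inversion with partial pivoting on [corr | I].
    aug = [row + [float(i == j) for j in range(k)] for i, row in enumerate(corr)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[j][k + j] for j in range(k)]


def select_features(features, threshold=5.0):
    """ASSUMPTION: drop the highest-VIF column until every VIF <= threshold.
    `features` maps a column name to its list of values (CNI can be one column)."""
    kept = dict(features)
    while len(kept) > 1:
        scores = dict(zip(kept, vif(list(kept.values()))))
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)
```

For the star graph `{0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}`, the hub scores 1 on all three centralities, so its composite score under the equal-weight assumption is 1. In a pipeline of this shape, the per-node composite score would be passed to `select_features` as one column alongside the other per-node features.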