CNN Architectures for Image Classification: A Comparative Study Using ResNet50V2, ResNet152V2, InceptionV3, Xception, and MobileNetV2

International Journal of Electronics and Communication Engineering
© 2024 by SSRG - IJECE Journal
Volume 11 Issue 9
Year of Publication: 2024
Authors : Nitin Duklan, Sachin Kumar, Himani Maheshwari, Rajesh Singh, Sameer Dev Sharma, Siddharth Swami
How to Cite?

Nitin Duklan, Sachin Kumar, Himani Maheshwari, Rajesh Singh, Sameer Dev Sharma, Siddharth Swami, "CNN Architectures for Image Classification: A Comparative Study Using ResNet50V2, ResNet152V2, InceptionV3, Xception, and MobileNetV2," SSRG International Journal of Electronics and Communication Engineering, vol. 11, no. 9, pp. 11-21, 2024. Crossref, https://doi.org/10.14445/23488549/IJECE-V11I9P102

Abstract:

Image processing techniques have been applied to image classification in many domains in recent years, including education, research, railways, and other sectors. The Convolutional Neural Network (CNN) is widely regarded as the most powerful method for image classification. This study evaluated five well-known CNN architectures: ResNet50V2, ResNet152V2, Xception, InceptionV3, and MobileNetV2. We assessed classification on the Uttaranchal University, Dehradun dataset, which contains photographs of 20 distinct departments. Our primary goal was to achieve the best possible model accuracy on the available hardware after a fixed number of training iterations. To evaluate performance, we used metrics such as accuracy, recall, and F1-score. The investigation demonstrated the high accuracy of all five algorithms: ResNet50V2 (98.88%), ResNet152V2 (99.10%), Xception (99.17%), InceptionV3 (99.2%), and MobileNetV2 (93.71%). The Xception method was chosen for data training, testing, and validation because of its superior accuracy. Hardware resources, memory capacity, and data diversity were also considered when assessing each algorithm's strengths and weaknesses. This research sheds light on CNN model performance and helps companies and universities choose better image classification algorithms. It also advances machine learning and deep learning algorithms and their practical application in real-world situations.
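As a rough illustration of the comparison described above, the sketch below builds the five architectures as transfer-learning classifiers in TensorFlow/Keras. The input size, frozen ImageNet backbone, optimizer, and classification head are illustrative assumptions, not the paper's reported training configuration.

```python
# Minimal sketch of the five-architecture comparison, assuming a
# TensorFlow/Keras transfer-learning setup. Hyperparameters here are
# assumptions for illustration, not values reported in the paper.
import tensorflow as tf

NUM_CLASSES = 20             # 20 university departments (from the abstract)
INPUT_SHAPE = (224, 224, 3)  # assumed input size

BACKBONES = {
    "ResNet50V2": tf.keras.applications.ResNet50V2,
    "ResNet152V2": tf.keras.applications.ResNet152V2,
    "InceptionV3": tf.keras.applications.InceptionV3,
    "Xception": tf.keras.applications.Xception,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
}

def build_classifier(backbone_fn):
    """Attach a small classification head to a frozen ImageNet backbone."""
    base = backbone_fn(weights="imagenet", include_top=False,
                       input_shape=INPUT_SHAPE)
    base.trainable = False  # transfer learning: reuse ImageNet features
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# One classifier per architecture, trained and evaluated identically.
models = {name: build_classifier(fn) for name, fn in BACKBONES.items()}
```

Building all five models through a single helper keeps the comparison fair: every backbone gets the same head, loss, and optimizer, so accuracy differences reflect the architectures themselves.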

Keywords:

CNN, InceptionV3, MobileNetV2, ResNet50V2, ResNet152V2, Xception.
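The abstract reports accuracy, recall, and F1-score. One minimal way to compute these per-class metrics is shown below, assuming scikit-learn and a batched tf.data-style test set with one-hot labels; the paper does not name its tooling, so both are assumptions.

```python
# Sketch of the metric computation mentioned in the abstract (accuracy,
# recall, F1-score), using scikit-learn as an assumed tooling choice.
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

def evaluate(model, test_ds, class_names):
    """Collect predictions over a batched test set and print accuracy
    plus per-class precision, recall, and F1-score."""
    y_true, y_pred = [], []
    for images, labels in test_ds:  # labels assumed one-hot encoded
        probs = model.predict(images, verbose=0)
        y_pred.extend(np.argmax(probs, axis=1))
        y_true.extend(np.argmax(labels.numpy(), axis=1))
    print("accuracy:", accuracy_score(y_true, y_pred))
    print(classification_report(y_true, y_pred, target_names=class_names))
```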
