3D Bounding Box Estimation Using Deep Learning and Geometry Based on Yolov7 Output on Single-Board Computer

International Journal of Electrical and Electronics Engineering
© 2024 by SSRG - IJEEE Journal
Volume 11 Issue 5
Year of Publication : 2024
Authors : Tuan Muhammad Naeem Bin Tuan Rashid, Lokman Mohd Fadzil, Mohd Adib Haji Omar
How to Cite?

Tuan Muhammad Naeem Bin Tuan Rashid, Lokman Mohd Fadzil, Mohd Adib Haji Omar, "3D Bounding Box Estimation Using Deep Learning and Geometry Based on Yolov7 Output on Single-Board Computer," SSRG International Journal of Electrical and Electronics Engineering, vol. 11, no. 5, pp. 77-84, 2024. Crossref, https://doi.org/10.14445/23488379/IJEEE-V11I5P108

Abstract:

This study investigates the enhancement of 3D bounding box estimation techniques for object localization on Single-Board Computers (SBCs), focusing on the Jetson Nano platform. It examines the adaptation and optimization of deep learning models within the constraints of SBCs, specifically the transition from VGG networks and YOLOv3 to more efficient alternatives such as MobileNetV3 and YOLOv7. The implementation leverages the advanced capabilities of MobileNetV3 for 3D bounding box generation, coupled with the superior detection accuracy and speed of YOLOv7 for object detection. The research employs an innovative loss function to improve 3D orientation predictions and utilizes geometric constraints from 2D bounding boxes for precise object localization. A comparative analysis of MobileNetV3, VGG-19, and MobileNetV2 on the Jetson Nano, in terms of inference speed and consistency, reveals that MobileNetV3, optimized using TensorRT, significantly outperforms the others, making it a preferable candidate for real-time solutions. The study concludes that the strategic optimization of deep learning models on SBCs such as the Jetson Nano markedly enhances the performance and applicability of 3D bounding box estimation in edge computing environments, offering valuable insights for deploying advanced object detection technologies in resource-constrained scenarios.
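
As a rough illustration of the regression stage described in the abstract (a YOLOv7 2D detection crop fed to a MobileNetV3 backbone that predicts object dimensions and a MultiBin orientation in the style of Mousavian et al. [1]), the following PyTorch sketch shows one plausible structure. It is not the authors' released code; the bin count, hidden-layer sizes, and 224x224 crop resolution are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_large

class Box3DHead(nn.Module):
    """Regresses dimensions and a MultiBin orientation from a 2D detection crop."""
    def __init__(self, num_bins: int = 2):
        super().__init__()
        self.num_bins = num_bins
        backbone = mobilenet_v3_large(weights=None)  # pretrained weights optional
        self.features = backbone.features            # convolutional trunk only
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Infer the pooled feature width with a dry run so the sketch does not
        # hard-code a torchvision-specific channel count.
        with torch.no_grad():
            feat_dim = self.pool(self.features(torch.zeros(1, 3, 224, 224))).flatten(1).shape[1]
        def branch(out_dim):
            return nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))
        self.dim_branch = branch(3)                  # residual length, width, height
        self.conf_branch = branch(num_bins)          # which angle bin the object falls in
        self.sin_cos_branch = branch(num_bins * 2)   # per-bin (sin, cos) angle residual

    def forward(self, crop):
        x = self.pool(self.features(crop)).flatten(1)
        dims = self.dim_branch(x)
        conf = self.conf_branch(x)
        sin_cos = self.sin_cos_branch(x).view(-1, self.num_bins, 2)
        sin_cos = sin_cos / sin_cos.norm(dim=2, keepdim=True)  # project onto the unit circle
        return dims, conf, sin_cos

if __name__ == "__main__":
    # A YOLOv7 detector would supply the 2D box; a random crop stands in here.
    head = Box3DHead()
    dims, conf, sin_cos = head(torch.randn(1, 3, 224, 224))
    print(dims.shape, conf.shape, sin_cos.shape)  # (1, 3) (1, 2) (1, 2, 2)

In the full pipeline, the predicted dimensions and orientation would then be combined with the YOLOv7 2D box and the camera intrinsics to recover the 3D box centre through the tight-fit geometric constraint of [1], and the trained head could be exported through TensorRT for deployment on the Jetson Nano.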

Keywords:

3D bounding box, Computer vision, Embedded system, IoT applications, Performance benchmarking.

References:

[1] Arsalan Mousavian et al., “3D Bounding Box Estimation Using Deep Learning and Geometry,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, pp. 5632-5640, 2017.
[CrossRef] [Google Scholar] [Publisher Link]  
[2] Karen Simonyan, and Andrew Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv, pp. 1-14, 2014.
[CrossRef] [Google Scholar] [Publisher Link]  
[3] Joseph Redmon, and Ali Farhadi, “YOLOv3: An Incremental Improvement,” arXiv, pp. 1-6, 2018.
[CrossRef] [Google Scholar] [Publisher Link]  
[4] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao, “YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors,” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, pp. 7464-7475, 2023.
[CrossRef] [Google Scholar] [Publisher Link]  
[5] Andrew Howard et al., “Searching for MobileNetV3,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), pp. 1314-1324, 2019.
[CrossRef] [Google Scholar] [Publisher Link]  
[6] NVIDIA, Jetson Nano Developer Kit, NVIDIA Jetson Nano. [Online]. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/product-development/
[7] Ross Girshick et al., “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, pp. 580-587, 2014.
[CrossRef] [Google Scholar] [Publisher Link]  
[8] Ross Girshick, “Fast R-CNN,” 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, pp. 1440-1448, 2015.
[CrossRef] [Google Scholar] [Publisher Link]  
[9] Shaoqing Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
[CrossRef] [Google Scholar] [Publisher Link]  
[10] Joseph Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, pp. 779-788, 2016.
[CrossRef] [Google Scholar] [Publisher Link]  
[11] Joseph Redmon, and Ali Farhadi, “YOLO9000: Better, Faster, Stronger,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, pp. 6517-6525, 2017.
[CrossRef] [Google Scholar] [Publisher Link]  
[12] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” arXiv, pp. 1-17, 2020.
[CrossRef] [Google Scholar] [Publisher Link]  
[13] Glenn Jocher et al., “Ultralytics/YOLOv5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation,” Zenodo, 2022.
[CrossRef] [Publisher Link]  
[14] Chuyi Li et al., “YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications,” arXiv, pp. 1-17, 2022.
[CrossRef] [Google Scholar] [Publisher Link]  
[15] Dr. Info Sec, VGG-19 Convolutional Neural Network, Machine Learning, 2021. [Online]. Available: https://blog.techcraft.org/vgg-19-convolutional-neural-network/
[16] NVIDIA TensorRT, NVIDIA Developer. [Online]. Available: https://developer.nvidia.com/tensorrt
[17] Sampurna Mandal et al., “Lyft 3D Object Detection for Autonomous Vehicles,” Artificial Intelligence for Future Generation Robotics, pp. 119-136, 2021.
[CrossRef] [Google Scholar] [Publisher Link]  
[18] Tan Zhang et al., “Sim2real Learning of Obstacle Avoidance for Robotic Manipulators in Uncertain Environments,” IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 65-72, 2022.
[CrossRef] [Google Scholar] [Publisher Link]  
[19] Linh Kästner, Vlad Catalin Frasineanu, and Jens Lambrecht, “A 3D-Deep-Learning-Based Augmented Reality Calibration Method for Robotic Environments Using Depth Sensor Data,” 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, pp. 1135-1141, 2020.
[CrossRef] [Google Scholar] [Publisher Link]  
[20] Yutian Wu et al., “Deep 3D Object Detection Networks Using LiDAR Data: A Review,” IEEE Sensors Journal, vol. 21, no. 2, pp. 1152-1171, 2021.
[CrossRef] [Google Scholar] [Publisher Link]  
[21] Yue Wang et al., “DETR3D: 3D Object Detection from Multi-View Images via 3D-to-2D Queries,” arXiv, pp. 1-12, 2022.
[CrossRef] [Google Scholar] [Publisher Link]  
[22] PyTorch, Models and Pre-Trained Weights. [Online]. Available: https://pytorch.org/vision/stable/models.html
[23] Laith Alzubaidi et al., “Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions,” Journal of Big Data, vol. 8, pp. 1-74, 2021.
[CrossRef] [Google Scholar] [Publisher Link]