Intelligent Data Extraction from Image Documents

International Journal of Computer Science and Engineering
© 2024 by SSRG - IJCSE Journal
Volume 11 Issue 11
Year of Publication: 2024
Authors : Dhivya Nagasubramanian

How to Cite?

Dhivya Nagasubramanian, "Intelligent Data Extraction from Image Documents," SSRG International Journal of Computer Science and Engineering, vol. 11, no. 11, pp. 25-34, 2024. Crossref, https://doi.org/10.14445/23488387/IJCSE-V11I11P104

Abstract:

Enterprises often possess vast collections of scanned documents and images containing data crucial to organizational growth and success. In the finance industry, for instance, banks manage extensive collateral documents, tax forms, title deeds, and other critical materials such as check images, syndication records, and flood documentation. Extracting information from these scanned files typically involves manual data entry, which is time-consuming and susceptible to human error. With advances in AI, document entity extraction can now be automated in several ways. Heuristic methods suit simpler documents where entities consistently appear in predefined regions of the page. More complex scenarios can leverage AI frameworks, such as Convolutional Neural Networks (CNNs), trained on labeled images to detect regions of interest and produce bounding boxes with confidence scores for their predictions. Generative AI toolkits offer another solution: extracting entities directly from documents or supporting question-and-answer interactions to retrieve specific information efficiently. This research paper explores how these methodologies can be swiftly adopted based on document complexity, evaluates the advantages and limitations of each approach, and discusses the role of pipeline building in improving the accuracy of AI model predictions.
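
To make the heuristic approach concrete, the sketch below crops fixed regions of a scanned form and runs OCR on each. It is a minimal sketch, not the paper's implementation: the field names and pixel coordinates are hypothetical values calibrated for an assumed form layout, and Tesseract (via the pytesseract wrapper) stands in for any OCR engine.

```python
# Heuristic extraction sketch: entities are assumed to appear in
# known, fixed regions of the page. Coordinates and field names
# below are hypothetical examples for one form layout.
from PIL import Image
import pytesseract

# Hypothetical fixed regions as (left, upper, right, lower) pixels.
FIELD_REGIONS = {
    "account_number": (120, 80, 460, 120),
    "borrower_name": (120, 140, 600, 180),
}

def extract_fields(image_path: str) -> dict:
    """OCR each predefined region and return a field -> text mapping."""
    page = Image.open(image_path)
    results = {}
    for field, box in FIELD_REGIONS.items():
        crop = page.crop(box)  # isolate the region of interest
        results[field] = pytesseract.image_to_string(crop).strip()
    return results
```

For more complex layouts, an object-detection model can first localize regions of interest, producing the bounding boxes and confidence scores mentioned above, which are then passed to OCR. The sketch below uses the Ultralytics YOLO API as one possible detection toolkit; the weights file, class names, and confidence threshold are assumptions, not values from the paper.

```python
# Detection-then-OCR pipeline sketch, assuming a YOLO model
# fine-tuned on labeled document images. "fields.pt" and the 0.5
# confidence cutoff are illustrative placeholders.
from ultralytics import YOLO
from PIL import Image
import pytesseract

model = YOLO("fields.pt")  # hypothetical fine-tuned weights

def detect_and_read(image_path: str, min_conf: float = 0.5) -> list:
    page = Image.open(image_path)
    detections = model(image_path)[0]  # results for the single image
    extracted = []
    for box, conf, cls in zip(detections.boxes.xyxy,
                              detections.boxes.conf,
                              detections.boxes.cls):
        if conf < min_conf:
            continue  # discard low-confidence predictions
        crop = page.crop(tuple(box.tolist()))
        extracted.append({
            "label": detections.names[int(cls)],
            "confidence": float(conf),
            "text": pytesseract.image_to_string(crop).strip(),
        })
    return extracted
```

Finally, the question-and-answer route can be sketched with an off-the-shelf document-QA pipeline from the Hugging Face transformers library. The model name and question below are illustrative examples only; any document visual question-answering model could be substituted.

```python
# Q&A-style extraction sketch. "impira/layoutlm-document-qa" is one
# publicly available example model, not the one evaluated here.
from transformers import pipeline

doc_qa = pipeline("document-question-answering",
                  model="impira/layoutlm-document-qa")

answers = doc_qa(image="scanned_deed.png",
                 question="What is the title deed number?")
print(answers[0]["answer"], answers[0]["score"])
```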

Keywords:

Document intelligence, Document extraction, CNN, Convolutional neural network, Transformer, OCR, Optical character recognition, Encoder-decoder model, Object detection, Layout detection.
