Intelligent Web Mining Technique using Sequential Pattern

Elliot, S. J., Bennett, E. O., Nwiabu, N. D., Matthias, D.

Citation:

Elliot, S. J., Bennett, E. O., Nwiabu, N. D., Matthias, D., "Intelligent Web Mining Technique using Sequential Pattern," International Journal of Computer Science and Engineering, vol. 11, no. 6, pp. 11-19, 2024. Crossref, https://doi.org/10.14445/23488387/IJCSE-V11I6P103

Abstract

As organizations expand and share more information about their operations online, the website data they produce becomes an invaluable resource for studying innovation. Managing this vast volume of data effectively and presenting relevant information to users is paramount, yet analyzing and retrieving data manually from large databases is impractical. Addressing this challenge requires automated extraction tools that enable users to sift through billions of web pages and find pertinent information. These tools allow individuals and organizations to analyze patterns in web content and page structure, facilitating the discovery of valuable insights and knowledge. They also help predict user behavior during online interactions, uncover navigation patterns, and extract useful information from user engagement, thereby improving our understanding of consumer behavior. This paper focuses on extracting web access patterns. Generally, a web log can be viewed as a series of (user identifier, event) pairs. In this paper, web log files are segmented according to the mining objectives: preprocessing techniques are applied to the original web log files to extract segments, each of which represents a sequence of events from a single user or session, arranged in ascending timestamp order. The model interprets these segments as event sequences and identifies sequential patterns whose support exceeds a given threshold. The paper applies the PrefixSpan algorithm to mine a sequential list of papers from the Neural Information Processing Systems (NIPS) website. The system is implemented in MATLAB, which has been used in web mining to harvest useful data from the web, such as user logs and content. The system is tested and evaluated in terms of accuracy and accessibility.
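The preprocessing and mining steps described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB implementation. The record layout `(user, timestamp, event)`, the 30-minute session timeout, the helper names `segment` and `prefixspan`, and the toy log are all hypothetical. The sketch treats each event as a single item and counts support as the number of sequences that contain a pattern. It keeps a pattern when its support is at least `min_support`.

```python
from collections import defaultdict

# Hypothetical session timeout (seconds); the source does not specify one.
SESSION_TIMEOUT = 1800


def segment(log, timeout=SESSION_TIMEOUT):
    """Split (user, timestamp, event) records into per-session event sequences.

    Records are grouped by user and sorted by ascending timestamp. A new
    session starts whenever the gap between consecutive events exceeds timeout.
    """
    by_user = defaultdict(list)
    for user, ts, event in log:
        by_user[user].append((ts, event))

    sequences = []
    for user in sorted(by_user):
        session, last_ts = [], None
        for ts, event in sorted(by_user[user]):
            if last_ts is not None and ts - last_ts > timeout:
                sequences.append(session)
                session = []
            session.append(event)
            last_ts = ts
        if session:
            sequences.append(session)
    return sequences


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every sequential pattern with support >= min_support.

    Each sequence element is a single event. Support is the number of
    sequences that contain the pattern as a (not necessarily contiguous)
    subsequence.
    """
    patterns = {}

    def mine(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                counts[item] += 1

        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue  # an infrequent item cannot start a frequent extension
            new_prefix = prefix + (item,)
            patterns[new_prefix] = support
            # Project: keep the part of each suffix after the item's first occurrence.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), [list(s) for s in sequences])
    return patterns


# Toy web log of (user, timestamp in seconds, page) records; illustrative only.
log = [
    ("u1", 0, "home"), ("u1", 60, "papers"), ("u1", 120, "abstract"),
    ("u2", 10, "home"), ("u2", 70, "abstract"),
    ("u1", 5000, "home"), ("u1", 5100, "papers"),  # gap > timeout: new session
    ("u3", 20, "papers"), ("u3", 30, "abstract"),
]

sessions = segment(log)
for pattern, support in sorted(prefixspan(sessions, min_support=2).items()):
    print(pattern, support)
```

The timeout splits the toy log for `u1` into two sessions. With `min_support=2`, the sketch keeps patterns such as `('home', 'papers')` and `('papers', 'abstract')`. It prunes `('home', 'papers', 'abstract')` because that pattern occurs in only one session. PrefixSpan grows patterns by recursively projecting the sequence database onto each frequent prefix. As a result, it only counts items that actually appear in the projected suffixes, which avoids the candidate-generation step of Apriori-style miners.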

Keywords

Data mining, Web mining, Sequential patterns, Frequent patterns, Web logs.
