Big Data Mining For Interesting Pattern Using MapReduced Technique

Aguguo Ihechukwu.C, Matthias Daniel, E.O Bennett

Citation:

Aguguo Ihechukwu.C, Matthias Daniel, E.O Bennett, "Big Data Mining For Interesting Pattern Using MapReduced Technique," International Journal of Computer Science and Engineering, vol. 7, no. 7, pp. 26-33, 2020. Crossref, https://doi.org/10.14445/23488387/IJCSE-V7I7P105

Abstract

In the past few years, the amount of data that must be stored, accessed and retrieved has increased drastically all over the world, and this fast growth creates the need to analyse huge amounts of data. Due to a lack of proper tools and programs, much of this data remains unused, with important knowledge hidden in it. This study mines interesting patterns in big data using an object-oriented design methodology. The frequent pattern growth (FP-growth) algorithm was run on Hadoop using MapReduce and applied specifically to analyse the maximum flight time in a 108 MB flight transaction data store. A MapReduce program consists of two functions, a Mapper and a Reducer, which run on all machines in a Hadoop cluster. The system was implemented in MATLAB. The actual flight time was analysed under two user constraints: arrival delay and actual elapsed time. The Air Peace carrier had the longest flight time; the analysed Air Peace data space was 20000 × 6 and contained 712316 bytes. The execution time of the entire mining process was 1615 milliseconds.
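The Mapper/Reducer split described above can be illustrated with a minimal, single-process Python sketch. This is not the authors' code (their system was implemented in MATLAB on Hadoop), and it shows only a max-per-key aggregation, not the FP-growth algorithm itself. It assumes a hypothetical record layout of (carrier, arrival delay, actual elapsed time) and assumes the two user constraints take the form "arrival delay at most a cap" and "actual elapsed time at least a floor"; the function names, thresholds and sample rows are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (assumption): (carrier, arrival_delay, actual_elapsed_time).
Record = Tuple[str, float, float]


def mapper(record: Record, max_arr_delay: float, min_elapsed: float) -> Iterator[Tuple[str, float]]:
    """Emit (carrier, actual_elapsed_time) for records that satisfy the user constraints.

    The constraint forms used here (delay at most a cap, elapsed time at least a
    floor) are illustrative assumptions, not taken from the paper.
    """
    carrier, arr_delay, elapsed = record
    if arr_delay <= max_arr_delay and elapsed >= min_elapsed:
        yield carrier, elapsed


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group mapper output by key, as the framework's shuffle phase would."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(carrier: str, times: List[float]) -> Tuple[str, float]:
    """Reduce all elapsed times for one carrier to their maximum."""
    return carrier, max(times)


def max_flight_time_per_carrier(
    records: Iterable[Record], max_arr_delay: float, min_elapsed: float
) -> Dict[str, float]:
    """Run map -> shuffle -> reduce in one process and return carrier -> max elapsed time."""
    pairs = (pair for r in records for pair in mapper(r, max_arr_delay, min_elapsed))
    return dict(reducer(c, ts) for c, ts in shuffle(pairs).items())


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        ("CarrierA", 5.0, 70.0),
        ("CarrierA", 40.0, 95.0),  # filtered out: arrival delay above the cap
        ("CarrierB", 10.0, 85.0),
        ("CarrierB", 0.0, 60.0),
    ]
    result = max_flight_time_per_carrier(sample, max_arr_delay=30.0, min_elapsed=50.0)
    print(result)  # {'CarrierA': 70.0, 'CarrierB': 85.0}
    print(max(result, key=result.get))  # CarrierB
```

On a real Hadoop cluster the mapper and reducer would run as separate tasks across machines and the framework would perform the shuffle; here the three phases are plain functions so the data flow can be followed on one machine.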

Keywords

Big data, pattern mining, itemsets, reducer, mapper.
