Financial Fraud Detection: Multi-Objective Genetic Programming with Grammars and Statistical Selection Learning

International Journal of Computer Science and Engineering
© 2020 by SSRG - IJCSE Journal
Volume 7 Issue 2
Year of Publication: 2020
Authors: Haibing Li, Wing-Lun Lam, Chi-Wai Chung, Man-Leung Wong

How to Cite?

Haibing Li, Wing-Lun Lam, Chi-Wai Chung, Man-Leung Wong, "Financial Fraud Detection: Multi-Objective Genetic Programming with Grammars and Statistical Selection Learning," SSRG International Journal of Computer Science and Engineering, vol. 7, no. 2, pp. 1-18, 2020.


Financial fraud is a serious problem that often has destructive consequences and is worsening rapidly in many countries. It covers many activities, including credit card fraud, money laundering, insurance fraud, and corporate fraud. Its major consequences are the loss of billions of dollars each year and damage to investor confidence and corporate reputation. Financial Fraud Detection (FFD) is therefore an essential research area for preventing these consequences. In this study, we propose a new approach to FFD problems based on multi-objective optimization, Genetic Programming (GP), grammars, and ensemble learning. We comprehensively compare the proposed approach with Logistic Regression, Neural Networks, Support Vector Machines, Bayesian Networks, Decision Trees, AdaBoost, Bagging, and LogitBoost on four FFD datasets, including two real-life datasets. The experimental results show the effectiveness of the new approach, which outperforms the existing data mining methods in various respects. The study makes two major contributions. First, it evaluates a number of existing data mining techniques on the given FFD problems. Second, it proposes a new approach for handling these far-reaching problems. In addition, it proposes a novel ensemble learning method called Statistical Selection Learning.


Financial Fraud Detection, Multi-objective Optimization, Grammar-Based Genetic Programming, Ensemble Learning.

