Outlier Detection Using Distributed Mining Technology In Large Database

Mr. B. Muruganantham, Ms. Ankita Dubey

Citation:

Mr. B. Muruganantham, Ms. Ankita Dubey, "Outlier Detection Using Distributed Mining Technology In Large Database," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 6-11, 2015. Crossref, https://doi.org/10.14445/23488387/IJCSE-V2I2P102

Abstract

In many data analysis tasks, a large number of variables are recorded or sampled, and one of the first steps towards a coherent analysis is detecting outlying observations. Outliers are data objects that are different from, or inconsistent with, the remaining data. Many data mining techniques attempt to find patterns that occur frequently in the data and treat outliers as noise to be removed from the dataset. This paper presents a distributed approach to detecting distance-based outliers, based on the concept of the outlier detection solving set. Notably, this distributed data mining technique returns a model of the data that can be used to predict novel outliers, something other distributed outlier detection methods are not able to do. In this paper, we analyze and detect the distance-based outliers in the data set. Priority is given to reducing the processing time of the supervisor node, which is achieved by having the local nodes compute on the data simultaneously. The approach also addresses drawbacks of centralized systems, such as complete system failure and security concerns.
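To make the distance-based, supervisor/local-node setting more concrete, here is a minimal Python sketch under stated assumptions. It uses a common distance-based definition (an assumption, not taken from this paper): a point's weight is the sum of the distances to its k nearest neighbors, and the top-n outliers are the n points with the largest weights. This is a brute-force scheme, not the solving-set algorithm. A hypothetical supervisor broadcasts every point as a query, each local node returns the k smallest distances from each query to the points in its own partition, and the supervisor merges these lists. Threads stand in for separate local nodes, and the names `local_knn` and `distributed_top_n_outliers` are illustrative only.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def local_knn(partition, queries, k):
    """Work done by one (hypothetical) local node.

    partition: list of (point_id, point) pairs stored on this node.
    queries:   list of (point_id, point) pairs broadcast by the supervisor.
    Returns {query_id: up to k smallest distances to *other* local points}.
    """
    result = {}
    for qid, q in queries:
        # Skip the query itself so a point is never its own neighbor.
        dists = (math.dist(q, p) for pid, p in partition if pid != qid)
        result[qid] = heapq.nsmallest(k, dists)
    return result


def distributed_top_n_outliers(partitions, k, n):
    """Supervisor: merge local k-NN lists and rank points by k-NN weight."""
    queries = [item for part in partitions for item in part]
    # Local nodes compute simultaneously; threads simulate separate machines.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda part: local_knn(part, queries, k), partitions))
    weights = {}
    for qid, _ in queries:
        # The global k nearest distances are the k smallest of all local lists.
        merged = heapq.nsmallest(k, (d for partial in partials for d in partial[qid]))
        weights[qid] = sum(merged)
    # Largest weight first: these points lie farthest from their neighbors.
    return sorted(weights, key=weights.get, reverse=True)[:n]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    points = list(enumerate(data))
    partitions = [points[:3], points[3:]]  # two simulated local nodes
    print(distributed_top_n_outliers(partitions, k=2, n=1))  # [4]
```

Because the supervisor keeps the k smallest distances across all partitions, the result matches a single-node computation. The trade-off is that every point is compared with every other point, so this brute-force cost grows quadratically with the dataset size.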

Keywords

Clustering, Distance-based outlier, Data mining, Outlier detection.
