Analyzing Tagging Behavior in Clustering Similar Web Resources through Interactive Visual Demonstration

International Journal of Computer Science and Engineering
© 2014 by SSRG - IJCSE Journal
Volume 1 Issue 10
Year of Publication : 2014
Authors : Marjan Farsi

How to Cite?

Marjan Farsi, "Analyzing Tagging Behavior in Clustering Similar Web Resources through Interactive Visual Demonstration," SSRG International Journal of Computer Science and Engineering, vol. 1, no. 10, pp. 1-5, 2014. Crossref, https://doi.org/10.14445/23488387/IJCSE-V1I10P106

Abstract:

Every day, millions of web pages are annotated with freely chosen words called tags and saved in social tagging systems. Tagging-based systems such as www.delicious.com let internet users store their web resources online so that they are accessible and retrievable from anywhere in the world. By organizing data and providing search based on the tripartite elements (user, tag, bookmark URL), these sites offer useful information services to internet users. In recent years, the growth and spread of social bookmarking sites has motivated many web mining researchers to study the data acquired from these sites in order to discover new information. Consequently, techniques for web crawling and data extraction, classification and clustering algorithms, and data visualization methods and tools have been applied to this end. The acquired and clustered data are usually analyzed to uncover the hidden statistical or behavioral facts and concepts embedded in the relations among the tripartite elements. This paper analyzes and discusses one aspect of these behavioral facts and concepts: the effect of tagging behavior on finding web pages with similar content, based on the common tags of the extracted URLs. All the required data come from one of these social bookmarking sites, www.delicious.com. The similarity is explored by running a Java application developed for this study, which groups similar web pages into clusters by applying similarity measurement algorithms and the k-means clustering technique.
The investigation is both quantitative and qualitative: the statistical facts about tagging behavior in finding similar web pages, generated by the application, are reported in Excel sheet format, and the processed data are represented visually as graphs using the ‘Prefuse’ visualization tool. The relationships between visual objects in each graph are discussed and analyzed from the tagging-behavior point of view.
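
As a rough illustration of what measuring similarity "according to the common tags" could look like, the sketch below scores two bookmarked URLs by the Jaccard similarity of their tag sets. The abstract does not say which similarity measures the application uses, so the choice of Jaccard, the class name TagSimilarity, and the sample tags are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not taken from the paper's application.
class TagSimilarity {

    // Jaccard similarity of two tag sets: |A ∩ B| / |A ∪ B|.
    // Returns 0.0 when both sets are empty (a convention chosen here).
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);              // tags shared by both URLs
        Set<String> all = new HashSet<>(a);
        all.addAll(b);                    // every tag used on either URL
        return (double) common.size() / all.size();
    }

    public static void main(String[] args) {
        // Made-up tag sets for two bookmarked pages.
        Set<String> pageA = Set.of("java", "tutorial", "programming");
        Set<String> pageB = Set.of("java", "programming", "reference");
        System.out.println(jaccard(pageA, pageB));
    }
}
```

A pairwise score of this kind could then be turned into a distance (for example, one minus the similarity) and fed to a clustering step such as k-means, which is the general pipeline the abstract outlines.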

Keywords:

Data visualization tool, k-means clustering, similarity measurement algorithms, social bookmarking sites, tagging behavior, web mining
