Temporal Text Summarization of TV Serial Excerpts Using Lingo Clustering and Lucene Summarizer

International Journal of Computer Science and Engineering
© 2015 by SSRG - IJCSE Journal
Volume 2 Issue 5
Year of Publication : 2015
Authors : Sayali Hande, Mrs. M. A. Potey

How to Cite?

Sayali Hande, Mrs. M. A. Potey, "Temporal Text Summarization of TV Serial Excerpts Using Lingo Clustering and Lucene Summarizer," SSRG International Journal of Computer Science and Engineering, vol. 2, no. 5, pp. 16-22, 2015. Crossref, https://doi.org/10.14445/23488387/IJCSE-V2I5P139


Text summarization is the art of abstracting key content from one or more information sources. Because time is an important dimension of any information space, generating meaningful and timely summaries is becoming harder. To summarize a story as a timeline, a system may have to extract events and order them chronologically. The goal of temporal summarization is therefore to develop a system that lets users efficiently monitor the information associated with an event over time. Previously proposed algorithms with good speed and scalability share one important shortcoming: none of them explicitly addresses the quality of cluster descriptions. For this reason, document clustering is done with the Lingo algorithm, which places special emphasis on the quality of cluster labels, and the Lucene Summarizer is used for text summarization. Viewers who miss episodes of a television (TV) serial often have to spend a great deal of time reading the detailed story of the entire series. This paper focuses on a novel application that automatically generates meaningful temporal text summaries of the TV serial excerpts a user has missed. The user can specify the time period that the summary should cover.
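
As a rough illustration of the temporal part of this idea, the following Python sketch filters dated excerpts to a user-specified period, orders them chronologically, and keeps the highest-scoring sentence of each. It is only a minimal sketch under stated assumptions: the word-frequency scorer is a simple stand-in and is neither the Lingo algorithm nor the Lucene Summarizer, and the function names, stopword list, and sample excerpts are hypothetical and do not come from the paper.

```python
import re
from collections import Counter
from datetime import date

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"the", "a", "an", "and", "at", "of", "to", "in", "is"}


def content_words(sentence):
    """Lowercase a sentence and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def select_period(excerpts, start, end):
    """Keep (date, text) excerpts dated within [start, end], oldest first."""
    in_range = [e for e in excerpts if start <= e[0] <= end]
    return sorted(in_range, key=lambda e: e[0])


def summarize(text, max_sentences=1):
    """Frequency-based extractive stand-in for a real summarizer.

    Each sentence is scored by the summed frequency of its content words,
    so longer sentences tend to score higher. Selected sentences are
    returned in their original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in content_words(sentences[i])),
    )
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def temporal_summary(excerpts, start, end, max_sentences=1):
    """Summarize each excerpt in the chosen period, in chronological order."""
    return [(d, summarize(t, max_sentences))
            for d, t in select_period(excerpts, start, end)]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    excerpts = [
        (date(2015, 3, 3), "The team rests. The team plans the next climb."),
        (date(2015, 3, 1), "The team reaches the base camp."),
        (date(2015, 4, 1), "A storm arrives."),
    ]
    for day, summary in temporal_summary(
            excerpts, date(2015, 3, 1), date(2015, 3, 31)):
        print(day.isoformat(), summary)
```

In a fuller pipeline, the excerpts selected for the period would first be grouped by a clustering step and each group summarized separately; that step is omitted here to keep the sketch short.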


Information Retrieval System, Lingo clustering algorithm, Singular Value Decomposition, Lucene summarizer, Temporal...

