Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams

Abstract

Documents created and distributed on the Internet change constantly and take many forms. Most existing work is devoted to topic modeling and the evolution of individual topics, while the sequential relations between topics in successive documents published by a specific user are ignored. To characterize and detect personalized and abnormal behaviors of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. These patterns are rare overall but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviors. We present a solution to this mining problem in three phases: pre-processing to extract probabilistic topics and identify sessions for different users; generating all STP candidates with (expected) support values for each user by pattern-growth; and selecting URSTPs through user-aware rarity analysis of the derived STPs. Experiments on both real (Twitter) and synthetic datasets show that our approach can discover special users and interpretable URSTPs effectively and efficiently, and that these patterns reflect users' characteristics.

Swati V. Mengje | Prof. Rajeshri R. Shelke, "Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1 | Issue-4, June 2017. URL: http://www.ijtsrd.com/papers/ijtsrd101.pdf http://www.ijtsrd.com/engineering/computer-engineering/101/efficient-way-to-identify-user-aware-rare-sequential-patterns-in-document-streams/swati-v-mengje
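The three phases named in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm, and it rests on several simplifying assumptions that are not in the paper: each document carries one fixed topic label rather than a probabilistic topic distribution, sessions are already given per user, brute-force enumeration stands in for pattern-growth, and rarity is decided by two hypothetical thresholds (`global_max`, `user_min`) on plain support instead of the paper's expected support and rarity measure. All function and parameter names are illustrative.

```python
from itertools import combinations


def is_subsequence(pattern, session):
    """True if the topics in `pattern` occur in `session` in order (gaps allowed)."""
    remaining = iter(session)
    return all(topic in remaining for topic in pattern)


def support(pattern, sessions):
    """Fraction of sessions that contain `pattern` as a subsequence."""
    if not sessions:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)


def candidate_patterns(sessions, max_len=2):
    """Every ordered topic subsequence up to `max_len` seen in the sessions.

    Brute force for clarity; a stand-in for pattern-growth (assumption).
    """
    found = set()
    for session in sessions:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(session)), n):
                found.add(tuple(session[i] for i in idx))
    return found


def user_aware_rare_patterns(user_sessions, max_len=2, global_max=0.4, user_min=0.9):
    """Patterns that are rare over all sessions but frequent for one user.

    `global_max` and `user_min` are hypothetical thresholds for this sketch.
    """
    all_sessions = [s for sessions in user_sessions.values() for s in sessions]
    result = {}
    for user, sessions in user_sessions.items():
        hits = []
        for pattern in sorted(candidate_patterns(sessions, max_len)):
            global_support = support(pattern, all_sessions)
            user_support = support(pattern, sessions)
            if global_support <= global_max and user_support >= user_min:
                hits.append((pattern, user_support, global_support))
        if hits:
            result[user] = hits
    return result


# Toy data: each inner list is one session of topic labels for that user.
user_sessions = {
    "u1": [["sports", "news"], ["news", "sports"]],
    "u2": [["sports", "news"], ["sports"]],
    "u3": [["travel", "finance"], ["travel", "finance", "news"]],
}

for user, hits in user_aware_rare_patterns(user_sessions).items():
    for pattern, user_support, global_support in hits:
        print(user, pattern, round(user_support, 2), round(global_support, 2))
```

On this toy data only `u3` is flagged: the pattern `("travel", "finance")` appears in every one of that user's sessions but in only two of the six sessions overall. A faithful implementation would replace the fixed topic labels with per-document topic probabilities and compute expected support, as the abstract describes.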

How To Cite

(2017). Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams. International Journal of Trend in Scientific Research and Development, 1(4), -. https://europub.co.uk/articles/-A-357279