Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream

Abstract

The Internet is a source of vast numbers of textual documents that are created by users and distributed in various forms. Most existing work focuses on topic modelling and the evolution of individual topics, while the sequential relations among topics in successive documents published by a specific user are ignored. In this paper, in order to characterize and detect personalized and abnormal behaviours of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. Such patterns are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviours. We present a group of algorithms that solve this mining problem in three phases: preprocessing to extract probabilistic topics and identify sessions for different users; generating all STP candidates with (expected) support values for each user by pattern growth; and selecting URSTPs by performing user-aware rarity analysis on the derived STPs. Twitter is a good real-time example from which users' abnormal behaviour can be discovered. The approach offers an effective and efficient way to find rare patterns in document streams.
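The three phases can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes each document has already been reduced to a single deterministic topic label (rather than a probabilistic topic distribution), it enumerates short subsequences by brute force instead of pattern growth, and it replaces the rarity analysis with two plain thresholds. The function names, the thresholds `local_min` and `global_max`, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def session_patterns(session, max_len=2):
    """Ordered topic subsequences (length <= max_len) occurring in one session."""
    patterns = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(session)), k):
            patterns.add(tuple(session[i] for i in idx))
    return patterns


def user_support(sessions, max_len=2):
    """Fraction of one user's sessions that contain each pattern."""
    counts = defaultdict(int)
    for session in sessions:
        for pattern in session_patterns(session, max_len):
            counts[pattern] += 1
    return {p: c / len(sessions) for p, c in counts.items()}


def find_rare_user_patterns(user_sessions, local_min=0.6, global_max=0.3, max_len=2):
    """Patterns frequent for one user but rare when averaged over all users.

    Assumed simplification: global support is the mean of per-user supports,
    and rarity is decided by two fixed thresholds.
    """
    supports = {u: user_support(s, max_len) for u, s in user_sessions.items()}
    n_users = len(supports)
    global_support = defaultdict(float)
    for per_user in supports.values():
        for pattern, value in per_user.items():
            global_support[pattern] += value / n_users

    result = {}
    for user, per_user in supports.items():
        hits = sorted(
            p for p, v in per_user.items()
            if v >= local_min and global_support[p] <= global_max
        )
        if hits:
            result[user] = hits
    return result


# Hypothetical data: each inner list is one session of topic labels, in order.
user_sessions = {
    "alice": [["sports", "news"], ["sports", "music"]],
    "bob": [["sports", "news"], ["news", "music"]],
    "carol": [["sports", "news"], ["sports", "news"]],
    "dave": [["fraud", "travel"], ["fraud", "travel"]],
}

result = find_rare_user_patterns(user_sessions)
print(result)
```

With this sample data only the hypothetical user `dave` is reported, because his topic sequences appear in all of his sessions but in no other user's. A faithful implementation would carry topic probabilities through to expected support values and grow patterns incrementally, as the abstract describes.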

Authors and Affiliations

Swati V. Mengje, Prof. R R Shelke

  • EP ID EP23108
  • DOI 10.22214/ijraset.2017.2019

How To Cite

Swati V. Mengje, Prof. R R Shelke (2017). Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 5(2), -. https://europub.co.uk/articles/-A-23108