Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream

Abstract

The Internet is a source of a large number of textual documents that are created by users and distributed in various forms. Most existing work addresses topic modelling and the evolution of individual topics, while the sequential relations between topics in successive documents published by a specific user are ignored. In this paper, in order to characterize and detect personalized and abnormal behaviours of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. URSTPs are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviours. We present a group of algorithms that solve this mining problem in three phases: preprocessing to extract probabilistic topics and identify sessions for different users, generating all STP candidates with (expected) support values for each user by pattern-growth, and selecting URSTPs through user-aware rarity analysis of the derived STPs. Twitter is a representative real-time example from which abnormal user behaviour can be discovered. This approach offers an effective and efficient way to find rare patterns in document streams.
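The three phases can be illustrated with a small sketch. It is not the authors' algorithm: it assumes each document has already been reduced to a single topic label (the paper uses probabilistic topics and expected support), it enumerates candidates by brute force rather than by pattern-growth, and the support definition and the thresholds `max_global` and `min_user` are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def is_subsequence(pattern, session):
    """True if the topics in `pattern` occur in `session` in the same order."""
    it = iter(session)
    return all(topic in it for topic in pattern)


def user_support(pattern, sessions):
    """Assumed definition: fraction of a user's sessions containing the pattern."""
    if not sessions:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)


def candidate_patterns(sessions, max_len=2):
    """Brute-force stand-in for pattern-growth: all ordered subsequences up to max_len."""
    found = set()
    for s in sessions:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(s)), n):
                found.add(tuple(s[i] for i in idx))
    return found


def mine_rare_user_patterns(user_sessions, max_global=0.3, min_user=0.5, max_len=2):
    """Keep patterns that are rare on average across users but frequent for one user."""
    all_patterns = set()
    for sessions in user_sessions.values():
        all_patterns |= candidate_patterns(sessions, max_len)

    result = defaultdict(list)
    for pattern in sorted(all_patterns):
        supports = {u: user_support(pattern, s) for u, s in user_sessions.items()}
        global_support = sum(supports.values()) / len(supports)
        if global_support <= max_global:  # rare on the whole
            for user, sup in supports.items():
                if sup >= min_user:  # relatively frequent for this user
                    result[user].append(pattern)
    return dict(result)


if __name__ == "__main__":
    # Hypothetical sessions: each inner list is one session of topic labels.
    user_sessions = {
        "alice": [["sports", "news"], ["sports", "news"]],
        "bob": [["sports", "news"], ["news", "sports"]],
        "carol": [["sports", "news"]],
        "dave": [["finance", "travel"], ["finance", "travel"]],
    }
    for user, patterns in mine_rare_user_patterns(user_sessions).items():
        print(user, patterns)
```

In this toy data, the sequence finance followed by travel appears only in dave's sessions, so it is rare overall yet frequent for dave, which is the kind of pattern the abstract describes as a URSTP.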

Authors and Affiliations

Swati V. Mengje, Prof. R R Shelke

  • EP ID EP23108
  • DOI 10.22214/ijraset.2017.2019

How To Cite

Swati V. Mengje, Prof. R R Shelke (2017). Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 5(2), -. https://europub.co.uk/articles/-A-23108