Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream

Abstract

The Internet is a source of a large number of textual documents that are created by users and distributed in various forms. Most existing work addresses topic modelling and the evolution of individual topics, while the sequential relations of topics in successive documents published by a specific user are ignored. In this paper, in order to characterize and detect personalized and abnormal behaviours of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. URSTPs are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviours. We present a group of algorithms that solve this novel mining problem in three phases: preprocessing to extract probabilistic topics and identify sessions for different users, generating all STP candidates with (expected) support values for each user by pattern growth, and selecting URSTPs through user-aware rarity analysis of the derived STPs. Twitter is a representative real-time example: from its document stream, users' abnormal behaviour can be discovered. This approach offers an effective and efficient way to find rare patterns in document streams.
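The last two phases can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes that phase one has already produced sessions whose documents carry topic distributions, and that the topics of different documents are independent draws. Under those assumptions, a session's support for an STP is the probability that the session contains the pattern as a subsequence, and a user's expected support is the sum of that probability over the user's sessions. The rarity rule, its thresholds `min_user_rel` and `max_global_rel`, and every function name below are hypothetical simplifications.

```python
from collections import defaultdict

# Hypothetical data model (not from the paper): a document is a dict
# {topic: probability}, a session is a list of documents, and `users`
# maps each user id to that user's list of sessions.
# ASSUMPTION: topics of different documents are independent draws.


def containment_prob(session, pattern):
    """Probability that `pattern` appears as a subsequence of the session's topics."""
    k = len(pattern)
    state = [0.0] * (k + 1)  # state[j]: P(greedy left-to-right match has consumed j items)
    state[0] = 1.0
    for doc in session:
        nxt = state[:]
        for j in range(k):
            moved = state[j] * doc.get(pattern[j], 0.0)
            nxt[j] -= moved
            nxt[j + 1] += moved
        state = nxt
    return state[k]


def expected_support(sessions, pattern):
    """Expected number of sessions that contain `pattern`."""
    return sum(containment_prob(s, pattern) for s in sessions)


def grow_patterns(sessions, topics, min_support, max_len=3):
    """Pattern growth: extend prefixes one topic at a time, pruning low-support ones."""
    results, frontier = {}, [()]
    while frontier:
        prefix = frontier.pop()
        if len(prefix) == max_len:
            continue
        for t in topics:
            cand = prefix + (t,)
            sup = expected_support(sessions, cand)
            # Extending a pattern can never raise its containment probability,
            # so a candidate below the threshold has no frequent extensions.
            if sup >= min_support:
                results[cand] = sup
                frontier.append(cand)
    return results


def user_aware_rare(users, topics, min_user_rel, max_global_rel, max_len=3):
    """Simplified rarity rule (hypothetical thresholds): frequent for a user, rare overall."""
    total = sum(len(s) for s in users.values())
    per_pattern = defaultdict(dict)
    for user, sessions in users.items():
        n = len(sessions)
        if n == 0:
            continue
        for pat, sup in grow_patterns(sessions, topics, min_user_rel * n, max_len).items():
            per_pattern[pat][user] = sup / n  # user-relative expected support
    rare = defaultdict(list)
    for pat, rel_by_user in per_pattern.items():
        global_rel = sum(expected_support(s, pat) for s in users.values()) / total
        if global_rel <= max_global_rel:
            for user, rel in rel_by_user.items():
                rare[user].append((pat, rel, global_rel))
    return dict(rare)


if __name__ == "__main__":
    users = {
        "alice": [[{"a": 1.0}, {"b": 1.0}], [{"a": 1.0}, {"b": 1.0}]],
        "bob": [[{"c": 1.0}], [{"c": 1.0}]],
        "carol": [[{"c": 1.0}, {"a": 1.0}], [{"c": 1.0}]],
    }
    print(user_aware_rare(users, ["a", "b", "c"], min_user_rel=1.0, max_global_rel=0.4))
```

With the toy data in the `__main__` block, only Alice's patterns `("b",)` and `("a", "b")` are reported. Each occurs in every one of her sessions but in only a third of all sessions, whereas `("a",)` and `("c",)` are too common globally to count as rare. The greedy left-to-right state is enough to decide subsequence containment, so the probability is computed exactly in time proportional to session length times pattern length.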

Authors and Affiliations

Swati V. Mengje, Prof. R R Shelke

  • DOI 10.22214/ijraset.2017.2019

How To Cite

Swati V. Mengje, Prof. R R Shelke (2017). Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 5(2), -. https://europub.co.uk/articles/-A-23108