Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream

Abstract

The Internet is a source of a large number of textual documents that are created by users and distributed in various forms. Most existing work addresses topic modelling and the evolution of individual topics, while the sequential relations of topics in successive documents published by a specific user are ignored. In this paper, in order to characterize and detect personalized and abnormal behaviours of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. These patterns are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviours. We present a group of algorithms that solve this mining problem in three phases: preprocessing to extract probabilistic topics and identify sessions for different users, generating all STP candidates with (expected) support values for each user by pattern-growth, and selecting URSTPs by performing user-aware rarity analysis on the derived STPs. Twitter is a good real-time example from which users' abnormal behaviour can be discovered. This approach gives an effective and efficient way to find rare patterns in document streams.
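The three phases described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each document has already been reduced to a single topic label (the paper uses probabilistic topics and expected support), it enumerates short subsequences by brute force instead of pattern-growth, and it measures rarity as the difference between a user's support and the mean support over all users. The function names, the `min_rarity` threshold, and the sample sessions are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def session_patterns(session, max_len):
    """Ordered topic subsequences (length 1..max_len) occurring in one session."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(session)), k):
            found.add(tuple(session[i] for i in idx))
    return found


def user_supports(sessions, max_len):
    """Support of a pattern for one user: fraction of that user's sessions containing it."""
    counts = defaultdict(int)
    for session in sessions:
        for pattern in session_patterns(session, max_len):
            counts[pattern] += 1
    return {p: c / len(sessions) for p, c in counts.items()}


def mine_rare_patterns(sessions_by_user, max_len=2, min_rarity=0.4):
    """Select, per user, patterns whose support exceeds the all-user mean by min_rarity."""
    per_user = {u: user_supports(s, max_len)
                for u, s in sessions_by_user.items() if s}
    # Assumed global measure: mean per-user support (0 for users lacking the pattern).
    totals = defaultdict(float)
    for supports in per_user.values():
        for pattern, support in supports.items():
            totals[pattern] += support
    global_support = {p: t / len(per_user) for p, t in totals.items()}

    result = {}
    for user, supports in per_user.items():
        picked = {p: s - global_support[p]
                  for p, s in supports.items()
                  if s - global_support[p] >= min_rarity}
        if picked:
            result[user] = picked
    return result


# Hypothetical sessions: each inner list is the topic sequence of one session.
sessions_by_user = {
    "alice": [["sports", "news"], ["sports", "music"]],
    "bob": [["sports", "news"], ["news", "sports"]],
    "carol": [["finance", "travel"], ["finance", "travel"], ["sports"]],
}
result = mine_rare_patterns(sessions_by_user)
for user, patterns in result.items():
    print(user, sorted(patterns))
```

In this toy data the pattern finance followed by travel appears only in carol's sessions, so it is rare overall yet frequent for her and is selected, while alice's patterns are too widely shared to pass the threshold.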

Authors and Affiliations

Swati V. Mengje, Prof. R R Shelke

  • EP ID EP23108
  • DOI 10.22214/ijraset.2017.2019

How To Cite

Swati V. Mengje, Prof. R R Shelke (2017). Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 5(2), -. https://europub.co.uk/articles/-A-23108