Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams

Abstract

Documents created and distributed on the Internet are ever changing in various forms. Most existing works are devoted to topic modeling and the evolution of individual topics, while the sequential relations of topics in successive documents published by a specific user are ignored. To characterize and detect personalized and abnormal behaviors of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. URSTPs are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviors. We present a solution to this new mining problem in three phases: pre-processing to extract probabilistic topics and identify sessions for different users; generating all STP candidates with (expected) support values for each user by pattern growth; and selecting URSTPs by performing user-aware rarity analysis on the derived STPs. Experiments on both real (Twitter) and synthetic datasets show that the approach discovers special users and interpretable URSTPs effectively and efficiently, and that these patterns reflect users' characteristics.

Swati V. Mengje, Prof. Rajeshri R. Shelke, "Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume 1, Issue 4, June 2017. URL: http://www.ijtsrd.com/papers/ijtsrd101.pdf and http://www.ijtsrd.com/engineering/computer-engineering/101/efficient-way-to-identify-user-aware-rare-sequential-patterns-in-document-streams/swati-v-mengje
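The three phases summarized in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: phase 1 is taken as already done (each document is assumed to be a topic-to-probability map, grouped into per-user sessions), each document is assumed to take exactly one topic drawn independently from that map, and a user's expected support for an STP is assumed to be the average probability that one of their sessions contains the STP as a subsequence. The function names and the thresholds `user_min_sup`, `global_max_sup` and `max_len` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed (hypothetical) data model: phase 1 has already turned every document
# into a topic -> probability map and grouped each user's documents into sessions.
Doc = Dict[str, float]
Session = List[Doc]
Pattern = Tuple[str, ...]


def containment_prob(session: Session, pattern: Pattern) -> float:
    """Probability that `pattern` occurs as a subsequence of the session's topics,
    assuming each document independently takes one topic from its distribution."""
    # state[j]: probability that greedy left-to-right matching has matched j topics
    state = [1.0] + [0.0] * len(pattern)
    for doc in session:
        new = state[:]
        for j, topic in enumerate(pattern):
            # a document matches at most one pattern position, so read the old state
            moved = state[j] * doc.get(topic, 0.0)
            new[j] -= moved
            new[j + 1] += moved
        state = new
    return state[-1]


def expected_support(sessions: List[Session], pattern: Pattern) -> float:
    """Average containment probability over a non-empty list of sessions."""
    return sum(containment_prob(s, pattern) for s in sessions) / len(sessions)


def grow_patterns(sessions: List[Session], topics: List[str],
                  min_sup: float, max_len: int) -> Dict[Pattern, float]:
    """Phase 2 sketch: depth-first pattern growth over one user's sessions."""
    found: Dict[Pattern, float] = {}

    def extend(prefix: Pattern) -> None:
        if len(prefix) == max_len:
            return
        for t in topics:
            cand = prefix + (t,)
            sup = expected_support(sessions, cand)
            if sup >= min_sup:  # only frequent candidates are grown further
                found[cand] = sup
                extend(cand)

    extend(())
    return found


def select_urstps(users: Dict[str, List[Session]], topics: List[str],
                  user_min_sup: float, global_max_sup: float,
                  max_len: int = 3) -> Dict[str, Dict[Pattern, Tuple[float, float]]]:
    """Phase 3 sketch: keep patterns frequent for a user but rare over all sessions."""
    all_sessions = [s for sessions in users.values() for s in sessions]
    urstps: Dict[str, Dict[Pattern, Tuple[float, float]]] = defaultdict(dict)
    for user, sessions in users.items():
        for pattern, sup in grow_patterns(sessions, topics, user_min_sup, max_len).items():
            global_sup = expected_support(all_sessions, pattern)
            if global_sup <= global_max_sup:
                urstps[user][pattern] = (sup, global_sup)
    return dict(urstps)


if __name__ == "__main__":
    # Toy data: alice repeatedly posts topic "a" then "b"; bob and carol only post "c".
    ab = [{"a": 1.0}, {"b": 1.0}]
    cc = [{"c": 1.0}, {"c": 1.0}]
    users = {"alice": [ab, ab], "bob": [cc] * 3, "carol": [cc] * 3}
    print(select_urstps(users, ["a", "b", "c"], user_min_sup=0.8,
                        global_max_sup=0.3, max_len=2))
    # prints {'alice': {('a',): (1.0, 0.25), ('a', 'b'): (1.0, 0.25), ('b',): (1.0, 0.25)}}
```

Pruning during growth is safe under these assumptions because any session that contains a longer pattern also contains its prefix, so a prefix whose expected support is already below `user_min_sup` cannot be extended into a qualifying pattern. In the toy run, bob's and carol's ("c", "c") is frequent for them but also common across all sessions, so only alice's patterns survive the rarity check.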

How To Cite

Mengje, S. V., & Shelke, R. R. (2017). Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams. International Journal of Trend in Scientific Research and Development, 1(4). https://europub.co.uk/articles/-A-357279