Study on Efficient Way to Identify User Aware Rare Sequential Pattern Matching in Document Stream
Journal Title: International Journal for Research in Applied Science and Engineering Technology (IJRASET) - Year 2017, Vol 5, Issue 2
Abstract
The Internet is a source of a large number of textual documents that are created by users and distributed in various forms. Most existing work focuses on topic modelling and the evolution of individual topics, while the sequential relations between topics in successive documents published by a specific user are ignored. In this paper, in order to characterize and detect personalized and abnormal behaviours of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. URSTPs are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviours. We present a group of algorithms that solve this mining problem in three phases: preprocessing to extract probabilistic topics and identify sessions for different users; generating all STP candidates with (expected) support values for each user by pattern-growth; and selecting URSTPs by performing user-aware rarity analysis on the derived STPs. Twitter is a good real-time example, from which users' abnormal behaviour can be discovered. This approach offers an effective and efficient way to find rare patterns in document streams.
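The idea of a pattern being "rare on the whole but relatively frequent for specific users" can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the authors' algorithms: each document is assumed to carry a single known topic label instead of a probabilistic topic distribution, candidates are enumerated by brute force instead of pattern-growth, and the function names and the thresholds `global_max` and `user_min` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def session_patterns(session, max_len=2):
    """Distinct ordered topic subsequences (length <= max_len) found in one session."""
    found = set()
    for n in range(1, max_len + 1):
        for idx in combinations(range(len(session)), n):
            found.add(tuple(session[i] for i in idx))
    return found


def user_supports(sessions_by_user, max_len=2):
    """supports[user][pattern] = fraction of that user's sessions containing the pattern."""
    supports = {}
    for user, sessions in sessions_by_user.items():
        counts = defaultdict(int)
        for session in sessions:
            for pattern in session_patterns(session, max_len):
                counts[pattern] += 1
        supports[user] = {p: c / len(sessions) for p, c in counts.items()}
    return supports


def rare_user_patterns(sessions_by_user, global_max=0.3, user_min=0.5, max_len=2):
    """Keep patterns whose mean support over all users is low (<= global_max,
    an assumed threshold) but whose support for one user is high (>= user_min)."""
    supports = user_supports(sessions_by_user, max_len)
    all_patterns = set().union(*(s.keys() for s in supports.values()))
    result = {}
    for pattern in all_patterns:
        global_support = sum(s.get(pattern, 0.0) for s in supports.values()) / len(supports)
        if global_support > global_max:
            continue  # common across users, so not rare on the whole
        for user, s in supports.items():
            if s.get(pattern, 0.0) >= user_min:
                result.setdefault(user, []).append(pattern)
    return {user: sorted(patterns) for user, patterns in result.items()}
```

Calling `rare_user_patterns` on a dictionary that maps each user to a list of sessions (each session being an ordered list of topic labels) returns, per user, the topic sequences that this user repeats but that the user population as a whole rarely produces.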
Authors and Affiliations
Swati V. Mengje, Prof. R R Shelke