A Novel Semantically-Time-Referrer based Approach of Web Usage Mining for Improved Sessionization in Pre-Processing of Web Log

Abstract

Web usage mining (WUM), also known as web log mining, is the application of data mining techniques to large volumes of web log data in order to extract useful and interesting patterns of user behaviour, with the aim of improving web-based applications. This paper aims to improve data discovery by mining usage data from log files. The work is carried out in three phases. The first two phases, data cleaning and user identification, are completed using traditional methods. The third phase, session identification, is performed using three different methods. The main focus of this paper is the sessionization of the log file, a critical step for extracting usage patterns. The proposed referrer-time and semantically-time-referrer methods overcome limitations of the traditional methods. The main advantage of the pre-processing model presented in this paper over other methods is that it can process text or Excel log files of any format. Experiments on three different log files indicate that the proposed semantically-time-referrer heuristic achieves better results than the traditional time-based method and the referrer-time-based method. The proposed methods are simple to use. The web log files were collected from different servers and contain public information about visitors. In addition, this paper discusses different types of web log formats.
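The abstract contrasts the traditional time-based sessionization heuristic with a combined referrer-time heuristic. The sketch below illustrates both under stated assumptions. The `Request` record, the 30-minute timeout, the rule that an empty referrer (`-`) does not break a session, and the assumption that referrers are already normalized to the same path form as requested URLs are all hypothetical choices, not the parameters or rules used in the paper. The semantic component of the proposed semantically-time-referrer method is not described in the abstract and is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Request:
    user: str       # user key after user identification (hypothetical, e.g. IP + user agent)
    time: datetime  # request timestamp
    url: str        # requested page, normalized to a path (assumption)
    referrer: str   # referring page in the same path form, or "-" if absent


def time_sessions(requests, timeout=timedelta(minutes=30)):
    """Time-based heuristic: start a new session for a user whenever the gap
    since that user's previous request exceeds `timeout` (assumed 30 min)."""
    sessions = []
    current = {}  # user -> that user's open session (list of Request)
    for r in sorted(requests, key=lambda r: (r.user, r.time)):
        s = current.get(r.user)
        if s is None or r.time - s[-1].time > timeout:
            s = []
            sessions.append(s)
            current[r.user] = s
        s.append(r)
    return sessions


def referrer_time_sessions(requests, timeout=timedelta(minutes=30)):
    """Combined referrer-time heuristic (assumed rule): a request continues the
    open session only if the time gap is within `timeout` AND its referrer is
    "-" or a page already visited in that session; otherwise a new session starts."""
    sessions = []
    current = {}
    for r in sorted(requests, key=lambda r: (r.user, r.time)):
        s = current.get(r.user)
        continues = (
            s is not None
            and r.time - s[-1].time <= timeout
            and (r.referrer == "-" or r.referrer in {q.url for q in s})
        )
        if not continues:
            s = []
            sessions.append(s)
            current[r.user] = s
        s.append(r)
    return sessions


# Hypothetical usage: one user, four requests.
t0 = datetime(2017, 1, 1, 10, 0)
log = [
    Request("u1", t0, "/index.html", "-"),
    Request("u1", t0 + timedelta(minutes=5), "/products.html", "/index.html"),
    Request("u1", t0 + timedelta(minutes=10), "/news.html", "/other.html"),
    Request("u1", t0 + timedelta(minutes=50), "/index.html", "-"),
]
print(len(time_sessions(log)))           # 2
print(len(referrer_time_sessions(log)))  # 3
```

In this toy log the time rule alone sees one 40-minute gap and yields two sessions, while the referrer check additionally splits off the `/news.html` request because its referrer was never visited in the open session.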

Authors and Affiliations

Navjot Kaur, Himanshu Aggarwal

  • EP ID EP249761
  • DOI 10.14569/IJACSA.2017.080122

How To Cite

Navjot Kaur, Himanshu Aggarwal (2017). A Novel Semantically-Time-Referrer based Approach of Web Usage Mining for Improved Sessionization in Pre-Processing of Web Log. International Journal of Advanced Computer Science & Applications, 8(1), 158-168. https://europub.co.uk/articles/-A-249761