An Investigation on Topic Maps Based Document Classification with Unbalance Classes
Journal Title: Journal of Independent Studies and Research - Computing - Year 2015, Vol 13, Issue 1
Abstract
Classification of imbalanced data has become a widespread problem because most real-world datasets are imbalanced. In a classification task, one of the challenges is learning the feature space of the classifier under a class-imbalance setting. The majority classes are generally well represented in the features of the learned classification function, whereas the minority classes lack this representation; consequently, documents from the minority classes are misclassified more often. In this paper, the authors investigate document classification using a topic-map-based representation of documents under class imbalance. To evaluate topic-map-based representation for classifying imbalanced data, the authors compare three representations (Bag-of-Words, Phrases, and Topic terms) under three approaches: (i) under-sampling, (ii) cost-adjusting, and (iii) cluster-based sampling. A series of experiments is carried out and the results are reported.
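The three imbalance-handling approaches named in the abstract can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' implementation: the abstract gives no sampling ratios, cost values, or clustering algorithm, so the following are assumptions. Documents are taken to be already vectorized (for example as Bag-of-Words, phrase, or topic-term vectors, represented here as tuples of floats); under-sampling shrinks every class to the minority-class size; cost-adjusting uses inverse-class-frequency weights n / (k · n_c), the same formula scikit-learn uses for `class_weight="balanced"`; and cluster-based sampling runs a plain k-means on the majority class and keeps the member nearest each centroid. All function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical helpers illustrating three class-imbalance strategies.
# Documents are assumed to be pre-vectorized as tuples of floats.


def random_undersample(X, y, seed=0):
    """(i) Under-sampling: randomly drop examples so every class shrinks
    to the size of the smallest (minority) class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        for xi in rng.sample(items, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def inverse_frequency_costs(y):
    """(ii) Cost-adjusting: weight class c by n / (k * n_c), so errors on
    rare classes cost more. The weights would scale a classifier's loss."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def _dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns final centroids and their clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            clusters[j].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if empty.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids, clusters


def cluster_undersample(X, y, majority, n_keep, seed=0):
    """(iii) Cluster-based sampling: split the majority class into n_keep
    k-means clusters and keep the member nearest each centroid, so the reduced
    majority class still spans its region of feature space."""
    maj = [tuple(x) for x, lab in zip(X, y) if lab == majority]
    rest = [(tuple(x), lab) for x, lab in zip(X, y) if lab != majority]
    centroids, clusters = _kmeans(maj, n_keep, seed=seed)
    kept = [min(cl, key=lambda p: _dist2(p, c))
            for c, cl in zip(centroids, clusters) if cl]
    X_out = kept + [x for x, _ in rest]
    y_out = [majority] * len(kept) + [lab for _, lab in rest]
    return X_out, y_out


# Toy usage: 6 majority documents in two groups, 2 minority documents.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
     (9.0, 9.0), (9.1, 9.0)]
y = ["maj"] * 6 + ["min"] * 2

X_u, y_u = random_undersample(X, y)                  # 2 "maj" + 2 "min"
costs = inverse_frequency_costs(y)                   # "min" weighted 3x "maj"
X_c, y_c = cluster_undersample(X, y, "maj", n_keep=2)
print(Counter(y_u), costs, X_c, y_c)
```

Random under-sampling may discard whole regions of the majority class by chance, whereas cluster-based sampling keeps one representative per cluster; cost-adjusting keeps every document and instead changes how errors are penalized during training.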
Implementation of Discrete Fourier Transform and Orthogonal Discrete Wavelet Transform in Python
This paper presents an implementation of the Discrete Fourier Transform and the Orthogonal Discrete Wavelet Transform in the Python programming language. The Fourier Transform is a fundamental signal processing tool whereas th...
Comparative Analysis of Collaborative Filtering on GraphLab, MLlib and Mahout
Recommendation systems are used to recommend items or products to the user based on their previous purchases, visits, interests, ratings, wish-lists or reviews to develop interest and to display the accurate and suitable...
Performance Analysis of TCP and UDP over Mobile Ad hoc Network
In the network computing domain, mobile ad hoc networks (MANETs) have gained prominence in recent years. These networks are used in almost every domain of today’s life, especially in military and emergency...
Prediction of Suicide Causes in India using Machine Learning
Worldwide, the suicide rate is considered one of the most significant issues. With each passing year, the number of suicides is increasing phenomenally, and for this reason this research is carried out to predict...
Detection of Duplicate and Near-Duplicate Content for Web Crawlers
There is an abundance of duplicated web documents on the internet. For example, two documents online could be very similar to each other except for a very small portion, such as URLs and advertisements. While such differ...