EVALUATING THE EFFECT OF DATASET SIZE ON PREDICTIVE MODEL USING SUPERVISED LEARNING TECHNIQUE

Abstract

Learning models used for prediction are often developed with little regard for the dataset size needed to produce models with high accuracy and good generalization. The general belief is that a large dataset is needed to construct a predictive learning model. However, whether a dataset counts as large depends on the circumstances, so what makes a dataset big or small remains vague. This paper examines the ability of a predictive model to generalize, with respect to a particular size of data, when simulated with new inputs that were not used in training. The study experiments on three different sizes of data, using Matlab to create predictive models, with a view to establishing whether the size of the data has any effect on the accuracy of a model. The simulated output of each model is measured using the Mean Absolute Error (MAE), and comparisons are made. Findings from this study reveal that the data partitioned for training must be a good representation of the entire set and sufficient to span the input space. The results of simulating the three network models also show that the learning model with the largest training set appears to be the most accurate and consistently delivers better and more stable results.
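
As a rough illustration of the comparison described above, the following Python sketch computes the MAE of three toy models built from training sets of different sizes. It is not the authors' Matlab experiment: the target function, the training sizes (5, 20, 80), the evaluation grid, and the nearest-neighbour stand-in for a network model are all assumptions chosen only to keep the example small and deterministic.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum(|actual_i - predicted_i|)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def target(x):
    # Hypothetical relationship to learn; not taken from the paper.
    return x * x


def nearest_neighbor_predict(train_x, train_y, x):
    # Stand-in "model" (an assumption, not the paper's network):
    # return the target value of the closest training input.
    i = min(range(len(train_x)), key=lambda k: abs(train_x[k] - x))
    return train_y[i]


def mae_by_training_size(sizes):
    # Evaluation inputs: midpoints of a 100-cell grid over [0, 1].
    test_x = [0.005 + 0.01 * j for j in range(100)]
    test_y = [target(x) for x in test_x]

    results = {}
    for n in sizes:
        # Each training set spans the whole input space [0, 1] evenly.
        train_x = [i / (n - 1) for i in range(n)]
        train_y = [target(x) for x in train_x]
        predicted = [nearest_neighbor_predict(train_x, train_y, x) for x in test_x]
        results[n] = mean_absolute_error(test_y, predicted)
    return results


if __name__ == "__main__":
    for n, mae in mae_by_training_size([5, 20, 80]).items():
        print(f"training size {n:3d}: MAE = {mae:.5f}")
```

In this toy setting every training set covers the full input range, so the drop in MAE comes purely from denser sampling. It says nothing about how the paper's network models behave on real data.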

Authors and Affiliations

A. R. Ajiboye, Abdullah Arshah, H. Qin

How To Cite

A. R. Ajiboye, Abdullah Arshah, H. Qin (2015). EVALUATING THE EFFECT OF DATASET SIZE ON PREDICTIVE MODEL USING SUPERVISED LEARNING TECHNIQUE. International Journal of Software Engineering and Computer Systems, 1(1), 75-84. https://europub.co.uk/articles/-A-254080