EVALUATING THE EFFECT OF DATASET SIZE ON PREDICTIVE MODEL USING SUPERVISED LEARNING TECHNIQUE

Abstract

Learning models used for prediction are mostly developed without much attention to the size of dataset needed to produce models with high accuracy and good generalization. The general belief is that a large dataset is needed to construct a predictive learning model. However, whether a dataset counts as large is perhaps circumstance-dependent, so what makes a dataset big or small is vague. In this paper, the ability of a predictive model to generalize, with respect to a particular size of data, is examined by simulating it with new input not seen during training. The study experiments on three different sizes of data, using a Matlab program to create predictive models, with a view to establishing whether the size of the data has any effect on the accuracy of a model. The simulated output of each model is measured using the Mean Absolute Error (MAE), and comparisons are made. Findings from this study reveal that the quantity of data partitioned for training must be a good representation of the entire set and sufficient to span the input space. The results of simulating the three network models also show that the learning model with the largest training set appears to be the most accurate and consistently delivers better and more stable results.
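
As a rough illustration of the comparison described in the abstract, the sketch below computes the MAE of three models' simulated outputs against the same held-out targets. It is written in Python rather than Matlab and is not the authors' code. The function name `mean_absolute_error`, the labels `small`, `medium` and `large`, and all numeric values are hypothetical placeholders, not data from the paper.

```python
from typing import Sequence


def mean_absolute_error(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Average of |target - output| over all test samples."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    if len(targets) == 0:
        raise ValueError("at least one sample is required")
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)


# Hypothetical held-out targets and simulated outputs of three models
# trained on small, medium and large training sets (illustrative values only).
targets = [3.0, -0.5, 2.0, 7.0]
simulated = {
    "small": [1.0, 1.5, 4.0, 4.0],
    "medium": [2.0, 0.5, 3.0, 6.0],
    "large": [2.5, 0.0, 2.0, 8.0],
}

# One MAE per model; the model with the lowest MAE is the most accurate
# on this particular test set.
mae_by_size = {name: mean_absolute_error(targets, out) for name, out in simulated.items()}
best = min(mae_by_size, key=mae_by_size.get)
```

Evaluating every model on the same unseen inputs keeps the MAE values directly comparable, so any difference can be attributed to the training-set size rather than to the test data.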

Authors and Affiliations

A. R. Ajiboye, Abdullah Arshah, H. Qin

How To Cite

A. R. Ajiboye, Abdullah Arshah, H. Qin (2015). EVALUATING THE EFFECT OF DATASET SIZE ON PREDICTIVE MODEL USING SUPERVISED LEARNING TECHNIQUE. International Journal of Software Engineering and Computer Systems, 1(1), 75-84. https://europub.co.uk/articles/-A-254080