EVALUATING THE EFFECT OF DATASET SIZE ON PREDICTIVE MODEL USING SUPERVISED LEARNING TECHNIQUE

Abstract

Learning models used for prediction are mostly developed with little regard for the size of dataset needed to produce models of high accuracy and good generalization. The general belief is that a large dataset is needed to construct a predictive learning model. Whether a dataset counts as large, however, depends on the circumstances, so what makes a dataset big or small is vague. This paper examines the ability of a predictive model to generalize, with respect to a particular size of data, when simulated with new, previously unseen input. The study experiments on three different sizes of data, using a Matlab program to create predictive models, with a view to establishing whether the size of the data has any effect on the accuracy of a model. The simulated output of each model is measured using the Mean Absolute Error (MAE), and comparisons are made. Findings from this study reveal that the data partitioned for training must be a good representation of the entire set and sufficient to span the input space. The results of simulating the three network models also show that the learning model with the largest training set appears to be the most accurate and consistently delivers better and more stable results.
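The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' Matlab implementation: the synthetic data, the least-squares line used in place of the network models, the training sizes, and all function names are assumptions chosen purely for illustration. Only the MAE definition (the mean of the absolute differences between targets and predictions) is standard.

```python
import random


def mean_absolute_error(targets, predictions):
    """Return the average of |target - prediction| over all samples."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a stand-in for the paper's network models)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_data(n, rng):
    """Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def compare_sizes(sizes=(20, 200, 2000), seed=0):
    """Train one model per training-set size and score each on the same held-out set."""
    rng = random.Random(seed)
    test_x, test_y = make_data(500, rng)  # unseen inputs shared by all models
    results = {}
    for n in sizes:
        train_x, train_y = make_data(n, rng)
        slope, intercept = fit_line(train_x, train_y)
        predictions = [slope * x + intercept for x in test_x]
        results[n] = mean_absolute_error(test_y, predictions)
    return results


if __name__ == "__main__":
    for size, mae in compare_sizes().items():
        print(f"training size {size}: MAE = {mae:.3f}")
```

Scoring every model on the same held-out inputs keeps the MAE values comparable across training sizes. Results on this toy data say nothing about the paper's own findings.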

Authors and Affiliations

A. R. Ajiboye, Abdullah Arshah, H. Qin

EP ID: EP254080

How To Cite

A. R. Ajiboye, Abdullah Arshah, H. Qin (2015). EVALUATING THE EFFECT OF DATASET SIZE ON PREDICTIVE MODEL USING SUPERVISED LEARNING TECHNIQUE. International Journal of Software Engineering and Computer Systems, 1(1), 75-84. https://europub.co.uk/articles/-A-254080