Model Selection in Regression: Application to Tumours in Childhood
Journal Title: Current Trends on Biostatistics & Biometrics - Year 2018, Vol 1, Issue 1
Abstract
We give a chronological review of the major model selection methods that have been proposed since about 1960. These procedures include the residual mean square error (MSE), the coefficient of multiple determination (R²), the adjusted coefficient of multiple determination (Adj R²), the estimate of error variance (S²), stepwise methods, Mallows' Cp, the Akaike information criterion (AIC) and the Schwarz criterion (BIC). Some of these methods are applied to the problem of developing a model for predicting tumours in childhood using log-linear models. The theoretical review discusses model selection in a general setting; the application focuses on log-linear models in particular.
The problem of model selection is at the core of progress in science. Over the decades, scientists have used various statistical tools to select among alternative models of data. A common challenge is selecting the best subset of predictor variables with respect to some specified criterion. Tobias Mayer (1750) established the two main methods, namely linear estimation and Bayesian analysis, by fitting models to observations. The period from 1900 to the 1930s saw great development of regression and statistical ideas, but the work relied on hand calculation. In 1951 Kullback and Leibler developed a measure of discrepancy from information theory, which forms the theoretical basis for criteria-based model selection. In the 1960s computers enabled scientists to address the problem of model selection directly. Computer programmes were developed to compute all possible subsets and to implement, for example, stepwise regression, Mallows' Cp, AIC, TIC and BIC. During the 1970s and 1980s there was a spate of proposals for dealing with the model selection problem. Linhart and Zucchini (1986) provided a systematic development of frequentist criteria-based model selection methods for a variety of typical situations that arise in practice. These included the selection of univariate probability distributions, the regression setting, the analysis of variance and covariance, the analysis of contingency tables, and time series analysis.
Bozdogan [1] gives an outstanding review showing how AIC may be applied to compare models in a set of competing models, and defines a statistical model as a mathematical formulation that expresses the main features of the data in terms of probabilities. In the 1990s Hastie and Tibshirani introduced generalized additive models. These models assume that the mean of the dependent variable depends on an additive predictor through a nonlinear link function, and they permit the response probability distribution to be any member of the exponential family of distributions. Hastie and Tibshirani suggested in particular that, up to that date, model selection had largely been a theoretical exercise and that more practical examples were needed (see Hastie and Tibshirani, 1990).
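As an illustration of how several of the criteria named above can be compared over all possible subsets of predictors, the following is a minimal sketch rather than the procedure used in the article. It assumes an ordinary Gaussian linear regression (not the log-linear models of the application), uses synthetic data, and writes AIC and BIC in their least-squares form, which drops additive constants that are the same for every candidate model. The function names `fit_rss`, `selection_criteria` and `all_subsets` are hypothetical.

```python
from itertools import combinations

import numpy as np


def fit_rss(X, y):
    """Fit by least squares and return the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def selection_criteria(X, y, subset, sigma2_full):
    """Criteria for the model with an intercept plus the columns in `subset`."""
    n = len(y)
    Xs = np.column_stack([np.ones(n), X[:, list(subset)]])
    p = Xs.shape[1]  # number of regression coefficients, intercept included
    rss = fit_rss(Xs, y)
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    return {
        "subset": subset,
        "p": p,
        "mse": rss / (n - p),                       # residual mean square, S^2
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - p),
        "cp": rss / sigma2_full - n + 2 * p,        # Mallows' Cp
        "aic": n * np.log(rss / n) + 2 * p,         # up to an additive constant
        "bic": n * np.log(rss / n) + p * np.log(n), # up to an additive constant
    }


def all_subsets(X, y):
    """Evaluate every subset of the columns of X (2**k candidate models)."""
    n, k = X.shape
    X_full = np.column_stack([np.ones(n), X])
    # Error variance estimated from the full model, as Mallows' Cp requires.
    sigma2_full = fit_rss(X_full, y) / (n - k - 1)
    return [
        selection_criteria(X, y, subset, sigma2_full)
        for size in range(k + 1)
        for subset in combinations(range(k), size)
    ]


# Synthetic data (an assumption for illustration): only the first two
# of four candidate predictors actually influence the response.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

results = all_subsets(X, y)
best_aic = min(results, key=lambda r: r["aic"])
best_bic = min(results, key=lambda r: r["bic"])
print("subset chosen by AIC:", best_aic["subset"])
print("subset chosen by BIC:", best_bic["subset"])
```

R² never decreases as predictors are added, so on its own it always favours the full model; the adjusted R², Cp, AIC and BIC each add a penalty that grows with the number of fitted parameters, and BIC penalises more heavily than AIC once ln(n) exceeds 2.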
Authors and Affiliations
Annah Managa