Model Selection in Regression: Application to Tumours in Childhood

Journal Title: Current Trends on Biostatistics & Biometrics - Year 2018, Vol 1, Issue 1

Abstract

We give a chronological review of the major model selection methods that have been proposed since circa 1960. These procedures include the residual mean square error (MSE), the coefficient of multiple determination (R2), the adjusted coefficient of multiple determination (Adj R2), the estimate of error variance (S2), stepwise methods, Mallows' Cp, the Akaike information criterion (AIC), and the Schwarz criterion (BIC). Some of these methods are applied to the problem of developing a log-linear model for predicting tumours in childhood. The theoretical review discusses model selection in a general setting; the application is restricted to log-linear models.

The problem of model selection is at the core of progress in science. Over the decades, scientists have used various statistical tools to select among alternative models of data. A common challenge is selecting the best subset of predictor variables according to some specified criterion. Tobias Mayer (1750) established the two main methods, namely linear estimation and Bayesian analysis, by fitting models to observations. The period from 1900 to the 1930s saw great development of regression and statistical ideas, but the work relied on hand calculation. In 1951 Kullback and Leibler developed a measure of discrepancy from information theory, which forms the theoretical basis for criteria-based model selection. In the 1960s computers enabled scientists to address the problem of model selection. Computer programmes were developed to compute all possible subsets and to implement, for example, stepwise regression, Mallows' Cp, AIC, TIC and BIC. During the 1970s and 1980s there was a spate of proposals to deal with the model selection problem. Linhart and Zucchini (1986) provided a systematic development of frequentist criteria-based model selection methods for a variety of typical situations that arise in practice. These included the selection of univariate probability distributions, the regression setting, the analysis of variance and covariance, the analysis of contingency tables, and time series analysis.

Bozdogan [1] gives an outstanding review showing how AIC may be applied to compare models in a set of competing models, and defines a statistical model as a mathematical formulation that expresses the main features of the data in terms of probabilities. In the 1990s Hastie and Tibshirani introduced generalized additive models. These models assume that the mean of the dependent variable depends on an additive predictor through a nonlinear link function, and they permit the response probability distribution to be any member of the exponential family of distributions. Hastie and Tibshirani suggested that, up to that date, model selection had largely been a theoretical exercise and that more practical examples were needed (see Hastie and Tibshirani, 1990).
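To make the criteria named in the abstract concrete, the following is a minimal sketch of how R2, Adj R2, MSE, AIC and BIC might be computed for a few nested candidate models. It is an illustration only and not the paper's analysis: the data are synthetic, ordinary least squares with Gaussian errors stands in for the log-linear models used in the application, and the function name `selection_criteria` and the candidate subsets are assumptions introduced here.

```python
import numpy as np

def selection_criteria(y, X):
    """Fit y on X by least squares (intercept added) and return common criteria."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])    # intercept plus predictors
    p = design.shape[1]                           # number of regression coefficients
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sse = float(resid @ resid)                    # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    mse = sse / (n - p)                           # residual mean square (S2)
    # Gaussian log-likelihood evaluated at the ML variance estimate sse / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sse / n) + 1.0)
    k = p + 1                                     # coefficients plus error variance
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + np.log(n) * k
    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "aic": aic, "bic": bic, "k": k}

# Synthetic data (assumption): only the first two predictors affect the response.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

# Hypothetical nested candidate subsets of predictor columns.
candidates = {"x0": [0], "x0+x1": [0, 1], "x0+x1+x2": [0, 1, 2]}
results = {name: selection_criteria(y, X[:, cols]) for name, cols in candidates.items()}

for name, crit in results.items():
    print(name, {key: round(val, 3) for key, val in crit.items()})
```

R2 can only increase as predictors are added, which is why it cannot be used on its own to choose among nested models; Adj R2, AIC and BIC each add a penalty for the number of parameters, with BIC penalising more heavily than AIC once n exceeds about 7.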

Authors and Affiliations

Annah Managa

Download PDF file
  • EP ID EP640187
  • DOI 10.32474/CTBB.2018.01.000101

How To Cite

Annah Managa (2018). Model Selection in Regression: Application to Tumours in Childhood. Current Trends on Biostatistics & Biometrics, 1(1), 1-12. https://europub.co.uk/articles/-A-640187