Optimal Tree Depth in Decision Tree Classifiers for Predicting Heart Failure Mortality
Journal Title: Healthcraft Frontiers - Year 2023, Vol 1, Issue 1
Abstract
The depth of a decision tree (DT) affects the performance of a DT classifier in predicting mortality caused by heart failure (HF). A deeper tree can learn more complex patterns in the data, which in theory improves predictive performance. A very deep tree, however, tends to overfit: the model memorizes the training data rather than generalizing to new, unseen data, which lowers classification performance on test data. Conversely, a shallow tree captures little of the complexity in the data, leading to underfitting and, again, lower performance. Pruning has been proposed to limit the maximum tree depth or the minimum number of instances required to split a node, which reduces computational complexity and helps avoid overfitting. However, pruning does not by itself identify the optimal tree depth. To build a better-performing DT classifier, it is crucial to find the depth at which performance is best. This study proposes cross-validation to find the optimal tree depth using validation data. In the proposed method, the cross-validated accuracy on training and test data is evaluated empirically on an HF dataset of 299 observations with 11 features, collected from the Kaggle machine learning (ML) data repository. The results show that tuning the DT depth is important for balancing the learning process, so that relevant patterns are captured while overfitting is avoided. Although cross-validation proves effective in determining the optimal DT depth, this study does not compare it with other methods for determining the optimal depth, such as grid search, pruning algorithms, or information criteria; this is a limitation of the study.
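The depth search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a synthetic dataset with the same shape as the HF data (299 observations, 11 features) in place of the Kaggle file, 5-fold cross-validation, and a candidate depth range of 1 to 15. None of these choices are specified in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in with the same shape as the HF dataset
# (299 observations, 11 features). Replace with the real data to reproduce.
X, y = make_classification(
    n_samples=299, n_features=11, n_informative=5, random_state=0
)

# Assumption: candidate depths 1..15 and 5-fold CV; the abstract does not
# state the range or the number of folds.
depths = list(range(1, 16))
train_acc, val_acc = [], []

for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_validate(
        tree, X, y, cv=5, scoring="accuracy", return_train_score=True
    )
    # Mean accuracy on the training folds and on the held-out folds.
    train_acc.append(scores["train_score"].mean())
    val_acc.append(scores["test_score"].mean())

# Pick the depth with the highest mean held-out accuracy
# (the shallowest such depth if several tie).
best_depth = depths[int(np.argmax(val_acc))]

for d, tr, va in zip(depths, train_acc, val_acc):
    print(f"depth={d:2d}  train={tr:.3f}  validation={va:.3f}")
print("selected depth:", best_depth)
```

Training accuracy typically keeps rising with depth while held-out accuracy levels off or drops. That gap is the overfitting signal the abstract refers to, and the selected depth is where held-out accuracy peaks.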
Authors and Affiliations
Tsehay Admassu Assegie, Ahmed Elaraby