Breast Cancer Disease Prediction Using Random Forest Regression and Gradient Boosting Regression
Journal Title: International Journal of Experimental Research and Review - Year 2024, Vol 38, Issue 2
Abstract
This study evaluates two regression-based machine learning algorithms, random forest regression and gradient-boosting regression, on the Breast Cancer Wisconsin (Diagnostic) dataset. Performance was assessed with Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared, also called the coefficient of determination (COD), supplemented by additional metrics, to gauge each algorithm's accuracy and predictive capability. For continuous target variables, the gradient-boosting regression model performed best among the models compared. It achieved an MSE of 0.05, indicating small prediction errors; an R-squared of 0.89, meaning the model explains about 89% of the variance in the target; and an MAE of 0.14. These findings underscore the potential of gradient-boosting regression for improving predictive accuracy on datasets with continuous target variables, here in the context of breast cancer diagnosis. The research contributes to the ongoing exploration of machine learning algorithms and provides a basis for informed decision-making in medical and predictive analytics.
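The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' actual pipeline: it assumes scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) dataset, treats the 0/1 diagnosis label as the regression target, and uses default hyperparameters with an 80/20 split. The function name `evaluate_models`, the split ratio, and the random seed are hypothetical choices made for this sketch, so the resulting numbers should not be expected to match the reported values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_models(test_size=0.2, seed=42):
    """Fit both regressors and return MSE, MAE and R-squared per model.

    Assumption: the split ratio, seed and default hyperparameters are
    illustrative only; the abstract does not state the authors' settings.
    """
    # 569 samples, 30 numeric features; target is 0 (malignant) or 1 (benign).
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    models = {
        "random_forest": RandomForestRegressor(random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        results[name] = {
            "mse": mean_squared_error(y_test, predictions),
            "mae": mean_absolute_error(y_test, predictions),
            # R-squared is the coefficient of determination.
            "r2": r2_score(y_test, predictions),
        }
    return results


if __name__ == "__main__":
    for name, metrics in evaluate_models().items():
        print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Lower MSE and MAE and a higher R-squared indicate a better fit on the held-out data. Because a single train/test split can favor either model by chance, repeating the evaluation over several seeds or with cross-validation would give a steadier comparison.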
Authors and Affiliations
Pradeep Yadav, Chandra Prakash Bhargava, Deepak Gupta, Jyoti Kumari, Archana Acharya, Madhukar Dubey
Antidiabetic Potency of Flavonoids Using a Systematic Computer-Aided Drug Design Platform
Diabetes mellitus (DM) is a chronic metabolic disorder, with type 2 diabetes (T2DM) being the most prevalent type globally. Despite the availability of several target-specific drugs, the prevalence rate has remained unco...
Factors Determining Household Waste Segregation Behaviour: An Indian Case Study
Waste represents used things or materials that are no longer required or wanted. These articles are cast off as they have stopped working or because they have ceased to be of value. Human settlements inevitably generate...
TLBO-trained ANN-based Shunt Active Power Filter for Mitigation of Current Harmonics
The increased utilization of nonlinear devices is resulting in damage to power distribution infrastructure by introducing harmonics into power system networks, which in turn causes distortion in voltage and current signa...
In Silico Molecular Docking Analysis of Flavone and Phytol from Vilvam (Aegle marmelos) against Human Hepatocellular Carcinoma (HepG-2) Mitochondrial Proteins
The vilvam fruit is an important source of phytocompounds that are a good natural resource for treating several illnesses. Annually, around 906,000 new cases and 830,000 deaths worldwide are attributed to liver ca...
Prevalence of Stunting, wasting and underweight among Santal children of Galudih, Purbi Singbhum district, Jharkhand, India
The objective of this study was to assess the differences in body stature (height), body weight, and frequency of stunted, wasted, and underweight children of the Santal ethnicity in Galudih area, Purbi Singbhum, Jharkha...