Impact of Data Preprocessing Techniques on the Performance of Machine Learning Models for Drought Prediction

Journal Title: Acadlore Transactions on AI and Machine Learning - Year 2025, Vol 4, Issue 1

Abstract

Drought is a complex natural phenomenon whose global impacts include the depletion of water resources, reduced agricultural productivity, and ecological disruption, and it has become a critical challenge in the context of climate change. Effective drought prediction models are essential for mitigating these effects. This study investigates how two kinds of data preprocessing, class imbalance handling and dimensionality reduction, contribute to the performance of machine learning models for drought prediction. The Synthetic Minority Over-sampling Technique (SMOTE) and the near miss undersampling method were employed to address class imbalance in the dataset. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were applied for dimensionality reduction, with the aim of improving computational efficiency while retaining essential features. Decision tree algorithms were trained on the preprocessed data, and the effect of each preprocessing technique was assessed in terms of accuracy, precision, recall, and F1-score. The results indicate that SMOTE-based sampling significantly enhances the overall performance of the drought prediction model, particularly in terms of accuracy and robustness. Furthermore, the combination of SMOTE, PCA, and LDA yields a substantial improvement in model reliability and generalizability. These findings underscore the importance of carefully selecting preprocessing techniques that address class imbalance and reduce the feature space, and they provide insights for future research on climate-related prediction tasks.
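The abstract does not describe the implementation, so the following is only a minimal sketch of the idea behind SMOTE, not the authors' code. It creates each synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority neighbours. The function names `smote` and `balance`, the default `k=3`, and the two-class assumption are all illustrative choices; in practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be used instead.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class samples.

    Each synthetic point lies on the segment joining a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper written for illustration only.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label, seed=0):
    """Oversample the minority class until both classes have equal counts.

    Assumes a two-class problem (an assumption of this sketch).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = len(X) - len(minority)
    deficit = max(majority_count - len(minority), 0)
    new_points = smote(minority, deficit, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)
```

Calling `balance(X, y, minority_label=1)` on a list of feature tuples `X` and labels `y` returns the original samples followed by the synthetic ones. Oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.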

Authors and Affiliations

Serap Erçel, Sinem Akyol

Keywords

Machine learning; Drought prediction; SMOTE; Near miss; PCA; LDA; Decision trees

EP ID EP767820
DOI https://doi.org/10.56578/ataiml040102