A Novel Data Handling Technique for Wine Quality Analysis using ML Techniques
Journal Title: International Journal of Experimental Research and Review - Year 2024, Vol 45, Issue 9
Abstract
Wine is a regularly consumed beverage, and the industry is seeing increased sales due to product quality certification. This research aims to identify the key wine characteristics that drive quality outcomes, and to develop a multiclass classification model that accurately assesses wine quality on balanced datasets. We apply three machine learning (ML) classification techniques, Random Forest (RF), Decision Tree (DT) and Multi-Layer Perceptron (MLP), to the white and red wine datasets sourced from the UCI Machine Learning Repository. Each dataset is balanced by random oversampling to avoid the bias toward the majority class that ML techniques acquire from an imbalanced multiclass dataset (IMD). Furthermore, we apply a Yeo-Johnson transformation (YJT) to the datasets to reduce skewness. We validated the results of the ML algorithms using 10-fold cross-validation; accuracies ranged from about 75% to 94%, and RF yielded the highest overall accuracy of 93.14%. For the balanced white wine dataset, the proposed approach achieves an accuracy of 93.14% using RF, 90.83% using DT, and 75.49% using MLP. Similarly, for the balanced red wine dataset, accuracy is 89.36% using RF, 85.36% using DT, and 78.00% using MLP. For the white wine dataset, the proposed approach improves accuracy by 23% for RF, 30% for DT, and 21% for MLP. For the red wine dataset, RF accuracy remained the same, while accuracy improved by 10% for DT and 22% for MLP. Additionally, under the proposed approach, RF, DT, and MLP yield mean squared error (MSE) values of 0.080, 0.151, and 0.443 for the white wine dataset and 0.143, 0.221, and 0.396 for the red wine dataset, respectively. RF accuracy under the proposed technique is the highest among the classifiers considered for both the white and red wine datasets.
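The two data handling steps named in the abstract, random oversampling and the Yeo-Johnson transformation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_oversample` and `yeo_johnson`, the toy rows and quality labels, and the fixed power parameter `lam` are all assumptions made for illustration. In practice the Yeo-Johnson parameter is usually estimated per feature by maximum likelihood (for example by scikit-learn's `PowerTransformer(method="yeo-johnson")`), not fixed by hand.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of a single value for a given lam."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# Hypothetical toy data: one feature per row, quality labels 5, 6, 7.
rows = [[0.2], [0.3], [0.4], [0.5], [1.1], [1.3], [6.0]]
labels = [5, 5, 5, 5, 6, 6, 7]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))  # every class now has 4 samples

# Assumed lam = 0 (log-like compression of the right tail).
transformed = [[yeo_johnson(v, 0.0) for v in row] for row in balanced_rows]
```

With `lam = 1` the transform leaves values unchanged, so any reduction in skewness comes from choosing a `lam` away from 1 that suits each feature.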
Authors and Affiliations
Onima Tigga, Jaya Pal, Debjani Mustafi