Breast cancer diagnosis using feature extraction techniques with supervised and unsupervised classification algorithms
Journal Title: Applied Medical Informatics - Year 2019, Vol 41, Issue 1
Abstract
Background: Breast cancer is a serious disease that affects women around the globe. With the development of clinical technologies, many different tumor features are now collected for breast cancer diagnosis. Filtering all of the pertinent feature information to support clinical diagnosis is a challenging and time-consuming task. The objective of this research was to diagnose breast cancer from extracted tumor features. The main contribution of our study is the use of multivariate techniques, namely principal component analysis (PCA), discriminant analysis (DA) and logistic regression (LR), for feature reduction, combined with machine learning tools to classify and predict the tumor type. A hybrid DA-LR feature reduction is proposed, and models built with the reduced features are tested by performing classification with Support Vector Machine, Naive Bayes, Decision Tree, Logistic Regression and Artificial Neural Network classifiers.

Materials and Methods: Feature extraction and selection are critical to the quality of classifiers built with data mining methods. To diagnose tumors from a reduced set of features, a hybrid feature extraction approach is proposed, and the disease is predicted from the relevant features in the data. The Breast Cancer Wisconsin (Diagnostic) Dataset, obtained from the UCI (University of California, Irvine) Machine Learning Repository, was used in this study. After data pre-processing, a correlation matrix was generated, which suggests the presence of multicollinearity. Feature reduction techniques, including principal component analysis, discriminant analysis and logistic regression, were applied to extract features. Classification models, namely Support Vector Machine, Naive Bayes, Decision Tree, Logistic Regression and Artificial Neural Network, were created with the extracted features, and their performance was compared.

Results: The results not only illustrate the capability of the proposed approach for breast cancer diagnosis but also show time savings during the training phase. Physicians can also benefit from the mined abstract tumor features by better understanding the properties of different types of tumors.

Conclusion: The Naive Bayes and Support Vector Machine classifiers outperform the other classification methods, and the model created with hybrid discriminant-logistic (DA-LR) feature selection performs best among all models.
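The abstract names the steps of the study (a correlation check for multicollinearity, PCA and DA-LR feature reduction, then five classifiers compared on accuracy and training time) but not their settings. The following is a minimal sketch, assuming scikit-learn, that wires those steps together on the copy of the Wisconsin Diagnostic data bundled as `load_breast_cancer`. The abstract does not describe how DA and LR are combined, so the ranking rule inside the hypothetical `da_lr_select` is an illustrative assumption, as are the 0.9 correlation threshold, the 95% PCA variance target, k = 10 and every classifier hyperparameter. None of these should be read as the authors' method.

```python
import time

import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def correlated_pairs(X, threshold=0.9):
    """Feature index pairs with |Pearson r| above threshold (a multicollinearity hint)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    rows, cols = np.where(np.triu(corr, k=1) > threshold)  # upper triangle, no diagonal
    return list(zip(rows.tolist(), cols.tolist()))


def da_lr_select(X, y, k=10):
    """HYPOTHETICAL hybrid DA-LR selection (assumed rule, not the paper's):
    score each feature by its normalized |coefficient| in LDA plus in logistic
    regression, both fitted on standardized data, and keep the top k indices."""
    Xs = StandardScaler().fit_transform(X)
    lda = np.abs(LinearDiscriminantAnalysis().fit(Xs, y).coef_[0])
    lr = np.abs(LogisticRegression(max_iter=5000).fit(Xs, y).coef_[0])
    score = lda / lda.max() + lr / lr.max()
    return np.sort(np.argsort(-score)[:k])


def classifiers():
    # Hyperparameters are illustrative defaults, not values from the study.
    return {
        "SVM": SVC(),
        "NaiveBayes": GaussianNB(),
        "DecisionTree": DecisionTreeClassifier(random_state=0),
        "LogReg": LogisticRegression(max_iter=5000),
        "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    }


def compare(test_size=0.3, seed=0):
    """Test accuracy and fit time for every (classifier, feature set) pair."""
    X, y = load_breast_cancer(return_X_y=True)  # WDBC: 569 samples, 30 features
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    keep = da_lr_select(X_tr, y_tr)  # selected on training data only, to avoid leakage
    results = {}
    for name, clf in classifiers().items():
        variants = {
            "all": (make_pipeline(StandardScaler(), clone(clf)), slice(None)),
            "pca": (make_pipeline(StandardScaler(), PCA(n_components=0.95), clone(clf)),
                    slice(None)),
            "da_lr": (make_pipeline(StandardScaler(), clone(clf)), keep),
        }
        for feats, (model, cols) in variants.items():
            start = time.perf_counter()
            model.fit(X_tr[:, cols], y_tr)
            elapsed = time.perf_counter() - start
            results[(name, feats)] = (model.score(X_te[:, cols], y_te), elapsed)
    return results


if __name__ == "__main__":
    X, _ = load_breast_cancer(return_X_y=True)
    print(len(correlated_pairs(X)), "feature pairs with |r| > 0.9")
    for (clf, feats), (acc, secs) in sorted(compare().items()):
        print(f"{clf:12s} {feats:6s} acc={acc:.3f} fit={secs * 1000:.1f} ms")
```

A single stratified train/test split keeps the sketch short; cross-validation would give steadier comparisons. Fitting the scaler, PCA and the hypothetical DA-LR ranking only on the training split keeps the test accuracy honest, and the recorded fit times give one way to probe the training-time savings the abstract reports for reduced feature sets.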
Authors and Affiliations
Maryam SOLTANPOUR GHARIBDOUSTI, Syed HAIDER, Dieudonne OUEDRAOGO, Susan LU