Breast cancer diagnosis using feature extraction techniques with supervised and unsupervised classification algorithms

Journal Title: Applied Medical Informatics - Year 2019, Vol 41, Issue 1

Abstract

Background: Breast cancer is a serious disease that affects women around the globe. With the development of clinical technologies, many different tumor features are now collected for breast cancer diagnosis. Filtering the pertinent feature information to support clinical diagnosis is a challenging and time-consuming task. The objective of this research was to diagnose breast cancer based on extracted tumor features. The main contribution of our study is the use of multivariate techniques (principal component analysis, discriminant analysis and logistic regression) for feature reduction, combined with machine learning tools to classify and predict the tumor type. A hybrid DA-LR feature reduction is proposed, and models created with the reduced features are tested by performing classification using Support Vector Machine, Naive Bayes, Decision Tree, Logistic Regression and Artificial Neural Network.

Materials and Methods: Feature extraction and selection are critical to the quality of classifiers built through data mining methods. To diagnose tumors from a reduced set of features, a hybrid feature extraction is proposed that predicts the disease from the relevant features in the data. The Breast Cancer Wisconsin Diagnostic Dataset obtained from the UCI Machine Learning Repository was used in this study. After data pre-processing, a correlation matrix was generated, and it suggests the presence of multicollinearity. Feature reduction techniques including principal component analysis, discriminant analysis and logistic regression were applied to extract features. Classification models, namely Support Vector Machine, Naive Bayes, Decision Tree, Logistic Regression and Artificial Neural Network, were created with the extracted features, and their performance was compared.

Results: The results not only illustrate the capability of the proposed approach for breast cancer diagnosis but also show time savings during the training phase. Physicians can also benefit from the mined abstract tumor features by better understanding the properties of different types of tumors.

Conclusion: The Naive Bayes and Support Vector Machine classifiers outperform the other classification methods, and the model created with hybrid discriminant-logistic (DA-LR) feature selection performs best among all models.
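
The workflow described in the abstract (correlation check, feature reduction, then a comparison of several classifiers) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, whose bundled `load_breast_cancer` data is a copy of the Breast Cancer Wisconsin Diagnostic Dataset, and the 0.9 correlation threshold, the 70/30 split, the five principal components and all classifier settings are illustrative choices that do not come from the paper. The abstract does not specify how the hybrid DA-LR reduction is built, so only plain PCA and linear discriminant analysis are shown.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 569 tumors, 30 numeric features, binary label (malignant / benign).
X, y = load_breast_cancer(return_X_y=True)

# Multicollinearity check: count feature pairs with |r| above an
# assumed threshold of 0.9 in the upper triangle of the correlation matrix.
corr = np.corrcoef(X, rowvar=False)
upper = np.triu_indices_from(corr, k=1)
n_high = int(np.sum(np.abs(corr[upper]) > 0.9))
print(f"feature pairs with |r| > 0.9: {n_high}")

# Assumed 70/30 stratified split; the paper's protocol may differ.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Factories so that every pipeline gets fresh, unfitted estimators.
reducers = {
    "PCA": lambda: PCA(n_components=5),  # assumed component count
    "DA": lambda: LinearDiscriminantAnalysis(n_components=1),
}
classifiers = {
    "SVM": lambda: SVC(),
    "NaiveBayes": lambda: GaussianNB(),
    "DecisionTree": lambda: DecisionTreeClassifier(random_state=0),
    "LogReg": lambda: LogisticRegression(max_iter=1000),
    "ANN": lambda: MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=2000, random_state=0
    ),
}

# Fit every reducer/classifier combination and record test accuracy.
results = {}
for r_name, make_reducer in reducers.items():
    for c_name, make_classifier in classifiers.items():
        model = make_pipeline(StandardScaler(), make_reducer(), make_classifier())
        model.fit(X_train, y_train)
        results[(r_name, c_name)] = model.score(X_test, y_test)

for (r_name, c_name), acc in sorted(results.items()):
    print(f"{r_name:>3} + {c_name:<12} accuracy = {acc:.3f}")
```

Placing the scaler and the reducer inside each pipeline means both are fitted on the training split only, so no information from the test split leaks into the reduced features. The accuracies from this sketch are not the paper's reported results and should not be compared with them directly.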

Authors and Affiliations

Maryam SOLTANPOUR GHARIBDOUSTI, Syed HAIDER, Dieudonne OUEDRAOGO, Susan LU

How To Cite

Maryam SOLTANPOUR GHARIBDOUSTI, Syed HAIDER, Dieudonne OUEDRAOGO, Susan LU (2019). Breast cancer diagnosis using feature extraction techniques with supervised and unsupervised classification algorithms. Applied Medical Informatics, 41(1), 40-52. https://europub.co.uk/articles/-A-655031