Data Editing for Semi-Supervised Co-Forest by the Local Cut Edge Weight Statistic Graph (CEWS-Co-Forest)

Journal Title: Transactions on Machine Learning and Artificial Intelligence - Year 2017, Vol 5, Issue 4

Abstract

Many semi-supervised algorithms have been proposed to address the problem of large amounts of unlabeled training data. The training data in semi-supervised learning may contain considerable noise because the training set has too few labeled examples. Such noise may snowball in subsequent learning iterations and thus hurt the generalization ability of the final hypothesis. If this noise could be identified and removed, the performance of semi-supervised algorithms should improve. However, techniques for identifying and removing noise have seldom been explored in existing semi-supervised algorithms. In this paper, we combine the semi-supervised ensemble method “Co-Forest” with data editing (we call the result CEWS-Co-Forest) to improve learning from sparsely labeled medical datasets. The cut edge weight statistic data editing technique is used to actively identify possibly mislabeled examples in the newly labeled data throughout the co-labeling iterations of Co-Forest. The fusion of a semi-supervised ensemble method with data editing makes CEWS-Co-Forest more robust to the sparsity and the distribution bias of the training data. It further simplifies the design of semi-supervised learning, which makes CEWS-Co-Forest more efficient. An experimental study on several medical datasets shows encouraging results compared with state-of-the-art methods.
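
As a rough illustration of the data editing idea described above, the sketch below scores each example by the total weight of its "cut edges", that is, edges to neighbours carrying a different label. The abstract does not give the graph construction, the edge weights, or the rejection rule, so everything specific here is an assumption made for illustration: a k-nearest-neighbour graph, weights of 1 / (1 + distance), a normal approximation under the hypothesis that labels are assigned at random according to class proportions, and the hypothetical function name `cut_edge_z_scores`. It is not the authors' implementation.

```python
import math


def cut_edge_z_scores(points, labels, k=3):
    """Return one z-score per example; large positive values flag suspect labels.

    Assumptions (not taken from the paper): k-NN neighbourhood graph,
    edge weight 1 / (1 + Euclidean distance), normal approximation.
    """
    n = len(points)
    # Class proportions serve as label probabilities under the null hypothesis.
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1

    z_scores = []
    for i in range(n):
        # k nearest neighbours of example i, excluding the example itself.
        neighbours = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        weights = [1.0 / (1.0 + d) for d, _ in neighbours]
        # Observed statistic: total weight of edges to differently labelled neighbours.
        j_obs = sum(
            w for w, (_, j) in zip(weights, neighbours) if labels[j] != labels[i]
        )
        # Under the null hypothesis each edge is cut with probability 1 - pi(label_i).
        p_cut = 1.0 - counts[labels[i]] / n
        mean = p_cut * sum(weights)
        var = p_cut * (1.0 - p_cut) * sum(w * w for w in weights)
        z_scores.append((j_obs - mean) / math.sqrt(var) if var > 0 else 0.0)
    return z_scores


# Toy data: two well-separated clusters plus one point inside the first
# cluster that carries the second cluster's label.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
z = cut_edge_z_scores(points, labels)
suspect = max(range(len(z)), key=z.__getitem__)  # index of the highest score
```

On this toy data the highest score belongs to the last point, whose neighbours all disagree with its label. In a co-labeling loop, newly labeled examples with a high score could be dropped before the next round of training; the threshold for doing so is a design choice that the abstract does not specify.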

Authors and Affiliations

Nesma Settouti, Mohammed El Amine Bechar, Mostafa EL Habib Daho, Mohammed Amine Chikh

  • EP ID EP310033
  • DOI 10.14738/tmlai.54.3290

How To Cite

Nesma Settouti, Mohammed El Amine Bechar, Mostafa EL Habib Daho, Mohammed Amine Chikh (2017). Data Editing for Semi-Supervised Co-Forest by the Local Cut Edge Weight Statistic Graph (CEWS-Co-Forest). Transactions on Machine Learning and Artificial Intelligence, 5(4), 540-546. https://europub.co.uk/articles/-A-310033