Using Unlabeled Data to Improve Inductive Models by Incorporating Transductive Models

Abstract

This paper shows how to use labeled and unlabeled data to improve inductive models with the help of transductive models. We propose a solution for the self-training scenario. Self-training is an effective semi-supervised wrapper method that can generalize any type of supervised inductive model to the semi-supervised setting. It iteratively refines an inductive model by bootstrapping from unlabeled data. Standard self-training uses the classifier (trained on labeled examples) to label and select candidates from the unlabeled training set. This can be problematic: because labeled training data are typically scarce, the initial classifier may not be able to provide highly confident predictions. As a result, standard self-training can introduce too many wrongly labeled candidates into the labeled training set, which may severely degrade performance. To tackle this problem, we propose a novel self-training style algorithm that incorporates a graph-based transductive model in the self-labeling process. Unlike standard self-training, our algorithm uses labeled and unlabeled data as a whole to label and select unlabeled examples for training set augmentation. We propose a robust transductive model based on a Markov random walk on a graph, which exploits the manifold assumption to output reliable predictions on unlabeled data from noisy labeled examples. The proposed algorithm can greatly reduce the risk of performance degradation due to accumulated noise in the training set. Experiments show that the proposed algorithm can effectively use unlabeled data to improve classification performance.
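The self-labeling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no formulas, so the Gaussian affinity, the damped random-walk update, the confidence-based selection rule, and the nearest-centroid inductive model are all assumptions chosen for brevity, as are the names `rbf_affinity`, `random_walk_propagate`, `NearestCentroid`, `transductive_self_training` and the parameters `sigma`, `alpha`, `per_round`, `rounds`. Labels are assumed to be integers in `0..n_classes-1`, with `-1` marking unlabeled points.

```python
import numpy as np


def rbf_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix with a zero diagonal (assumed graph)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def random_walk_propagate(W, y, n_classes, alpha=0.9, n_iter=200):
    """Spread labels along the graph; entries of y equal to -1 are unlabeled."""
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    idx = np.flatnonzero(y >= 0)
    Y0 = np.zeros((len(y), n_classes))
    Y0[idx, y[idx]] = 1.0
    F = Y0.copy()
    for _ in range(n_iter):
        # Soft clamping: seed labels are pulled toward, not fixed at, Y0,
        # so neighbors can outvote a noisy seed.
        F = alpha * (P @ F) + (1.0 - alpha) * Y0
    return F / np.maximum(F.sum(axis=1, keepdims=True), 1e-12)


class NearestCentroid:
    """Stand-in inductive model; any supervised classifier could be used."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=-1)
        return self.classes_[d.argmin(axis=1)]


def transductive_self_training(X, y, n_classes, per_round=10, rounds=5, sigma=1.0):
    """Grow the labeled set with the transductive model, then refit the classifier."""
    y = y.copy()
    W = rbf_affinity(X, sigma)
    model = None
    for _ in range(rounds):
        unlabeled = np.flatnonzero(y < 0)
        if unlabeled.size == 0:
            break
        F = random_walk_propagate(W, y, n_classes)
        conf = F[unlabeled].max(axis=1)
        # Keep only the most confident transductive predictions this round.
        pick = unlabeled[np.argsort(-conf)[:per_round]]
        y[pick] = F[pick].argmax(axis=1)
        labeled = y >= 0
        model = NearestCentroid().fit(X[labeled], y[labeled])
    return model, y
```

The design point the sketch tries to capture is that candidates are labeled and selected by a model that sees labeled and unlabeled points together, rather than by the inductive classifier trained on the few labeled points alone. The returned classifier can then be applied to unseen data with `model.predict`.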

Authors and Affiliations

ShengJun Cheng, Jiafeng Liu, XiangLong Tang

  • EP ID EP131476
  • DOI 10.14569/IJARAI.2014.030207

How To Cite

ShengJun Cheng, Jiafeng Liu, XiangLong Tang (2014). Using Unlabeled Data to Improve Inductive Models by Incorporating Transductive Models. International Journal of Advanced Research in Artificial Intelligence (IJARAI), 3(2), 32-38. https://europub.co.uk/articles/-A-131476