Using Unlabeled Data to Improve Inductive Models by Incorporating Transductive Models

Abstract

This paper shows how to use labeled and unlabeled data to improve inductive models with the help of transductive models. We propose a solution for the self-training scenario. Self-training is an effective semi-supervised wrapper method that can generalize any type of supervised inductive model to the semi-supervised setting. It iteratively refines an inductive model by bootstrapping from unlabeled data. Standard self-training uses the classifier (trained on labeled examples) to label and select candidates from the unlabeled training set. This can be problematic: because labeled training data is typically scarce, the initial classifier may not be able to provide highly confident predictions. As a result, standard self-training is prone to adding too many wrongly labeled candidates to the labeled training set, which can severely degrade performance. To tackle this problem, we propose a novel self-training style algorithm that incorporates a graph-based transductive model into the self-labeling process. Unlike standard self-training, our algorithm uses labeled and unlabeled data as a whole to label and select unlabeled examples for training set augmentation. We propose a robust transductive model based on a Markov random walk over a graph, which exploits the manifold assumption to output reliable predictions on unlabeled data from noisy labeled examples. The proposed algorithm can greatly reduce the risk of performance degradation due to accumulated noise in the training set. Experiments show that the proposed algorithm can effectively use unlabeled data to improve classification performance.
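The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a generic clamped random-walk label propagation over a Gaussian similarity graph as the transductive model and a toy nearest-centroid classifier as the inductive model. All names (`random_walk_labels`, `NearestCentroid`, `self_train_with_graph`) and parameters (`sigma`, `n_steps`, `per_round`, `rounds`) are hypothetical and chosen for illustration.

```python
import numpy as np


def random_walk_labels(X, y, n_classes, sigma=1.0, n_steps=50):
    """Clamped random-walk label propagation (generic, assumed technique).

    y holds class indices for labeled points and -1 for unlabeled points.
    Returns an (n_samples, n_classes) array of normalized class scores.
    """
    # Gaussian edge weights from pairwise squared distances, no self-loops.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Row-normalize to obtain the transition matrix of the random walk.
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

    lab_idx = np.flatnonzero(y >= 0)
    F = np.zeros((len(X), n_classes))
    F[lab_idx, y[lab_idx]] = 1.0
    clamp = F[lab_idx].copy()
    for _ in range(n_steps):
        F = P @ F
        F[lab_idx] = clamp  # known labels stay fixed after every step
    return F / np.maximum(F.sum(axis=1, keepdims=True), 1e-12)


class NearestCentroid:
    """Toy inductive model: predict the class of the closest class mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d2.argmin(axis=1)]


def self_train_with_graph(X_lab, y_lab, X_unl, n_classes,
                          per_round=5, rounds=3, sigma=1.0):
    """Self-training in which the graph model, not the classifier, self-labels."""
    X = np.vstack([X_lab, X_unl])
    y = np.concatenate([np.asarray(y_lab, dtype=int),
                        -np.ones(len(X_unl), dtype=int)])
    model = NearestCentroid().fit(X_lab, np.asarray(y_lab, dtype=int))
    for _ in range(rounds):
        unl_idx = np.flatnonzero(y < 0)
        if unl_idx.size == 0:
            break
        # Transductive step: score every point using labeled and unlabeled data.
        scores = random_walk_labels(X, y, n_classes, sigma=sigma)
        confidence = scores[unl_idx].max(axis=1)
        # Move the most confident unlabeled points into the training set.
        pick = unl_idx[np.argsort(-confidence)[:per_round]]
        y[pick] = scores[pick].argmax(axis=1)
        # Inductive step: retrain the classifier on the augmented set.
        labeled = y >= 0
        model = NearestCentroid().fit(X[labeled], y[labeled])
    return model, y
```

In this sketch the candidates are ranked by the graph model's scores rather than by the classifier's own confidence, which is the contrast with standard self-training that the abstract describes. Any classifier exposing `fit` and `predict` could stand in for `NearestCentroid`.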

Authors and Affiliations

ShengJun Cheng, Jiafeng Liu, XiangLong Tang

  • EP ID EP131476
  • DOI 10.14569/IJARAI.2014.030207

How To Cite

ShengJun Cheng, Jiafeng Liu, XiangLong Tang (2014). Using Unlabeled Data to Improve Inductive Models by Incorporating Transductive Models. International Journal of Advanced Research in Artificial Intelligence (IJARAI), 3(2), 32-38. https://europub.co.uk/articles/-A-131476