A Comparative Study of Machine Learning Approaches: SVM and LS-SVM using a Web Search Engine Based Application
Journal Title: International Journal on Computer Science and Engineering - Year 2012, Vol 4, Issue 5
Abstract
Semantic similarity is a measure by which a set of documents, or the words within those documents, are assigned a weight based on their meaning. Accurate measurement of this similarity plays an important role in Natural Language Processing and Information Retrieval tasks such as query expansion and word sense disambiguation. Page counts and snippets retrieved by search engines help to measure the semantic similarity between two words. Several similarity scores are calculated for the conjunctive query formed from the two words. A lexical pattern extraction algorithm identifies patterns in the snippets. Two machine learning approaches, Support Vector Machine (SVM) and Latent Structural Support Vector Machine (LS-SVM), are used to measure the semantic similarity between two words by combining the similarity scores from page counts with the clusters of patterns retrieved from the snippets. The similarity results of the two machines are then compared. SVM separates synonymous from non-synonymous word pairs using a maximum-margin hyperplane. LS-SVM gives much more accurate results by taking the latent values in the dataset into account.
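The abstract does not name the page-count similarity scores it uses. As a rough illustration only, the sketch below assumes three common co-occurrence measures (Jaccard, Dice and overlap) computed from the page counts of two words and of their conjunctive query. The function name, the `min_cooccurrence` cutoff and the example counts are hypothetical and are not taken from the paper.

```python
def page_count_features(count_p, count_q, count_pq, min_cooccurrence=5):
    """Return co-occurrence scores for a word pair from search page counts.

    count_p, count_q : page counts for each word queried alone
    count_pq         : page count for the conjunctive query "P AND Q"
    min_cooccurrence : assumed cutoff; very small joint counts are treated
                       as noise and scored as zero
    """
    zero = {"jaccard": 0.0, "dice": 0.0, "overlap": 0.0}
    if count_p <= 0 or count_q <= 0 or count_pq <= min_cooccurrence:
        return zero

    return {
        # |P and Q| / |P or Q|
        "jaccard": count_pq / (count_p + count_q - count_pq),
        # 2 |P and Q| / (|P| + |Q|)
        "dice": 2 * count_pq / (count_p + count_q),
        # |P and Q| / min(|P|, |Q|)
        "overlap": count_pq / min(count_p, count_q),
    }


if __name__ == "__main__":
    # Made-up page counts, for illustration only.
    scores = page_count_features(count_p=100, count_q=50, count_pq=25)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Scores of this kind could serve as part of the feature vector that a classifier such as an SVM receives for each word pair, alongside features derived from the snippet patterns.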
Authors and Affiliations
S. S. Arya, S. Lavanya
Relational Peer Data Sharing Settings and Consistent Query Answers
In this paper, we study the problem of consistent query answering in peer data sharing systems. In a peer data sharing system, databases in peers are designed and administered autonomously and acquaintances between peers...
Weblog Search Engine Based on Quality Criteria
Nowadays, an increasing amount of human knowledge is placed in computerized repositories such as the World Wide Web. This gives rise to the problem of how to locate specific pieces of information in these often quite unstru...
Simultaneous Pattern and Data Clustering Using Modified K-Means Algorithm
In data mining and knowledge discovery, pattern discovery (PD) is used to find significant correlations among events. PD typically produces an overwhelming number of patterns. Since there are too many patterns, it...
An Effective Round Robin Algorithm using Min-Max Dispersion Measure
The Round Robin (RR) scheduling algorithm is a preemptive scheduling algorithm. It is designed especially for time-sharing operating systems (OS). In the RR scheduling algorithm, the CPU switches between the processes when the sta...
DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY
Content-addressable memory (CAM) is frequently used in applications, such as lookup tables, databases, associative computing, and networking, that require high-speed searches due to its ability to improve application per...