News document analysis by using a proficient algorithm
Journal Title: International Journal of Engineering Research and Applications - Year 2017, Vol 7, Issue 6
Abstract
News article analysis has been an emerging research topic over the past few years. Newspapers report various types of news (political, education, employment, sports, agriculture, crime, medicine, business, etc.) at different levels: international, national, state, and district. Among these, crime reporting plays a major role, because one crime can lead to many others and affect many lives. Madurai, a city in India with many historical monuments, is a sensitive place. This paper analyzes the crimes that occurred in and around Madurai in 2015. The analysis can help the police department reduce the occurrence of crime in the future. The proposed system uses a Support Vector Machine (SVM) to classify the documents effectively. News documents are preprocessed using pruning and stemming. From the stemmed words, informative words are selected and weighted using feature selection methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and chi-square. This yields a high-dimensional vector space, which is reduced to a lower dimension using Latent Semantic Analysis (LSA). The cosine similarity between a key document and each news document is then computed, and based on this value the news documents are labeled as crime or non-crime. Some of the documents are used to train the SVM classifier, and others are used to test the performance of the developed system. A comparative study shows that the proposed approach improves classification accuracy.
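The labeling step described in the abstract (weight terms with TF-IDF, then compare each news document with a key document by cosine similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stopword list, the smoothed IDF formula, the `threshold` value, the function names, and the toy documents are all assumptions made for illustration, and stemming, chi-square selection, LSA, and the SVM stage are left out to keep the example short.

```python
import math
import re
from collections import Counter

# Assumed, tiny stopword list standing in for the paper's pruning step.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "and", "to", "near", "at", "by"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (no stemming here)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant; the paper does not specify one).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_documents(key_doc, news_docs, threshold=0.1):
    """Label each news document by its similarity to the key document."""
    vectors = tfidf_vectors([key_doc] + news_docs)
    key_vec, doc_vecs = vectors[0], vectors[1:]
    return ["crime" if cosine(key_vec, v) >= threshold else "non-crime"
            for v in doc_vecs]


# Toy data, invented for illustration only.
key_doc = "murder robbery theft assault police arrest"
news_docs = [
    "Police arrest two men after robbery near the temple",
    "Farmers expect a good harvest after monsoon rains",
]
labels = label_documents(key_doc, news_docs)
print(labels)
```

In the pipeline the abstract describes, labels obtained this way would then serve as training and test data for the SVM classifier; the threshold would need to be tuned on real data.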
Authors and Affiliations
K. Meena, R. Lawrance