Effective Listings of Function Stop words for Twitter

Abstract

Many words in documents recur very frequently but are essentially meaningless, as they serve only to join other words together in a sentence. It is commonly understood that stop words do not contribute to the context or content of textual documents. Because they occur so frequently, their presence in text mining is an obstacle to understanding the content of the documents. To eliminate this bias, most text mining software and approaches use a stop word list to identify and remove those words. However, developing such a stop word list is difficult, and the resulting lists are inconsistent between textual sources. This problem is further aggravated by sources such as Twitter, which are highly repetitive or similar in nature. In this paper, we examine the original work on using term frequency, inverse document frequency and term adjacency to develop a stop word list for the Twitter data source. We propose a new technique that uses combinatorial values as an alternative measure to effectively list out stop words.
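
As a rough illustration of the frequency-based baseline mentioned in the abstract, the sketch below ranks tokens that combine a low inverse document frequency with a high term frequency. It is not the paper's method: the function name `rank_stop_word_candidates`, the whitespace tokenizer, the sort order and the sample tweets are all assumptions made for this example, and the term adjacency and combinatorial measures are not covered because the abstract does not define them.

```python
import math
from collections import Counter


def rank_stop_word_candidates(docs, top_n=5):
    """Rank tokens as stop word candidates (hypothetical helper).

    Tokens that appear in most documents (low IDF) and occur often
    overall (high TF) are ranked first.
    """
    # Assumption: lowercase whitespace tokenization, with no handling of
    # hashtags, mentions or URLs.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Term frequency: total occurrences across the whole collection.
    tf = Counter(tok for toks in tokenized for tok in toks)
    # Document frequency: number of documents containing the token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Inverse document frequency: log(N / df), which is 0 for a token
    # that appears in every document.
    idf = {tok: math.log(n_docs / df[tok]) for tok in df}

    # Lowest IDF first, then highest TF, then alphabetical for stability.
    ranked = sorted(tf, key=lambda t: (idf[t], -tf[t], t))
    return [(t, tf[t], idf[t]) for t in ranked[:top_n]]


# Made-up sample tweets, for illustration only.
tweets = [
    "the game is on tonight",
    "the weather is great",
    "watching the game with friends",
    "is the train late again",
]
candidates = rank_stop_word_candidates(tweets, top_n=3)
for token, term_freq, inv_doc_freq in candidates:
    print(token, term_freq, round(inv_doc_freq, 3))
```

On this toy collection the function words "the" and "is" rank first, but so does the content word "game", which hints at why frequency alone can be an unreliable signal on short, repetitive sources such as tweets.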

Authors and Affiliations

Murphy Choy

How To Cite

Murphy Choy (2012). Effective Listings of Function Stop words for Twitter. International Journal of Advanced Computer Science & Applications, 3(6), 8-11. https://europub.co.uk/articles/-A-124949