LANGUAGE-AGNOSTIC SOURCE CODE RETRIEVAL USING KEYWORD & IDENTIFIER LEXICAL PATTERN

Abstract

Although source code retrieval is a promising mechanism for supporting software reuse, it faces a growing problem as new programming languages emerge. Most retrieval systems rely on programming-language-dependent features to extract source code lexicons. Thus, each time a new programming language is developed, such a system must be updated manually to handle that language. This update may take a considerable amount of time, especially when the language's parsing mechanism is uncommon (e.g., Python's parsing mechanism). To address this issue, this paper proposes a source code retrieval approach that does not rely on programming-language-dependent features. Instead, it relies on the Keyword & Identifier lexical pattern, which is typically similar across programming languages. This pattern is adapted to four components: tokenization, retrieval model, query expansion, and document enrichment. According to our evaluation, these components are effective for retrieving relevant source code in a language-agnostic manner, although the improvement contributed by each component varies.
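The abstract does not spell out the lexical pattern itself, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes that keywords and identifiers in most languages start with a letter or underscore and continue with letters, digits, or underscores. The names `IDENTIFIER_PATTERN`, `tokenize`, and `score` are hypothetical, and the overlap score is a stand-in for whatever retrieval model the paper actually uses.

```python
import re
from collections import Counter

# Assumed pattern (not taken from the paper): a letter or underscore,
# followed by any number of letters, digits, or underscores.
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source: str) -> list[str]:
    """Extract keyword- and identifier-like tokens without a language-specific parser."""
    return [token.lower() for token in IDENTIFIER_PATTERN.findall(source)]


def score(query: str, document: str) -> int:
    """Hypothetical overlap score: total occurrences of distinct query tokens in the document."""
    document_counts = Counter(tokenize(document))
    return sum(document_counts[token] for token in set(tokenize(query)))


# The same tokenizer is applied to snippets in two different languages.
java_snippet = "public int sumValues(int[] values) { int total = 0; return total; }"
python_snippet = "def sum_values(values):\n    total = 0\n    return total"

print(tokenize(java_snippet))
print(tokenize(python_snippet))
print(score("total values", java_snippet), score("total values", python_snippet))
```

Because the tokenizer never consults a grammar, supporting a newly developed language requires no parser update in this sketch; the trade-off is that it cannot distinguish keywords from identifiers or skip comments and string literals.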

Authors and Affiliations

Oscar Karnalim

How To Cite

Oscar Karnalim (2018). LANGUAGE-AGNOSTIC SOURCE CODE RETRIEVAL USING KEYWORD & IDENTIFIER LEXICAL PATTERN. International Journal of Software Engineering and Computer Systems, 4(1), 29-47. https://europub.co.uk/articles/-A-597364