LANGUAGE-AGNOSTIC SOURCE CODE RETRIEVAL USING KEYWORD & IDENTIFIER LEXICAL PATTERN

Abstract

Although source code retrieval is a promising mechanism for supporting software reuse, it faces an emerging issue as new programming languages are developed. Most existing retrieval systems rely on programming-language-dependent features to extract source code lexicons. Consequently, each time a new programming language is introduced, such a system must be updated manually to handle it. This update may take a considerable amount of time, especially when the language's parsing mechanism is uncommon (e.g., Python's parsing mechanism). To address this issue, this paper proposes a source code retrieval approach that does not rely on programming-language-dependent features. Instead, it relies on the Keyword & Identifier lexical pattern, which is typically similar across programming languages. This pattern is adapted to four components: tokenization, retrieval model, query expansion, and document enrichment. According to our evaluation, these components are effective at retrieving relevant source code in a language-agnostic manner, although the improvement contributed by each component varies.
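The abstract does not specify how the tokenization component is implemented. As a rough illustration only, the sketch below assumes that a "keyword or identifier" lexeme is a letter or underscore followed by letters, digits, or underscores, a shape shared by Java, C, Python, JavaScript, and many other languages. It additionally splits snake_case and camelCase names into sub-tokens. The names `WORD_PATTERN`, `CAMEL_BOUNDARY`, and `tokenize` are hypothetical and do not come from the paper.

```python
import re
from typing import List

# Assumed lexical pattern for keywords and identifiers: a letter or underscore
# followed by letters, digits, or underscores. No language-specific parser is used.
WORD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Zero-width camelCase/PascalCase boundaries, e.g. "getUserName" -> get|User|Name
# and "HTTPServer" -> HTTP|Server (requires Python 3.7+ for empty-match splits).
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def tokenize(source: str) -> List[str]:
    """Hypothetical language-agnostic tokenizer: extracts keyword/identifier-shaped
    lexemes and breaks them into lowercase sub-tokens."""
    tokens: List[str] = []
    for lexeme in WORD_PATTERN.findall(source):
        # Split snake_case first, then camelCase within each part.
        for part in lexeme.split("_"):
            if not part:
                continue
            for sub in CAMEL_BOUNDARY.split(part):
                if sub:
                    tokens.append(sub.lower())
    return tokens


if __name__ == "__main__":
    # The same function handles a Java line and a Python line without modification.
    print(tokenize("int totalCount = getUserName(i);"))
    print(tokenize("def get_user_name(self): return self.total_count"))
```

Because the pattern only looks at the surface shape of lexemes, it also picks up words inside comments and string literals, and it ignores numeric literals and operators; whether that is desirable depends on the retrieval model built on top of it.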

Authors and Affiliations

Oscar Karnalim

How To Cite

Oscar Karnalim (2018). LANGUAGE-AGNOSTIC SOURCE CODE RETRIEVAL USING KEYWORD & IDENTIFIER LEXICAL PATTERN. International Journal of Software Engineering and Computer Systems, 4(1), 29-47. https://europub.co.uk/articles/-A-597364