Applying Machine Learning to Software Fault Prediction

Journal Title: e-Informatica Software Engineering Journal - Year 2018, Vol 12, Issue 1

Abstract

Introduction: Software engineering continuously suffers from inadequate software testing. Automated prediction of potentially faulty fragments of source code allows developers to focus their efforts on fault-prone fragments first. Fault prediction has been the topic of many studies concentrating on C/C++ and Java programs, with little attention paid to languages such as Python.

Objectives: This study aims to verify whether the type of approach used in earlier fault prediction studies can be applied to Python. More precisely, the primary objective is to conduct preliminary research using simple methods that would support (or contradict) the expectation that predicting faults in Python programs is also feasible. The secondary objective is to establish grounds for more thorough future research and publications, provided the preliminary research yields promising results.

Methods: It has been demonstrated that machine learning techniques can predict faults in C/C++ and Java projects with recall 0.71 and false positive rate 0.25. A similar approach was applied to find out whether promising results can be obtained for Python projects. The working hypothesis is that choosing Python as the programming language does not significantly alter those results. A preliminary study is conducted in which a basic machine learning technique is applied to a few sample Python projects. If these efforts succeed, it will indicate that the selected approach is worth pursuing, since results similar to those obtained for C/C++ and Java can be obtained for Python. If these efforts fail, it will indicate that the selected approach was not appropriate for the selected group of Python projects.

Results: The research provides experimental evidence that fault-prediction methods similar to those developed for C/C++ and Java programs can be successfully applied to Python programs, achieving recall up to 0.64 with false positive rate 0.23 (mean recall 0.53 with false positive rate 0.24). This indicates that more thorough research in this area is worth conducting.

Conclusion: Having obtained promising results with this simple approach, the authors conclude that research on predicting faults in Python programs using machine learning techniques is worth conducting. Natural ways to enhance future research are using more sophisticated machine learning techniques, additional Python-specific features, and extended data sets.
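The recall and false positive rate quoted in the abstract are both derived from a binary confusion matrix. The sketch below is only an illustration of how the two metrics are computed: the function name `recall_and_fpr` and the sample labels are assumptions made for this example and are not taken from the study or its data.

```python
def recall_and_fpr(actual, predicted):
    """Return (recall, false positive rate) for binary labels, True = faulty."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # faulty, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # faulty, missed
    fp = sum(1 for a, p in pairs if not a and p)      # clean, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # clean, not flagged
    # Guard against empty classes to avoid division by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Hypothetical labels for eight code fragments (not data from the paper).
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]

recall, fpr = recall_and_fpr(actual, predicted)
print(f"recall={recall:.2f}, false positive rate={fpr:.2f}")
```

Recall is the share of truly faulty fragments that the predictor flags, while the false positive rate is the share of fault-free fragments that it flags by mistake, so a useful predictor pairs a high recall with a low false positive rate.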

Authors and Affiliations

Bartłomiej Wójcicki, Robert Dąbrowski

  • EP ID EP382522
  • DOI 10.5277/e-Inf180108

How To Cite

Bartłomiej Wójcicki, Robert Dąbrowski (2018). Applying Machine Learning to Software Fault Prediction. e-Informatica Software Engineering Journal, 12(1), 199-216. https://europub.co.uk/articles/-A-382522