Applying Machine Learning to Software Fault Prediction
Journal Title: e-Informatica Software Engineering Journal - Year 2018, Vol 12, Issue 1
Abstract
Introduction: Software engineering continuously suffers from inadequate software testing. Automated prediction of potentially faulty fragments of source code allows developers to focus their efforts on fault-prone fragments first. Fault prediction has been the topic of many studies concentrating on C/C++ and Java programs, with little attention paid to languages such as Python.
Objectives: In this study the authors verify whether the approach used in earlier fault prediction studies can be applied to Python. More precisely, the primary objective is to conduct preliminary research using simple methods that would support (or contradict) the expectation that predicting faults in Python programs is also feasible. The secondary objective is to establish grounds for more thorough future research and publications, provided the preliminary research yields promising results.
Methods: It has been demonstrated that machine learning techniques can predict faults in C/C++ and Java projects with recall 0.71 and false positive rate 0.25. A similar approach was applied to find out whether promising results can be obtained for Python projects. The working hypothesis is that choosing Python as the programming language does not significantly alter those results. A preliminary study was conducted in which a basic machine learning technique was applied to a few sample Python projects. If these efforts succeed, it will indicate that the selected approach is worth pursuing, since results similar to those for C/C++ and Java can be obtained for Python. If they fail, it will indicate that the selected approach was not appropriate for the selected group of Python projects.
Results: The research provides experimental evidence that fault-prediction methods similar to those developed for C/C++ and Java programs can be successfully applied to Python programs, achieving recall up to 0.64 with false positive rate 0.23 (mean recall 0.53 with false positive rate 0.24). This indicates that more thorough research in this area is worth conducting.
Conclusion: Having obtained promising results with this simple approach, the authors conclude that research on predicting faults in Python programs using machine learning techniques is worth conducting. Natural ways to enhance future research include more sophisticated machine learning techniques, additional Python-specific features, and extended data sets.
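The abstract reports its findings as recall and false positive rate. As a reminder of what these two figures measure, the following is a minimal sketch, not the authors' evaluation code. It uses the standard definitions: recall = TP / (TP + FN), the share of truly faulty fragments that were flagged, and false positive rate = FP / (FP + TN), the share of fault-free fragments that were flagged by mistake. The function name `recall_and_fpr` and the sample labels are hypothetical and chosen only for illustration; they are not data from the study.

```python
def recall_and_fpr(actual, predicted):
    """Return (recall, false positive rate) for binary fault labels.

    actual / predicted: sequences of 0/1 values, where 1 means "faulty".
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # faulty, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # faulty, missed
    fp = sum(1 for a, p in pairs if not a and p)      # clean, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # clean, not flagged

    # Guard against division by zero when a class is absent.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Hypothetical labels for eight code fragments (illustration only).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

recall, fpr = recall_and_fpr(actual, predicted)
print(f"recall={recall:.2f}, false positive rate={fpr:.2f}")
```

In this toy case three of the four faulty fragments are flagged (recall 0.75) and one of the four fault-free fragments is flagged by mistake (false positive rate 0.25). A useful predictor keeps recall high and the false positive rate low, which is why the abstract reports the two figures together.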
Authors and Affiliations
Bartłomiej Wójcicki, Robert Dąbrowski