Context: Software comprehension and maintenance activities, such as refactoring, are said to be negatively impacted by software complexity. The methods used to measure the complexity of software products and of software processes have been thoroughly debated in the literature. However, the possible links between these two dimensions, and in particular the benefits of adopting the process perspective, remain largely unexplored.
Objective: To improve the understanding of the relationship between developers’ activities and software complexity within a refactoring task, by evaluating whether process metrics gathered from the IDE, using process mining methods and tools, are suitable for accurately classifying different refactoring practices and the resulting software complexity.
Method: We mined source code metrics from a software product after a quality improvement task was given in parallel to 117 software developers, organized in 71 teams. Simultaneously, we collected events from their 320 IDE work sessions and used process mining to model their processes and extract the corresponding metrics.
Results: Most teams using a plugin for refactoring (JDeodorant) reduced software complexity more effectively and with simpler processes than those that performed refactoring using only Eclipse native features. We found moderate correlations (43%) between software cyclomatic complexity and process cyclomatic complexity. Using only process-driven metrics, we computed 30,000 models aiming to predict the type of refactoring method (automatic or manual) that teams had used and the expected level of software cyclomatic complexity reduction after their work sessions. The best models found for the refactoring method and cyclomatic complexity level predictions had an accuracy of 92.95% and 94.36%, respectively.
Conclusions: We have demonstrated the feasibility of an approach that allows building cross-cutting analytical models in software projects, such as the one we used for detecting manual or automatic refactoring practices. Events from the development tools and support activities can be collected, transformed, aggregated, and analyzed with fewer privacy concerns or technical constraints than source code-driven metrics. This makes our approach agnostic to programming languages, geographic location, or development practices, and therefore suitable for challenging contexts such as modern global software development, where many projects adopt agile methodologies and low/no-code platforms. Initial findings are encouraging and lead us to suggest that practitioners may apply our method to other development tasks, such as defect analysis and unit or integration testing.
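
Illustration: The abstract does not state how process cyclomatic complexity is derived from the IDE events. As a minimal sketch, assuming each work session is treated as one trace, the process model is approximated by a directly-follows graph, and McCabe's formula (edges - nodes + 2, for a single connected graph) is applied to it, the metric could be computed as follows. The function names and the event names in `EXAMPLE_EVENTS` are hypothetical and are not taken from the study's data or tooling.

```python
from collections import defaultdict


def directly_follows_graph(events):
    """Build a directly-follows graph from (session_id, activity) pairs.

    Events are assumed to be listed in chronological order within each session.
    Returns the set of activities (nodes) and the set of observed
    "activity A is immediately followed by activity B" pairs (edges).
    """
    traces = defaultdict(list)
    for session_id, activity in events:
        traces[session_id].append(activity)

    nodes, edges = set(), set()
    for trace in traces.values():
        nodes.update(trace)
        # Pair each activity with its immediate successor in the same session.
        edges.update(zip(trace, trace[1:]))
    return nodes, edges


def process_cyclomatic_complexity(nodes, edges):
    """McCabe's E - N + 2, assuming the graph is one connected component."""
    return len(edges) - len(nodes) + 2


# Hypothetical IDE events for two work sessions (illustrative only).
EXAMPLE_EVENTS = [
    ("s1", "open_file"), ("s1", "edit"), ("s1", "save"), ("s1", "run_tests"),
    ("s2", "open_file"), ("s2", "extract_method"), ("s2", "save"), ("s2", "run_tests"),
]

nodes, edges = directly_follows_graph(EXAMPLE_EVENTS)
# 5 distinct edges - 5 distinct activities + 2 = 2
print(process_cyclomatic_complexity(nodes, edges))
```

Under these assumptions, a session log with more distinct transitions per activity yields a higher value, which is one way to read "simpler processes" in the Results.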