Studying the Impact of Software Architecture on the Quality of Evolving Software Systems

This research investigated the impact of architecture on bugs: are co-changes that span multiple architecture modules more likely to introduce bugs than co-changes that stay within a single module?

Overview

The method we designed and implemented for our empirical study has three components. The first, Co-changes Extractor, searches source code repositories and retrieves the groups of files that were changed together. The second, Defect Extractor, parses the commit logs of the projects and identifies the software changes that introduced defects into the system. The third, Architecture Explorer, applies different reverse engineering approaches to obtain Surrogate Views that approximate the system's architecture.
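
To make the distinction between the two kinds of co-change concrete, here is a minimal Python sketch. It is not the study's actual extractor: the function name classify_co_changes, the pairwise counting rule, and the sample commits and module map are all assumptions made for illustration. It takes groups of files changed together and a file-to-module mapping taken from some architecture view, and counts for each file how often it co-changed with a file in the same module versus a different one.

```python
from collections import defaultdict
from itertools import combinations


def classify_co_changes(commits, module_of):
    """Count intra-module and cross-module co-changes per file.

    commits:   iterable of collections of file paths changed together.
    module_of: dict mapping file path -> module name (from an architecture view).

    Assumption: every unordered pair of files in one commit counts as one
    co-change for both files in the pair.
    """
    counts = defaultdict(lambda: {"intra": 0, "cross": 0})
    for files in commits:
        # Skip files that the architecture view does not cover.
        known = sorted(f for f in set(files) if f in module_of)
        for a, b in combinations(known, 2):
            kind = "intra" if module_of[a] == module_of[b] else "cross"
            counts[a][kind] += 1
            counts[b][kind] += 1
    return dict(counts)


# Hypothetical data, for illustration only.
commits = [
    {"core/A.java", "core/B.java"},
    {"core/A.java", "io/C.java"},
    {"core/B.java", "io/C.java", "io/D.java"},
]
module_of = {
    "core/A.java": "core",
    "core/B.java": "core",
    "io/C.java": "io",
    "io/D.java": "io",
}
result = classify_co_changes(commits, module_of)
```

Counting per pair is only one possible definition; counting per commit (a file either crossed a module boundary in that commit or did not) would be an equally reasonable choice and would give different numbers.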

[Overview picture]

Results

The results show that co-changes that cross architectural module boundaries are more strongly correlated with defects than co-changes within modules, implying that, to improve accuracy, bug predictors should also take the software architecture of the system into consideration.

Tools

Class Dependency Analyzer
Bunch
ACDC
ArchDRH

Surrogate Views

We used six open-source projects for our empirical study: Hive, OpenJPA, HBase, PDFBox, Camel and Solr. Each project has multiple subfolders, one for each version that we used in the study. For each version you can find the following files:

  1. An ".odem" file, generated by the Class Dependency Analyzer tool.
  2. A ".txt" file containing the dependency information, generated from the ".odem" file and used as the input to the Bunch tool.
  3. A ".bunch" file, generated by Bunch, which lists the clusters in the system.
  4. Two ".rsf" files, the input and the output of ACDC.
  5. The ArchDRH view.
  6. A ".prn" file, which holds the LDA view.

Hive Surrogate Views
HBase Surrogate Views
OpenJPA Surrogate Views
Cassandra Surrogate Views
Camel Surrogate Views
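
As a hedged illustration of how one of the files listed above might be consumed, the following Python sketch reads a clustering in RSF form and builds a class-to-module mapping. It assumes that the ACDC output consists of three-token lines of the form "contain <cluster> <entity>"; check the actual files before relying on this. The function name parse_rsf_clusters and the sample entity names are hypothetical.

```python
def parse_rsf_clusters(lines, relation="contain"):
    """Build an entity -> cluster mapping from RSF triples.

    Assumption: each relevant line has exactly three whitespace-separated
    tokens, "<relation> <cluster> <entity>". Other lines are ignored.
    """
    module_of = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] != relation:
            continue
        _, cluster, entity = parts
        module_of[entity] = cluster
    return module_of


# Hypothetical sample content, for illustration only.
sample = """\
contain org.example.core.ss org.example.core.A
contain org.example.core.ss org.example.core.B
contain org.example.io.ss org.example.io.C
depends org.example.core.A org.example.io.C
""".splitlines()

module_of = parse_rsf_clusters(sample)
```

The resulting dictionary is the kind of file-to-module mapping needed to decide whether a co-change stays inside one module or crosses a boundary.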

Results

We used R for statistical analysis. Each row in the input file contains the name of a file in the project and the corresponding metrics for that file (Intra-module co-change, Cross-module co-change, Number-of-co-changed-files, number-of-defects and LOC). The results of the regression analysis and the Spearman correlation are located in each folder as well.
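
For readers unfamiliar with the Spearman correlation, here is a small self-contained Python sketch of how it can be computed: rank both variables (ties get the average rank), then take the Pearson correlation of the ranks. The study itself used R, so this is only an illustration of the statistic, and the two metric columns below are made-up values, not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


# Made-up per-file metrics, for illustration only.
cross_module_co_changes = [0, 2, 5, 1, 7]
number_of_defects = [0, 1, 3, 1, 4]
rho = spearman(cross_module_co_changes, number_of_defects)
```

Because the statistic depends only on ranks, it captures any monotonic relationship between a co-change metric and the defect count, not just a linear one.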

Package view – results
Bunch view – results
ACDC view – results
LDA view – results
ArchDRH view – results

Publications

More details about ARMOUR can be found in our publication:
