Mining Repositories to Model Software Evolution

School of Electronics, Electrical Engineering and Computer Science
& ECIT Global Research Institute

Proposed Project Title: Mining Repositories to Model Software Evolution

Principal Supervisor:   Des Greer              Second Supervisor: Cassio P de Campos

Project Description:

Much of what we know about software evolution is based on closed projects, built by established, co-located teams for a specific customer. However, software is now often developed by distributed teams whose members build code over a long period of time. Open Source Software (OSS) development therefore presents different challenges, but also opportunities: the data from these projects is often freely available, since the process is usually controlled via a central repository along with an array of development tools that can be used to capture data on the project.

One prominent data source is GitHub, which is now the most comprehensive source of OSS projects. GitHub is open and provides an API for extracting data about projects, their contributors and their users, which makes it a rich source of data for studying how software projects evolve. In addition, there are analysis tools that automatically analyse source code for quality.

This research project will look at what data (on commits, bugs, contributor activity etc.) can be extracted and used for projects such as Git, Zotero, Voldemort, jQuery and others, and will then investigate how these data change over time for a software project.
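
As an illustration of the kind of extraction described above, the following is a minimal sketch that groups commit records by calendar month and counts commits and distinct contributors. It assumes records shaped like the items returned by the GitHub REST API "list commits" endpoint (a nested commit.author object with name and date fields). The function name monthly_activity and the sample records are hypothetical and invented for this example; no real project data is used.

```python
from collections import defaultdict
from datetime import datetime


def monthly_activity(commits):
    """Group commit records by calendar month.

    Returns {"YYYY-MM": {"commits": int, "contributors": int}},
    ordered from the earliest month to the latest.
    """
    counts = defaultdict(int)
    authors = defaultdict(set)
    for record in commits:
        author = record["commit"]["author"]
        # Timestamps are assumed to be ISO 8601 with a trailing "Z";
        # replace it with an explicit offset so that
        # datetime.fromisoformat accepts it on older Python versions.
        when = datetime.fromisoformat(author["date"].replace("Z", "+00:00"))
        month = when.strftime("%Y-%m")
        counts[month] += 1
        authors[month].add(author["name"])
    return {
        month: {"commits": counts[month], "contributors": len(authors[month])}
        for month in sorted(counts)
    }


# Invented sample records, for illustration only.
SAMPLE_COMMITS = [
    {"commit": {"author": {"name": "alice", "date": "2015-01-03T10:00:00Z"}}},
    {"commit": {"author": {"name": "bob", "date": "2015-01-19T16:30:00Z"}}},
    {"commit": {"author": {"name": "alice", "date": "2015-02-02T09:15:00Z"}}},
]

if __name__ == "__main__":
    for month, stats in monthly_activity(SAMPLE_COMMITS).items():
        print(month, stats)
```

Counting contributors by author name is a simplification; a real study would probably need to reconcile multiple names and email addresses belonging to the same person.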

One possible research question that could be answered is: “Is it possible to present, visualise and make use of repository data for open source software to predict the quality of software?” Another possible research question is: “How can we build a model of the evolution of open source software using git repository data?”

This implies the ability to map the evolution of the software over time for any project. The map can include times, contributors, branches, merges, lines of code, issues etc. Following that, we can ask several research questions about how products evolve, which ones survive and which ones die, along with the factors contributing to the development.
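
One possible starting point for such a map is the commit history itself. The sketch below assumes the history has been exported with git log --pretty=format:"%H|%P|%an|%aI" (commit hash, parent hashes, author name, author date in strict ISO 8601) and turns that text into a chronological list of events, flagging merges as commits with more than one parent. The function name parse_git_log and the sample log are hypothetical; branches, lines of code and issues are not covered here.

```python
from datetime import datetime


def parse_git_log(text):
    """Parse 'hash|parents|author|date' lines into a chronological timeline."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The date is always the last field, so split it off from the right;
        # this keeps author names that contain "|" intact.
        rest, date = line.rsplit("|", 1)
        sha, parents, author = rest.split("|", 2)
        events.append({
            "sha": sha,
            "author": author,
            "when": datetime.fromisoformat(date),
            # A root commit has no parents; a merge has two or more.
            "is_merge": len(parents.split()) > 1,
        })
    # git log lists the newest commit first; sort so the oldest comes first.
    return sorted(events, key=lambda event: event["when"])


# Invented sample output, for illustration only.
SAMPLE_LOG = (
    "c3|c1 c2|alice|2015-02-02T09:15:00+00:00\n"
    "c2|c1|bob|2015-01-19T16:30:00+01:00\n"
    "c1||alice|2015-01-03T10:00:00+00:00\n"
)

if __name__ == "__main__":
    for event in parse_git_log(SAMPLE_LOG):
        kind = "merge" if event["is_merge"] else "commit"
        print(event["when"].isoformat(), event["author"], kind, event["sha"])
```

Sorting by author date is a design choice made for simplicity; rebased or cherry-picked commits can carry author dates that differ from the order in which they entered the repository.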

Indeed, there is a range of project attributes and metrics that could be studied for correlation, and the possibility exists for new practical contributions on how software evolves, how to predict its evolution and how to adjust software development practices to increase the probability of success. These are just some of the possibilities for research, and the work could be extended to privately owned repositories.

Contact details

Supervisor Name: Dr Des Greer                                                Tel: +44 (0)28 9097 4656
QUB Address:       School of EEECS, CS Building                        Email: