Git Commit Signing

The development of Git was started by Linus Torvalds in early 2005. Later that year Git had its first releases with the help of several other developers. The release notes for Git version 1.5.3, released on the 14th of February 2007, already mentioned the tag signing functionality ("git tag" and "git verify-tag"). In 2012, during efforts to make the Git frontend more user-friendly, the commit signing we know today was created. The concept of a developer signing a commit to approve its content remained the same, though. Today commit signing is used, for example, by Arch Linux to ensure the integrity and origin of package installation and build instructions (PKGBUILDs).

Approach

To find out how widespread commit signing is, I analyzed the signatures of several million commits. To find the repositories, I downloaded metadata collections of Git repositories (Ultimate Debian Database and Libraries.io), which resulted in metadata for over 30 million repositories. Because that collection was too big and contained duplicates and plainly unreasonable entries, I did my best to filter them out. I ended up with approximately 30k repositories, which took up around 1.4 TB of disk space after downloading. The following table shows, for each hoster-source combination, how many repositories I selected as relevant enough to be downloaded and analyzed, out of how many were listed.

Hoster \ Source          UDD             Libraries.io          GHTorrent
GitHub.com               6,769 / 16433   23,842 / 1.5-30.2M    90GB
GitLab.com               253 / 273       0 / 1k-214k           0
Sourceforge              434 / 6150      0 / 0                 0
Attributes               popcon          lio metric, gh-stars  gh-stars
Quality/Integrity of DB  medium          bad                   good
Used                     yes             yes                   no
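
The per-commit check behind an analysis like this can be sketched in a few lines of Python. This is not the code I used, just a minimal illustration: it assumes that a signed commit is recognizable by a "gpgsig" header in the raw commit object, and the helper names (has_signature, read_raw_commit) are made up for this example.

```python
import subprocess


def has_signature(raw_commit: bytes) -> bool:
    """Return True if a raw commit object carries a "gpgsig" header.

    Assumption: only the presence of a signature is checked,
    not whether it is valid or who made it.
    """
    # Headers end at the first empty line; the commit message follows.
    headers, _, _ = raw_commit.partition(b"\n\n")
    return any(line.startswith(b"gpgsig ") for line in headers.split(b"\n"))


def read_raw_commit(repo_path: str, commit_id: str) -> bytes:
    """Fetch the raw commit object via "git cat-file commit <id>"."""
    result = subprocess.run(
        ["git", "-C", repo_path, "cat-file", "commit", commit_id],
        capture_output=True,
        check=True,
    )
    return result.stdout
```

Calling has_signature(read_raw_commit(path, commit_id)) for every commit of a repository then yields the signed/unsigned counts. Looking only at the header keeps the check independent of any GPG keyring, at the price of not telling good signatures from bad ones.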

Results

The absolute number of commits over time shows the growth of GitHub and the decline of SourceForge:

To put the commit signing percentages into perspective, here are some key events: The graph begins in 2012, when commit signing was introduced. In April 2016 GitHub started marking signed commits in its web interface as "verified" with a friendly green label. In August 2017 GitLab introduced the same thing:

Using the metrics from the Ultimate Debian Database's popularity contest, I plotted signing percentages against the popularity of the software living in the corresponding Git repository. Commit signing is done across the board, in projects of all sizes. Remarkably, big projects trend toward a higher signing percentage just like smaller projects do, but lag behind by a bit. Maybe that is because of slower development cycles?
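
For readers who want to reproduce signing percentages over time on their own repositories, here is a rough sketch. It is not the code behind the graphs above. It assumes that the author date decides which year a commit belongs to, and that any "%G?" status other than "N" (no signature) counts as signed. The function names are made up for this example.

```python
import subprocess
from collections import defaultdict


def signing_share_by_year(commits):
    """Map (year, is_signed) pairs to {year: percentage of signed commits}."""
    totals = defaultdict(int)
    signed = defaultdict(int)
    for year, is_signed in commits:
        totals[year] += 1
        if is_signed:
            signed[year] += 1
    return {year: 100.0 * signed[year] / totals[year] for year in sorted(totals)}


def commits_from_git(repo_path):
    """Yield (year, is_signed) for every commit reachable from any ref.

    Note: "%G?" makes git call gpg for signed commits, which can be slow.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all",
         "--date=format:%Y", "--format=%ad %G?"],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in result.stdout.splitlines():
        year, status = line.split()
        # "N" means no signature; every other status means one is present.
        yield int(year), status != "N"
```

signing_share_by_year(commits_from_git("path/to/repo")) returns one percentage per year, which can then be plotted or merged across repositories.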