Stormy Glide

"Think different." — Apple Inc.

What time of day you commit code matters

Number of commits and buggy commits over a day (Linux Kernel)

Percentage of buggy commits (bars) and total number of commits (circles) versus time of day in the Linux kernel

I recently came across an interesting paper about the social characteristics of buggy commits: “Do Time of Day and Developer Experience Affect Commit Bugginess?” It examines buggy commits from two open source projects, Linux and PostgreSQL, and digs out some interesting findings.

The first finding is that when developers commit code matters. As the figure above shows, commits made around midnight have a higher bug rate, especially changes made right after midnight. Maybe people just rush to commit something before going to bed. The number of commits drops rapidly during the night, as expected. Interestingly, people introduce fewer bugs in the early morning: commits made between 7 and 9 o’clock in the morning clearly have the lowest bug rate. So if you are about to commit a change late at night, you may want to think twice. It is good motivation for developers to become morning people, which most of us are not :-)
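To make the measurement concrete, here is a minimal sketch of how a per-hour bug rate like the one in the figure could be computed. It is not the paper's tooling: the function name `buggy_rate_by_hour`, the input format (a timestamp in the committer's local time plus a flag saying whether the commit was later judged bug-introducing), and the sample data are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime


def buggy_rate_by_hour(commits):
    """Return {hour: fraction of buggy commits} for each hour that has commits.

    `commits` is an iterable of (timestamp, is_buggy) pairs, where timestamp
    is an ISO-8601 string in the committer's local time (an assumption of
    this sketch) and is_buggy is a bool.
    """
    total = Counter()
    buggy = Counter()
    for timestamp, is_buggy in commits:
        hour = datetime.fromisoformat(timestamp).hour
        total[hour] += 1
        if is_buggy:
            buggy[hour] += 1
    return {hour: buggy[hour] / total[hour] for hour in sorted(total)}


# Made-up sample data, only to show the shape of the input.
sample = [
    ("2011-03-01T00:30:00", True),
    ("2011-03-01T00:45:00", False),
    ("2011-03-01T08:10:00", False),
    ("2011-03-02T08:40:00", False),
    ("2011-03-02T23:55:00", True),
]

rates = buggy_rate_by_hour(sample)
for hour, rate in rates.items():
    print(f"{hour:02d}:00  {rate:.0%}")
```

The hard part is not this aggregation but producing the `is_buggy` flag in the first place, and using the committer's local time rather than the server's.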

Here are a few more findings from the authors:

  1. Developers introduce fewer bugs as they gain experience, as expected, except in PostgreSQL, where the paper finds that the “most experienced” authors make quite a few mistakes.
  2. The ratio of buggy commits stays constant across the days of the week. For PostgreSQL, though, people introduce slightly fewer bugs on Tuesdays and Wednesdays, and slightly more on Sundays and Mondays.
  3. “The Linux kernel developers who commit changes daily, but not as their day job, produce the largest number of commits and the smallest number of bug-introducing commits, followed by the single-commit authors”.
  4. More than 1200 commits to the Linux kernel exist solely to fix comments.
  5. “Inexperienced developers tend to do more commits between midnight and 2 AM than experienced developers, who do more commits between 8 AM and 4 PM”.

The biggest challenge in this type of analysis is that source code commits are ad hoc. It is hard to decide whether a commit is a bug fix and which bug (or bugs) it tries to fix, which makes it even harder to find the commits that introduced those bugs. The paper has to rely on some heuristics. One might think of requiring developers to fix exactly one bug per commit and to state which bug the commit aims to fix. But in my experience, rules that are not enforced automatically never work for developers. So maybe it is a good idea to build such features into SCM tools to make commits easier to understand and analyze.
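As an illustration of what such a heuristic might look like, here is a minimal sketch that guesses from the commit message alone. It is not the paper's method: the keyword list, the bug-reference pattern, and the names `looks_like_bug_fix` and `referenced_bug_ids` are assumptions chosen for the example.

```python
import re

# Hypothetical keyword list; real projects would need to tune this.
FIX_PATTERN = re.compile(
    r"\b(fix(es|ed|ing)?|bugs?|defects?|regressions?)\b", re.IGNORECASE
)

# Hypothetical convention for naming a bug, e.g. "bug #1234" or "issue 56".
BUG_ID_PATTERN = re.compile(r"(?:bug|issue)\s*#?(\d+)", re.IGNORECASE)


def looks_like_bug_fix(message):
    """Guess whether a commit message describes a bug fix."""
    return FIX_PATTERN.search(message) is not None


def referenced_bug_ids(message):
    """Extract the bug numbers the message claims to address, if any."""
    return [int(number) for number in BUG_ID_PATTERN.findall(message)]


messages = [
    "Fix null pointer dereference in scheduler",
    "Add new driver for foo",
    "fix typo in comment",
    "Fixes bug #1234 and issue 56",
]

for message in messages:
    print(looks_like_bug_fix(message), referenced_bug_ids(message), message)
```

Note that “fix typo in comment” is flagged as a bug fix even though it only touches a comment. That kind of false positive is exactly why message-based heuristics are fragile, and why explicit, tool-enforced bug references would make this analysis easier.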

I find this kind of research interesting and useful. There are many tools that analyze source code, but fewer that analyze developers’ behavior and social activities, which also have a big impact on source code quality. If you’re also interested, check out the Mining Software Repositories conference. Developers’ activities vary across development communities, so hopefully the authors will release their tools so that everybody can analyze their own repositories.

100,000 Toothpicks + 35 Years = Rolling Through The Bay

An amazing structure built by Scott Weaver from over 100,000 toothpicks over the course of 35 years. It shows you what it feels like to roll through the Bay Area.

Chinese is One of The Most “Difficult” Languages on The Planet

“It’s Greek to me” is an English expression meaning that something is very difficult to understand. Many other languages have similar expressions, and a lot of them refer, directly or indirectly, to Chinese as the language that is hard to understand. In Chinese, the equivalent idiom is “it sounds like God’s language”. So it seems that Chinese is one of the most difficult languages on the planet. :)
