Blog - HubSpot Product Team

The Self-Referential, Executable svn Commit Message

Written by Dan Milstein | Aug 9, 2010

A month or so back, the HubSpot svn repo began to approach commit number 50,000.  And, amid the idle banter about how best to celebrate the event, my friend and scrum team-mate Owen J Raccuglia, aka OJRac, aka Juice, quietly went and coded up a deviously clever hack, which, a brief bout of svn commit sniping later, became our commit 50K.

Let's take a look (actual svn server name changed to protect the innocent):

$ svn log -r 50000 https://svn.someserver.com/svn
------------------------------------------------------------------------
r50000 | oraccuglia | 2010-07-14 00:10:41 -0400 (Wed, 14 Jul 2010) | 6 lines

#/bin/bash
svn export https://svn.someserver.com/svn/50k -r 50000

gcc -xc 50k
./a.out | diff 50k -
./a.out | python -
------------------------------------------------------------------------
So, what happens if, say, you were to run that commit message as a shell script?  Let's see:
$ svn log --incremental -r 50000 https://svn.someserver.com/svn | tail -n +4 | /bin/bash
A    50k
Export complete.
#/bin/bash
svn export https://svn.someserver.com/svn/50k -r 50000

gcc -xc 50k
./a.out | diff 50k -
./a.out | python -
Okay, so, modulo a bit of output noise, running the commit message... produces the commit message. (The noise is a slight, irremediable flaw in the hack: the svn export line should have had a -q. But the fact that we can't change that message now is part of the overall delight.) But how does it do so? Let's unpack it.
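One detail of that pipeline is worth spelling out first: tail -n +4 passes along everything from the fourth line onward, which drops the separator line, the header line, and the blank line that sit in front of the message. Here is a rough Python sketch of just that step. The function name and the sample text are made up for illustration, and it assumes that --incremental output has no trailing separator line.

```python
def message_body(log_text: str) -> str:
    """Keep everything from line 4 onward, like `tail -n +4`."""
    lines = log_text.splitlines(keepends=True)
    return "".join(lines[3:])


# Hypothetical stand-in for the output of `svn log --incremental -r 50000`.
SAMPLE_LOG = "".join([
    "-" * 72 + "\n",
    "r50000 | oraccuglia | 2010-07-14 00:10:41 -0400 (Wed, 14 Jul 2010) | 6 lines\n",
    "\n",
    "#/bin/bash\n",
    "svn export https://svn.someserver.com/svn/50k -r 50000\n",
    "\n",
    "gcc -xc 50k\n",
    "./a.out | diff 50k -\n",
    "./a.out | python -\n",
])

# What is left is the text that gets piped into /bin/bash.
print(message_body(SAMPLE_LOG), end="")
```

Everything that survives the cut is the script itself, which is why the output can be handed straight to the shell.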
$ svn export https://svn.someserver.com/svn/50k -r 50000
That's exporting a file named 50k from the repo, which, if we look at the source, is:
#include/*
m='''#/bin/bash
svn export https://svn.someserver.com/svn/50k -r 50000

gcc -xc 50k
./a.out | diff 50k -
./a.out | python -
'''
s='''*/
main(){char*_;/*'''
def printf(*a):print m
#*/
_="#include/*%cm='''#/bin/bash%csvn export https://svn.someserver.com/svn/50k -r 50000%c%cgcc -xc 50k%c./a.out | diff 50k -%c./a.out | python -%c'''%cs='''*/%cmain(){char*_;/*'''%cdef printf(*a):print m%c#*/%c_=%c%s%c;printf(_,10,10,10,10,10,10,10,10,10,10,10,10,34,_,34,10,10,10,10);%c#/*%cs='''*/%c}//'''%c";printf(_,10,10,10,10,10,10,10,10,10,10,10,10,34,_,34,10,10,10,10);
#/*
s='''*/
}//'''
Okay, um, it looks like a C program, but also, if you squint, like a python program.  Alright, what does our friend the commit message do with this code?
$ gcc -xc 50k
It's compiling it as a C program, and, naturally, placing the resulting executable in good old a.out.  Then, gratuitously, just to show off, the commit message does:
$ ./a.out | diff 50k -
Meaning, it runs a.out, and compares the results with the original source code used to create it.  And... they're identical.  So, in the middle of this process, there's a C quine. Next up, the commit message runs that same program as a python script:
$ ./a.out | python -
(Of course, this step could have been "python 50k", but that would take the fun out of the C-quine nature of the code.)
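How does Python swallow the same file that was just fed to gcc? To Python, every line starting with # is a comment, and the triple-quoted strings m and s soak up the parts only C cares about. The one statement both languages act on is the printf call, which the Python side has redefined to ignore its arguments and print m. Here is a cut-down sketch of that view, not the real 50k: the message is shortened, the format string is a placeholder, and print is written Python 3 style (the original's "print m" is Python 2 syntax). PYTHON_VIEW and run_python_view are names invented for this sketch.

```python
import contextlib
import io

# Cut-down, hypothetical stand-in for 50k as Python reads it.
PYTHON_VIEW = """\
#include/*  (a comment, as far as Python is concerned)
m='''#/bin/bash
(rest of the commit message)
'''
s='''*/
main(){char*_;/*'''
def printf(*a):print(m)
#*/
_="placeholder format string";printf(_,10,34,_,34,10);
#/*
s='''*/
}//'''
"""


def run_python_view(source: str) -> str:
    """Execute the source as Python and return whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# printf() ignores the format string and its arguments and prints m.
print(run_python_view(PYTHON_VIEW), end="")
```

Because m already ends in a newline and print adds another, the output ends with a blank line.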

And, when run as a python program, it produces:
#/bin/bash
svn export https://svn.someserver.com/svn/50k -r 50000

gcc -xc 50k
./a.out | diff 50k -
./a.out | python -

... the original text of the commit message.  So, the file 50k is valid code in two languages: in one, it's a quine; in the other, it produces the original commit message.
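The C half works the way many quines do: the program keeps a template of its own text in the string _, then prints that template with itself substituted in at %s, using %c with 10 and 34 to produce the newlines and double quotes that can't appear literally inside the string. The same trick fits in two lines of Python 3. This is an illustration of the technique, not code from the commit, and TEMPLATE, QUINE_SOURCE and run_source are names invented here.

```python
import contextlib
import io

# The template is formatted with itself: %r re-inserts the quoted template
# (the C version uses %s plus %c with 34 for the quotes), and %% is a
# literal percent sign.
TEMPLATE = '_ = %r\nprint(_ %% _)'

# The two-line program under test:
#   line 1 assigns the template to _
#   line 2 prints the template formatted with itself
QUINE_SOURCE = TEMPLATE % TEMPLATE + "\n"


def run_source(source: str) -> str:
    """Execute the source as Python and return whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# Show the program, then check that running it reproduces it exactly.
print(QUINE_SOURCE, end="")
assert run_source(QUINE_SOURCE) == QUINE_SOURCE
```

The final assert plays the role of the "./a.out | diff 50k -" step: no difference between the source and its output.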

Overall, I see the commit message as a three-language-long quine loop (bash, then C, then Python, then back to the message), depending on some external state in the svn tree (but that state is pretty long-term reliable).  In fact, when I consider it that way, it feels a bit like the genotype -> phenotype loop of self-reference and self-reproduction (which often depends on external state in the real world).

All in all, it delights me no end.

Anyone have any other tales of svn commit message hackery to share? Owen and I have been kicking around some steganography-inspired ideas for hiding code in the low bits of an image, or in diffs of specific commits, but, for now, those will just have to wait for commit number 60,000.