James Bowes

Purveyor of Pre-eminent Programmes

Archive for the ‘SCM’ Category

Converting SVN Commits to Git Patches

In case you find yourself in need of a way to turn an svn revision into a git patch that can be applied with ‘git am’, keeping the commit message and authorship information, here’s a script I used recently:

# svnrev2git.py - Convert an SVN revision to a Git patch.
# Author: James Bowes <jbowes@repl.ca>
# Usage:
#   $> cd my-svn-repo
#   $> python svnrev2git.py [AUTHORS_FILE] [REV_RANGE | REVISION [REVISION..]]
#   AUTHORS_FILE - a CSV of svn username, full name, email
#   REV_RANGE - an svn revision range, like 100-700
#   REVISION - a single svn revision
#   You may specify either a revision range, or a series of individual
#   svn revisions
# Output:
#   A series of git style patch files, one per svn revision, which can then be
#   applied with 'git am'
# Why use this instead of 'git svn'?
#   I had done a large repo conversion via git svn where we wanted no downtime
#   for the switchover. After removing the git svn specific info from our git
#   commits, I used this tool to bring in commits from svn, keeping svn and git
#   in sync, until we were ready to switch.

import sys
import commands

def svnlog_to_gitlog(authors, svnlog):

    lines = svnlog.split("\n")
    lines = lines[1:-1]

    metainfo = lines[0].split(" | ")
    subject = lines[2]
    description = lines[3:]

    author = metainfo[1]

    day = metainfo[2].split("(")[1][:-1]
    time = metainfo[2].split(" ")[1]
    offset = metainfo[2].split(" ")[2]

    gitlog = []
    gitlog += ["From: %s <%s>" % authors[author]]
    gitlog += ["Date: %s %s %s" % (day, time, offset)]
    gitlog += ["Subject: [PATCH] %s" % subject]
    gitlog += [""]
    gitlog += description
    gitlog += [""]

    return '\n'.join(gitlog)

def svndiff_to_gitdiff(svndiff):
    lines = svndiff.split("\n")

    gitdiff = []
    for line in lines:
        if line.startswith("--- "):
            gitdiff.append("--- a/" + line[4:])
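        elif line.startswith("+++ "):
            gitdiff.append("+++ b/" + line[4:])
        else:
            # Pass every other line (hunks, context, Index headers) through unchanged.
            gitdiff.append(line)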

    return '\n'.join(gitdiff)

def make_patch(authors, rev):
    out = commands.getoutput("svn log -c %s ." % rev)

    if len(out.split("\n")) < 2:
        print "skipping r%s" % rev
        return

    patch = open(rev + ".patch", 'w')
    patch.write(svnlog_to_gitlog(authors, out))

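    out = commands.getoutput("svn diff -c %s ." % rev)
    # Append the converted diff after the log header, then flush the file.
    patch.write(svndiff_to_gitdiff(out))
    patch.close()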

    print "wrote %s.patch" % rev

def main(args):
    author_file = open(args[0])
    authors = {}

    print "loading authors"
    for line in author_file.readlines():
        parts = line.strip().split(", ")
        authors[parts[0]] = (parts[1], parts[2])


    revs = args[1:]

    if len(revs) == 1 and '-' in revs[0]:
        start, end = revs[0].split('-')
        start = int(start)
        end = int(end)
        revs = [str(x) for x in range(start, end + 1)]

    for rev in revs:
        make_patch(authors, rev)

if __name__ == "__main__":
    main(sys.argv[1:])
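
To make the header conversion easier to follow, here is a minimal Python 3 sketch of the same step that svnlog_to_gitlog performs, run against a made-up log entry. The function name `svn_log_to_mail_header`, the user `jdoe`, and the sample log text are all hypothetical and are not part of the script above. The sketch assumes the default `svn log` layout of `rREV | user | date (human date) | N lines`.

```python
def svn_log_to_mail_header(authors, svnlog):
    """Turn one `svn log -c REV` entry into the mail-style header git am reads."""
    # Drop the dashed separator lines at the top and bottom of the entry.
    lines = svnlog.strip("\n").split("\n")[1:-1]
    _rev, user, stamp, _count = lines[0].split(" | ")
    # stamp looks like "2009-06-22 10:15:30 -0400 (Mon, 22 Jun 2009)".
    date_part, human = stamp.split(" (")
    _ymd, clock, offset = date_part.split(" ")
    name, email = authors[user]
    header = [
        "From: %s <%s>" % (name, email),
        "Date: %s %s %s" % (human.rstrip(")"), clock, offset),
        "Subject: [PATCH] %s" % lines[2],
        "",
    ]
    # Everything after the subject line becomes the commit message body.
    return "\n".join(header + lines[3:] + [""])


# Hypothetical sample data, shaped like the output of `svn log -c 123 .`
SAMPLE_LOG = """\
------------------------------------------------------------------------
r123 | jdoe | 2009-06-22 10:15:30 -0400 (Mon, 22 Jun 2009) | 3 lines

Fix crash on empty input

Guard against a missing config file.
------------------------------------------------------------------------
"""
# One line of the authors CSV would read: jdoe, Jane Doe, jdoe@example.com
AUTHORS = {"jdoe": ("Jane Doe", "jdoe@example.com")}

print(svn_log_to_mail_header(AUTHORS, SAMPLE_LOG))
```

The first line of the svn commit message becomes the patch subject, and the svn username is swapped for the full name and email from the authors file.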

Written by jbowes

June 23, 2009 at 6:29 am

Posted in SCM, tech

Graphing Git Repository Activity In ASCII

Here’s a quick little script I wrote up (adapted from this perlmonks post) to show git repository activity as an ASCII graph, like so:

[Image: git-graph screenshot]

Each column on the X axis represents one day, with the current day on the far right. The Y axis is the number of lines added plus the number of lines deleted on that day.

EDIT (2009/02/03):

WordPress.com won’t let me attach a .pl file, so here’s the contents:

# git-graph.pl - Generate an ascii graph of git repository activity
# Copyright (C) 2008 James Bowes <jbowes@dangerouslyinc.com>
# Graphing routine Adapted from http://www.perlmonks.org/?node_id=336907

sub get_activity {
    my $day = shift;
    my $git_cmd = 'git diff --shortstat "@{' . ($day + 1) .' day ago}" "@{' .
                  ($day or "0") . ' day ago}"';
    $res = `$git_cmd 2> /dev/null`;

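    # Count a day as 0 when there were no changes and the regex does not match.
    my $activity = 0;
    if ($res =~ /, (.*?) insertions\(\+\), (.*?) deletions\(-\)/) {
        $activity = $1 + $2;
    }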

    return $activity;
}

@deltas = ();
foreach $day (0..70) {
    push (@deltas, get_activity ($day));
}

print ("\n");
print graph(@deltas);
print ("\n");

sub graph {
  my( $i, $magic, $m, $p, $top, @g ) = ( 0, 20, 7, 70, 0, () );

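  # Pad the data with zeroes so there is one value per column.
  foreach $pad (0..($p - scalar(@_))) {
      push (@_, 0);
  }

  # Day 0 (today) should end up on the far right.
  @_ = reverse @_;

  # Find the largest value, then round it up to the next multiple of 100.
  for (0..($p)) {
      $top = ($top > $_[$_]) ? $top : $_[$_];
  }

  $top = $top - ($top % 100) + 100;

  my $s = $top > $magic ? ( $top / $magic ) : 1;  ### calculate scale

  # Build each row: a Y axis label, then a '#' wherever the value reaches it.
  for (0..$magic) {
    $g[$_] = sprintf("%" . ($m - 1) . "d |", $_ * $s) .
             ($_ % 5 == 0 ? '_' : ' ') x ($p);
    for $i (0..($p)) {
        substr($g[$_], ($i + $m), 1) = '#' if ($_[$i] / $s) > $_;
    }
  }
  join( "\n", reverse( @g ), ' Date:  ' . '^^^^^^|' x ( $p / 7 ));
}  # end sub graph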
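
The counting step in get_activity can also be sketched in Python, as below. This is an illustration rather than part of git-graph.pl: the function name `activity` is made up, and the sample strings only follow the general shape of `git diff --shortstat` output. Newer git versions print singular forms ("1 insertion(+)") and leave out a count that is zero, so the sketch matches each count separately instead of requiring both.

```python
import re


def activity(shortstat):
    """Return insertions + deletions from one line of `git diff --shortstat`."""
    added = re.search(r"(\d+) insertions?\(\+\)", shortstat)
    removed = re.search(r"(\d+) deletions?\(-\)", shortstat)
    # A missing count (or empty output for a quiet day) contributes nothing.
    return sum(int(m.group(1)) for m in (added, removed) if m)


print(activity(" 3 files changed, 10 insertions(+), 4 deletions(-)"))  # 14
print(activity(" 1 file changed, 1 insertion(+)"))                     # 1
print(activity(""))                                                    # 0
```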


Written by jbowes

May 24, 2008 at 11:15 am

Posted in SCM

QotD: Mike on Version Control Best Practices

< mdehaan> atomic commits are dangerous, just as atomic weapons

He may have been sarcastic. Maybe.

Written by jbowes

November 8, 2007 at 12:03 pm

Posted in SCM, tech

git bisect: A practical example with yum

update: This post is now on my new blog.

I used git bisect to track down a bug in yum last night. It was so easy and practical that I figured I should record it here, so that others might want to give git a try.

I was attempting to install mutt, and yum failed (printing a traceback) after the rpms had been downloaded, but before the test transaction finished. So I started git bisect, and marked the current point as bad:

$> git bisect start
$> git bisect bad

The yum 3.1.0 release didn’t have this bug (it was the version I had installed at the time), so I marked it as good:

$> git bisect good yum-3-1-0
Bisecting: 15 revisions left to test after this
[1d0454af41ef6361604cafa8c7a13d80bc183c63] make it so that we see that the local rpm is present and then don't download

Git automatically checks out the next revision for you to test. This one happened to be good, so I marked it as such. I continued to test and mark revisions as either good or bad, until:

$> git bisect bad
832814e6b037621c4f26ee6a47e4b7b6dc7eb073 is first bad commit
commit 832814e6b037621c4f26ee6a47e4b7b6dc7eb073
Author: XXX
Date: XXX
:100644 100644 8ea07cda8441687da2f0e3dd794c3a1c50a0f161 567ef25557eacbd932bc5f8c20cd34e49c169f57 M cli.py
:100644 100644 50fb320c9c31a0f394985e244dc35b9766fb28ce 3875b70c4f8a7b6a9cf7d06de6df47e8a0ae5777 M yum-updatesd.py
:040000 040000 28296caad31015e1573b19dd84d12c2e3db2b90b 98048391465ca3da06c210d6f45c3f234dc12e0a M yum

At this point, with the traceback and the diff from the commit, it was easy enough to figure out what the problem was, and commit a fix.
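
The reason so few steps are needed is that bisect is a binary search over history. Here is a minimal Python sketch of the idea, assuming a simplified linear history in which every commit after the first bad one is also bad. Real git bisect walks the commit graph rather than a flat list, and the names `first_bad`, `history`, and the `c00`..`c32` commit labels are invented for illustration.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of commits tested).

    Assumes commits is ordered oldest to newest, commits[0] is known good,
    and commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):   # like answering `git bisect bad`
            bad = mid
        else:                      # like answering `git bisect good`
            good = mid
    return commits[bad], steps


# Hypothetical history of 33 commits where the bug first appears in c21.
history = ["c%02d" % n for n in range(33)]
culprit, steps = first_bad(history, lambda c: c >= "c21")
print(culprit, steps)  # c21 5
```

Each test halves the range of suspects, so 33 commits need only 5 tests here.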

Written by jbowes

February 18, 2007 at 8:59 pm

Posted in SCM, tech

Did I mention that svn sucks?

$> svn co svn://gcc.gnu.org/svn/gcc/trunk gcc
$> cd gcc
$> du -h
1.6G .
$> svn export . ../clean-gcc
$> cd ../clean-gcc
$> du -h
638M .

Oh, yes, I did.

Written by jbowes

February 11, 2007 at 12:46 pm

Posted in SCM, tech

git rebase: keeping your branches current

update: This post is now on my new blog.

Where possible, I use git for my scm now. All software on dangerously incompetent is stored in git, and I do my personal yum work with git-cvsimport. One of the reasons I like git so much is git-rebase. Here’s an example of how it works:

There is some upstream project that you wish to work on. You clone this upstream project when it is in state A, and make some changes. Your personal branch is now in state Ab, that is, A plus some set of changes b.

upstream ==========A

you                +=====Ab

Now, while you’ve been writing b, more changes have occurred upstream. These changes may or may not also be contained in b. Upstream is now in state A’.

upstream ==========A==========A'

you                +=====Ab

Now, how do you get the differences between A and A’ into your branch? With many distributed scms, you would perform a merge. The merge will take the differences between A and A’ and apply them on top of Ab (this is a greatly simplified explanation, of course). Over time, you end up with a history in your branch that interleaves changes from upstream with your own changes. Merge is an option with git, but you can also perform a rebase.

With a rebase, the changes between A and Ab are taken and reapplied at A’:

upstream ==========A==========A'

you                           +=====A'b

So your own changes are always the most recent. In practice, I find this to be a very elegant approach. git-rebase makes it easy to see and manipulate your own set of changes against the upstream codebase.
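
The diagrams above can be modelled in a few lines of Python. This is only a toy sketch: commits are plain strings in a list, the function name `rebase` and the labels `a1`, `b1` and so on are made up, and it ignores everything that makes a real rebase interesting (new commit ids, conflicts, changes already present upstream).

```python
def rebase(branch, upstream):
    """Replay the commits that exist only on branch on top of the upstream tip."""
    # Walk the shared prefix to find the fork point (state A in the diagrams).
    fork = 0
    while (fork < min(len(branch), len(upstream))
           and branch[fork] == upstream[fork]):
        fork += 1
    local_changes = branch[fork:]      # the set of changes "b"
    return upstream + local_changes    # A' followed by b, i.e. A'b


upstream = ["a1", "a2", "a3"]          # A was a2, A' is a3
mine = ["a1", "a2", "b1", "b2"]        # Ab: forked at a2, then two local commits
print(rebase(mine, upstream))          # ['a1', 'a2', 'a3', 'b1', 'b2']
```

After the rebase, the local commits sit at the end of the list, which is the "your own changes are always the most recent" property described above.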

Written by jbowes

January 26, 2007 at 7:29 pm

Posted in SCM, tech