Another JBoss GitHub repository mirror

30 August 2008

git github java jboss jbossorg mirror svn

For those of you playing along at home, I've added jboss-deployers to my GitHub mirror set. Like the others, the 'vendor' branch is the one you want.

I'm adding JBoss projects to my mirror set as I trip across the need to browse their source. If there's a JBoss project you'd like to see mirrored out of SVN, drop the URL to the trunk of the SVN repository in a comment on this post, and I'll start slurping it.

Be a smarter patch monkey

25 August 2008

git metaprogramming ruby

A project I'm working on requires some hard-core monkey-patching of Rails internals.

Monkey-patching is a dangerous occupation, liable to introduce new and intriguing bugs into previously tested, sane code.

I've been working on a smarter patch-monkey, known as Lemur.

The goal is to let monkey-patched methods (currently only instance methods are supported) be written in modules that are mixed in the usual way, while still allowing the patcher module to redefine methods in the patchee.

There may be some corner of Ruby I'm ignorant of that would make this easier, but I've resorted to alias_method and remove_method, along with a handful of Ruby's reflection methods, to swap methods in a reasonable, clean, and auditable fashion.
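To make the alias_method/remove_method idea concrete, here is a minimal sketch of that kind of swap. It is not Lemur's actual code; the names Greeter, LoudGreeter and greet_without_patch are made up for illustration.

```ruby
# Hypothetical class and module, used only to illustrate the swap.
class Greeter
  def greet
    "hello"
  end
end

module LoudGreeter
  def greet
    "HELLO"
  end
end

Greeter.class_eval do
  alias_method :greet_without_patch, :greet  # keep the original reachable
  remove_method :greet                       # drop the class's own definition
  include LoudGreeter                        # lookup now falls through to the module
end

Greeter.new.greet                # => "HELLO"
Greeter.new.greet_without_patch  # => "hello"
```

Once the class's own definition is removed, method lookup reaches the module, and the alias keeps the old behaviour around for inspection or for undoing the patch later.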

The specs demonstrate how it works. Assume a basic class:

  
class BasicClass
  def some_instance_method()
    # ...
  end
end

And a module to monkey-patch it:

  
module PatchModule
  def some_instance_method()
    # ...
  end
end

Normally, Ruby will prefer a method defined in the class itself over one from a mixed-in module, so you can't just include your patch module, even using class_eval.
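A quick way to see this for yourself is the snippet below. The return values :original and :patched are placeholders I've added so the two definitions can be told apart.

```ruby
class BasicClass
  def some_instance_method
    :original   # placeholder body, added for illustration
  end
end

module PatchModule
  def some_instance_method
    :patched    # placeholder body, added for illustration
  end
end

# include places the module *behind* the class in the lookup chain.
BasicClass.class_eval { include PatchModule }

BasicClass.ancestors.first(2)        # => [BasicClass, PatchModule]
BasicClass.new.some_instance_method  # => :original
```

Because BasicClass comes before PatchModule in the ancestors list, the class's own definition always wins.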

So, invite in the Lemur.

  
Lemur.patch_class(BasicClass, PatchModule)

And voila! Your class is monkey-patched by the nicely self-contained module, plus, it's tracked.

  
Lemur.patched_classes # [ BasicClass ]

And even more cool, you can get some patch-audit information for each patched class:

  
Lemur.patch_records( BasicClass ) # [ array of PatchRecords ]

Each PatchRecord keeps up with the patched class, the patched method name, the actual replaced Method object, along with the patch module and the patch method.
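For a rough picture of how such record keeping could work, here is a sketch under my own assumptions. MiniLemur, its internals and the PatchRecord field names are hypothetical stand-ins, not Lemur's real implementation or API.

```ruby
# Hypothetical stand-in for Lemur; every name here is an assumption.
PatchRecord = Struct.new(:patched_class, :method_name, :replaced_method,
                         :patch_module, :patch_method)

module MiniLemur
  @records = Hash.new { |hash, klass| hash[klass] = [] }

  def self.patch_class(klass, patch_module)
    patch_module.instance_methods(false).each do |name|
      replaced = nil
      if klass.instance_methods(false).include?(name)
        replaced = klass.instance_method(name)  # UnboundMethod for the original
        klass.send(:remove_method, name)        # let lookup reach the module
      end
      @records[klass] << PatchRecord.new(klass, name, replaced, patch_module,
                                         patch_module.instance_method(name))
    end
    klass.send(:include, patch_module)
  end

  def self.patched_classes
    @records.keys
  end

  def self.patch_records(klass)
    @records.fetch(klass, [])
  end
end

class BasicClass
  def some_instance_method
    :original
  end
end

module PatchModule
  def some_instance_method
    :patched
  end
end

MiniLemur.patch_class(BasicClass, PatchModule)
BasicClass.new.some_instance_method               # => :patched

record = MiniLemur.patch_records(BasicClass).first
record.replaced_method.bind(BasicClass.new).call  # => :original
```

Holding on to the replaced method object means the original behaviour can still be called, which is also what an eventual unpatch would need.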

A total of 40 minutes has been spent writing the code so far. The plan is to add better auditability, unpatching, and support for class methods, not just instance methods.

Now, when you encounter a weird bug, you can ask the Lemur where the oddness might've originated.

Want to pitch in and do some meta-programming to make future meta-programming less scary? Fork my git repository and send me some pull requests.

GitHub Mirrors for some JBoss Projects

21 August 2008

git github java jboss jruby svn

In addition to the previously-mentioned JRuby mirror from Codehaus SVN to GitHub, I'm now also mirroring:

All are trunk-only mirrors, not picking up branches or tags. Since the JBoss repository path has about 77,000 Subversion revisions, and at one point held any and all JBoss software ever written, I have not mirrored it in its entirety. Instead, I've only grabbed http://anonsvn.jboss.org/repos/jbossas/trunk back to revision 77,200. It'll mirror going forward, but the GitHub repository does not include any ancient history.

For those of you playing along at home, the way to fetch just a cauterized "tip" from SVN to a git repository is to mirror as before, but for the initial "git svn fetch" command, add an SVN-style revision range:

git svn fetch -r77200:HEAD

For me, at least, trying to fetch only the tip revision for the directory resulted in failure. Going back a few revisions, and using a range that includes HEAD, worked much better. Then just push to GitHub as normal, and start your rebase/push cronjob.

The JBoss projects are updated from SVN every 15 minutes. But we're updating from the anonymous SVN repository at JBoss, which itself is delayed from the developer repository by some amount of time. So, ultimately, the GitHub mirror should be mostly up-to-date, but could lag behind actual developer commits by up to an hour, I reckon.

If you're wanting to track these repositories using my git mirror, only track the vendor branch. I make no claims about the stability or sanity of the 'master' ref at any point in time. I will make sure 'vendor' exactly matches the Subversion history, though.

Mirroring SVN repository to GitHub

20 August 2008

git java jruby subversion

So, I'm gearing up to work on some Java+Ruby (via JRuby) stuff. The Java world still seems fairly entrenched in the cult of Subversion, while the Rubyists have gone with Git lately.

I'm still wrapping my mind around Git, but with GitHub, it's fairly easy and straight-forward. I paid my $7 for the micro account, to give me room to screw around.

There are quite a few posts about mirroring SVN to a Git repository, but I feel the need to add my own, of course.

My goal is to mirror the trunk of the JRuby project from Codehaus SVN to my account on GitHub. By doing this, I can track the trunk development, and also work on my own patches.

I started by creating an empty repository on my GitHub account, called 'jruby'.

http://github.com/bobmcwhirter/jruby/tree/master

Now, over on my always-on, Contegix-powered server, I create a brand new local git repository, also called jruby.

mkdir jruby
cd jruby
git init

Next I use 'git svn init' to set up the SVN repository as a remote code source to track. Using the -T switch points git to the trunk, and ignores branches and tags, which is fine for my purposes.

git svn init -T http://svn.codehaus.org/jruby/trunk/jruby/

That does not pull any code, but it lets my local working tree know that I'm going to be pulling from an SVN repository at some point. This setup only occurs in your local repository, and does not seem to ever get pushed to GitHub once we get to that point.

So, now we do the initial pull. Once again, this is on my always-on, Contegix-powered server, not my local laptop. I'm doing this on a server because towards the end, we'll be setting up a cronjob to accomplish it all.

git svn fetch

It'll think for a while, it'll slurp down the SVN revision history, it'll stop and ponder occasionally, and eventually, it'll be done. Woo-hoo! Our local working tree is now up-to-date with the subversion HEAD as of that moment.

To reduce disk-space used by your local repository, go ahead and run the garbage collector

git gc

On my system, that reduced the space from over 600 MB to under 70 MB.

Now, that's great, but it's still just on my local repository. Time to push it to GitHub. We're not going to follow their directions exactly, since this will ultimately be a cronjob and needs to use ssh. And I'm slightly paranoid about my ssh keys.

So, the first thing I do is create another keypair, for use only by my mirroring process, and only for pushing changes to GitHub. It has no passphrase. This allows me to keep my top-secret keys off my shared, always-on server. If these keys are compromised, all an attacker can use them for is to push changes to GitHub. Which, being revision-control, is more annoying than dangerous. (Hooray for "git reset").

ssh-keygen -t dsa -f .ssh/id_dsa_github_mirroring

Next, I edit my .ssh/config to add a "fake host" so that ssh connections invoked by git will use this new key.

As with all previous bits, this is still on my always-on server, not my local laptop.

Host githubmirror
  User git
  Hostname github.com
  IdentityFile /home/bob/.ssh/id_dsa_github_mirroring

This turns any invocation of "ssh githubmirror" into "ssh git@github.com -i .ssh/id_dsa_github_mirroring".

I then installed id_dsa_github_mirroring.pub into my GitHub account.

Now, GitHub's instructions say to run this command to add the GitHub repository as a remote named "origin"

git remote add origin git@github.com:bobmcwhirter/jruby.git

Instead, we tweak it to use the "fake host" we added to .ssh/config:

git remote add origin git@githubmirror:bobmcwhirter/jruby.git

We're almost done, I promise.

Next, we need to do the first push from my server up to GitHub. We first push to the 'master' branch, since the repo really wants to have a master branch.

git push origin master

Now, GitHub doesn't allow you to fork a repository you own, and since this mirror is owned by me, where can I do my own hacks and patches? The 'master' branch of course. But I still want an unmolested, straight-from-subversion mirror. So, I create a 'vendor' branch in my workspace. It's initialized to match 'master' exactly.

git checkout -b vendor

Now, I push that to GitHub, too.

git push origin vendor

Awesome. I now have two branches, identical at the moment, called "vendor" and "master".

Now, as far as I can tell, all the Subversion setup that we did only lives in the local repository on my always-on server. Anyone who clones from the GitHub repository will not have that stuff. They can of course do a 'git svn init' themselves, to add it to their local repository. But it doesn't flow through GitHub.

But that's fine, since I've been doing this on my always-on server anyhow. My workspace is sitting in the 'vendor' branch that's tracking the vendor branch from GitHub.

I can pull the latest changes from Subversion by typing

git svn rebase

The 'rebase' command is neat, in that any changes that exist in the git repository are floated to be applied to whatever the latest HEAD is. But since I'm only concerned with a one-way SVN-to-Git mirror, there will never be any changes to float, and this will just tack on subsequent SVN commits as Git commits onto the 'vendor' branch. It'll leave the 'master' branch un-touched.

After rebasing, you gotta push the 'vendor' branch up to GitHub.

git push origin vendor

Now, type that every 15 minutes, and your 'vendor' branch will stay mostly up-to-date.

Or use cron.

I've cronned a script that fires every 15 minutes

#!/bin/sh
cd /home/bob/github-svn-mirrors/$1
git svn rebase
git push origin vendor

It's run with the repository name as the first (and only) argument

*/15 * * * * /home/bob/github-svn-mirrors/bin/mirror jruby
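One thing to watch with a script like this is that the push still runs even if the rebase fails. As a rough sketch only, the same two steps could be written in Ruby so that the run stops at the first failing command. The mirror method, the MIRROR_COMMANDS constant and the injectable runner parameter are my own inventions for illustration.

```ruby
# Hypothetical Ruby variant of the mirror step; stops at the first failure.
MIRROR_COMMANDS = [
  %w[git svn rebase],
  %w[git push origin vendor]
].freeze

# runner is called like Kernel#system: command words plus a :chdir option.
# It defaults to the real system call, but can be swapped for a stub.
def mirror(repo_dir, runner: Kernel.method(:system))
  MIRROR_COMMANDS.each do |command|
    ok = runner.call(*command, chdir: repo_dir)
    return false unless ok   # system returns false or nil on failure
  end
  true
end
```

From a cron-driven script you would call something like mirror(File.join("/home/bob/github-svn-mirrors", ARGV[0])) and exit non-zero when it returns false, so cron can mail you about the failure.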

Now, over on my laptop, finally, I can clone the repository, work on topic branches, push to master and have my own controlled environment and fork, while knowing the 'vendor' branch reflects the pure SVN state which I can also pull into my hackings as-desired.

When I submit a patch, if it ultimately floats back to me through the vendor branch, git is supposedly smart enough to realize that the same changes have arrived in my 'master' (assuming it's applied verbatim) and keep things nice and tidy. Else, I can force a merge, trampling my half-assed patch with the official JRuby code.