[cairo] Plans (and motivation) for moving cairo from CVS to git

Tue Feb 7 12:19:10 PST 2006

On Mon, 6 Feb 2006 20:47:37 -0600, Jonathon Jongsma wrote:
> On 2/6/06, Carl Worth <cworth at cworth.org> wrote:
> > PS. I'm quite certain I'll be moving cairo to a distributed version
> > control system quite soon (and git in particular).
> 
> I'm glad to hear you're thinking about moving away from CVS (and I'd
> support this for cairomm as well).  I'm curious to hear why you're
> leaning toward git as opposed to, say, bzr or mercurial.  I can't say
> I have a lot of in-depth experience with all of the options, but I
> remember thinking that git had a bit higher learning curve than some
> of the others the last time I looked into it.

[If you'll pardon me for re-parenting the thread---I thought others
that might be interested in the topic might have missed it under the
previous parent. "Re: [cairo] cvs account for Jonathan Jongsma"]

[And also, please pardon the length of this message. But you've hit on
something I've been putting a lot of thought into lately, so I'm going
to get a lot out at once here. If you don't want to read all of this,
(and why would you?), you can just take some assurance that the
decision to use git, even if not "correct" at least had a lot of
thought put into it.]

I've been wanting to move away from CVS for a long time, (more on this
below), so I have been keeping an eye on the various projects for
quite some time.

As for "why git?" I should first point out that at some level git,
mercurial, and bzr are basically equivalent. Each is built on a
basically similar model of distributed source code management. This is
very encouraging in terms of validating the model, and in providing
assurance that switching from one to another in the future should be
quite easy.

So, to some extent, it doesn't really matter which one we choose at
this point. If we go with git today and then decide next year that
mercurial is obviously better, the git->mercurial transition will be
much less painful than the cvs->git transition. (I can say this
with confidence after already spending a lot of time and effort
getting cvs->git scripts to work and very little effort getting
git->mercurial scripts to work.)

To actually answer the question of why I'm choosing git, though, let
me explain some CVS defects and my current workarounds, and then
examine how git and mercurial would help.

CVS cannot do "disconnected operation"
--------------------------------------
CVS has an inseparable notion of a central server that it always wants
to connect to. I work around this bug by regularly making an rsync
copy of the central CVS repository, then using the local copy for
things like "cvs update" and "cvs diff", and switching to the central
repository for "cvs commit". That's pretty awkward, but I've got my
hacked up scripts to simplify it.

What I'm still missing with CVS is the ability to do offline
commits. This means that if I code for several hours on a plane, say,
I end up with one big patch instead of the sequence of commits that I
would prefer to have.

Either git or mercurial would solve this problem just fine by always
working with a local repository and supporting the ability to push and
pull changes between repositories.
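
As a rough sketch of that offline workflow, written against a current
git command set (an assumption; option names may differ in older
releases). The directory names "central.git" and "work" and the file
"notes.txt" are made up for illustration and are not cairo's layout:

```shell
set -e
# Hypothetical identity and paths, used only for this throwaway demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
top=$(mktemp -d)

# Stand-in for the shared repository (normally on a server).
git init -q --bare "$top/central.git"

# Local working repository; the commits below need no network at all.
git init -q "$top/work"
cd "$top/work"
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"

echo "first change" > notes.txt
git add notes.txt
git commit -q -m "First offline commit"

echo "second change" >> notes.txt
git commit -q -a -m "Second offline commit"

# Back online: publish both commits as separate history entries.
git push -q "$top/central.git" trunk

# Count the commits that arrived in the shared repository.
offline_commits=$(git --git-dir="$top/central.git" rev-list --count trunk)
echo "commits in central: $offline_commits"
```

The point of the sketch is only that the sequence of commits survives
the trip intact, rather than being collapsed into one big patch.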

CVS requires central granting of commit access
----------------------------------------------
With CVS, new coders don't get access to source control management
tools until they are granted commit access to a central
repository. Before this, even things like "cvs diff" don't work
correctly. This impedes collaboration, and as in the "disconnected"
case above, leads to bigger patches that are harder to
review/integrate later.

I haven't had a workaround for this, and delays in the granting of
commit access have caused a fair amount of pain to new programmers.

Either git or mercurial would solve this problem quite well. New
coders can easily clone any repository and start working/committing
immediately. Full access to the tools is readily available without any
central grant of permission.
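
A minimal sketch of that newcomer workflow, again assuming a current
git command set. The repositories "upstream" and "newcomer", the
"outgoing" directory, and the README contents are all hypothetical:

```shell
set -e
# Hypothetical identity and paths, used only for this throwaway demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
top=$(mktemp -d)

# Stand-in for a project's published repository.
git init -q "$top/upstream"
cd "$top/upstream"
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"
echo "hello" > README
git add README
git commit -q -m "Initial import"

# A newcomer clones it; no account or commit access is involved.
git clone -q "$top/upstream" "$top/newcomer"
cd "$top/newcomer"

echo "fix" >> README
git commit -q -a -m "Fix the README"
echo "more" >> README
git commit -q -a -m "Extend the README"

# History and diffs against the published state work right away.
git log --oneline origin/trunk..trunk

# Export one patch file per local commit, ready to mail for review.
patches=$(git format-patch -o "$top/outgoing" origin/trunk | wc -l | tr -d ' ')
echo "patch files written: $patches"
```

Each local commit becomes its own small patch, which is the opposite
of the one-giant-patch outcome described above.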

CVS branches are "hard"
-----------------------
Within cairo's history it's quite evident that CVS branches are
entirely inadequate. We've been through several periods of temporary
destabilization or side-development, (API shakeup, new font support,
new font support again, etc.), yet we've never used CVS branches for
any of that. The only branch we have is the branch for 1.0 maintenance
releases, (and even then, CVS "support" for branching is so limited
that we do painful, simultaneous commits to both branches rather than
ever using any assisted merging). What a pain!

For quite some time, I've been using a local workaround for the lack
of good branch support. What I do is locally "cp -a" a fresh checkout
of cairo from CVS every time I want to effectively branch. I do this
every time I start working on a bug report from bugzilla, (make a test
case, merge a patch, etc.), so I have the following directories on my
local hard drive, for example (corresponding to bugzilla numbers):

	cairo-4263
	cairo-4299
	cairo-4320
	cairo-4339
	cairo-4599
	cairo-4863
	cairo-5100
	cairo-5289
	cairo-5495
	cairo-5518

Then I also do the same thing every time I want to work on a new
feature. For example, some of the other "branches" I have locally are:

	# Some initial work on the "new tessellator"
	cairo-bentley-ottmann

	# Make it easy to draw "dots" made of round caps
	cairo-degenerate-stroke

	# Support from vlad for device_offset testing in the test suite
	cairo-device-offset-testing

	# Proposed cairo_new_sub_path functionality
	cairo-new-sub-path

	# An odd bug I noticed once
	cairo-stroked-spline-turning-fast

	etc. etc.

I have about 40-50 such local branches, (some of which are admittedly
stale and just need to go away). This workaround is convenient in that
I can hold on to a merged patch from a testcase, say, and still get
CVS to help merge that together with mainline changes with "cvs
update". However, there are two critical failings with this approach:

1) I can't do any commits along any of these branches. Once again,
   we're back to the one-giant-patch problem that is common to all of
   these defects.

2) These are all local-only branches, in that they only exist on my
   hard drive. This has been a huge impediment to development. Nobody
   gets visibility to these branches as they are in progress. (They
   might not even know I'm working on things until I get finished and
   mail out a patch.)

   This causes simple problems like duplication of effort in merging
   bug fix patches. It also causes big problems like the fact that we
   don't have lots of people working together on finishing up the new
   tessellator.

   I'd much rather just publish all this stuff to the world, let
   people look at it, and play with it, commit their own pieces on
   top, and then offer them back again.

This final CVS defect is one in which I think git does a much better
job than mercurial at providing what I want. Either one would be much
better than the current limitations of CVS. But with git, users would
be able to pull down "Carl's current cairo" and examine all of the
stuff I currently have in flight within git itself, (with things like
graphical repository browsers gitk or gitview, "git branch" to list
the available branches, or with query tools like "git grep" that can
quickly(!) search through all branches in the repository).
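
To make the branch listing and cross-branch search concrete, here is a
small sketch assuming a current git command set. The repository name
"carl", the branch names, and the file contents are invented for the
demo:

```shell
set -e
# Hypothetical identity and paths, used only for this throwaway demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
top=$(mktemp -d)

git init -q "$top/carl"
cd "$top/carl"
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"
echo "mainline" > status.txt
git add status.txt
git commit -q -m "Mainline"

# Two in-flight topics kept as branches inside the same repository.
git branch new-sub-path
git branch degenerate-stroke

git checkout -q new-sub-path
echo "TODO: sub path" > topic.txt
git add topic.txt
git commit -q -m "Start new-sub-path"

git checkout -q degenerate-stroke
echo "TODO: round caps" > topic.txt
git add topic.txt
git commit -q -m "Start degenerate-stroke"

git checkout -q trunk

# List every line of development held in this one repository
# ("git branch" prints the same list in a human-oriented format).
branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "$branches"

# Search all branches at once; hits are reported as branch:file.
hits=$(git grep -l "TODO" $branches)
echo "$hits"
```

Anyone who clones such a repository gets all of these branches in a
single download and can run the same queries locally.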

On the other hand, the mercurial model would be closer to what I'm
currently doing, where each "branch" would be a separate cloned
repository. As far as I understand, there wouldn't be any means for a
user to pull down "everything Carl is working on" and users would
instead have to choose, in advance, a single branch to
examine. Mercurial still provides a graphical repository browser, but
it would then be limited to a single branch. There doesn't even exist,
(that I can find), anything like "git branch" to list multiple lines
of development that currently exist in a repository.

So there's one conceptual piece missing from mercurial's model
here. In addition to this carl-wants-to-advertise-his-ADHD problem, I
think there's another significant problem from mercurial's
one-branch-per-repository model. Namely, it appears to force a
more centralized, (or at least, a more strictly hierarchical), model
on the development process, while git allows a more fully distributed
model making it easier for users to pull (even speculatively with
"fetch") from multiple sources, track them in the local repository as
separate branches and merge when appropriate.
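
The pull-from-multiple-sources pattern can be sketched as follows,
assuming a current git command set (the "git remote" interface used
here is an assumption about present-day git, not something this
message describes). The repositories "base", "alice", "bob", and
"integrator" are hypothetical:

```shell
set -e
# Hypothetical identity and paths, used only for this throwaway demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
top=$(mktemp -d)

# Common starting point.
git init -q "$top/base"
cd "$top/base"
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# Two independent contributors, each with one commit of their own.
for who in alice bob; do
    git clone -q "$top/base" "$top/$who"
    cd "$top/$who"
    echo "work by $who" > "$who.txt"
    git add "$who.txt"
    git commit -q -m "Work by $who"
done

# The integrator fetches from both without merging anything yet.
git clone -q "$top/base" "$top/integrator"
cd "$top/integrator"
git remote add alice "$top/alice"
git remote add bob "$top/bob"
git fetch -q alice
git fetch -q bob

# Each source is now tracked as its own branch (alice/trunk, bob/trunk).
git branch -r

# Merge when appropriate: the first is a fast-forward, the second a true merge.
git merge -q alice/trunk
git merge -q -m "Merge bob's work" bob/trunk

total=$(git rev-list --count HEAD)
echo "commits after merging: $total"
```

Fetching is the speculative step: nothing in the integrator's own
branch changes until one of the merges is run.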

I think cairo development, (particularly the development of "niche"
backends and the exploration of proposed new features), could
definitely take advantage of the more distributed model afforded by
git.

The mercurial development has benefitted substantially from the prior
existence of git, including the benefit of hindsight. I think any
shortcomings I mention above are at most a lack of maturity and not any
fundamental shortcoming that couldn't be overcome. Again, as I said in
the beginning, the tools are largely similar, and we could switch
easily in the future.

There are some other detailed differences between git and mercurial
such as the storage formats and implications on robustness,
performance, and storage costs. I spent a lot of time investigating
these and concluded that it was a wash. There are advantages and
disadvantages in either case, but I think either is likely good
enough.

Finally, there are questions regarding the user-interface and how
easy-to-learn each tool is. Mercurial is definitely easy to learn, but
as I mentioned above, I've found things I'd like to do with mercurial
that I can't do. And at that point, I'm stuck.

Git, on the other hand, provides a layered approach, where the most
fundamental operations on the repository are exported as separate
commands. This can be daunting, (try typing "git-" and press TAB for
example), but it turns out that git really can be as easy to learn as
mercurial. See, for example:

	A tutorial introduction to git
	http://www.kernel.org/pub/software/scm/git/docs/tutorial.html

The layered approach is primarily implemented with a core in C, and
shell scripts above that. The reason this is a nice feature, and not
just a potential way for newcomers to be overwhelmed, is that the
user-interface is actually very easy to adapt to specific project or
individual needs---while still maintaining the ability to collaborate
on the repository itself.

The structure of git is such that it would actually be quite feasible
to implement the hg user-interface on top of git's core. And I, for
one, would definitely be interested in seeing source control
user-interface experimentation start happening with a common
repository/protocol/core-command-set rather than each separate
interface also coming bundled with a separate repository
structure/protocol.

-Carl