It seems that git-annex copies every individual file in a separate transaction. This is quite costly for mass transfers: each file involves a separate rsync invocation and the creation of a new commit. Even with a meager thousand files or so in the annex, I have to wait fifteen minutes to copy the contents to another disk, simply because every individual file involves some disk thrashing. It also seems suspicious that the git-annex branch would get a thousand commits of history from the simple procedure of copying everything to a new repository. Surely it would be better to first copy everything and then create only a single commit that registers the changes to the files' availability?
git-annex is very careful to commit as infrequently as possible, and the current version makes 1 commit after all the copies are complete, even if it transferred a billion files. The only overhead incurred for each file is writing a journal file. You must have an old version. --Joey
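For instance, with a current version a bulk transfer should leave only a single new commit on the git-annex branch (the remote name here is just a placeholder):

    git annex copy --to=otherdisk
    git log --oneline git-annex | head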
(I'm also not quite clear on why rsync is being used when both repositories are local. It seems to be just overhead.)
Even when copying to another disk, that disk is often on some slow bus, and the files are by definition large. So it's nice to support resuming interrupted transfers. Also, rsync has a handy progress display that is hard to get with cp.
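Those are standard rsync features; the exact invocation git-annex constructs may differ, but roughly:

    # resume a partially transferred file and show per-file progress
    rsync --partial --progress /path/to/src /path/to/dst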
(However, if the copy is to another directory on the same disk, it does use cp, and even supports really fast copies on COW filesystems.) --Joey
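On a COW filesystem such as btrfs, that fast same-disk copy corresponds to GNU cp's reflink support (again, the exact flags git-annex passes may differ):

    # share data blocks instead of copying them, when the filesystem supports it
    cp --reflink=auto srcfile dstfile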
Oneshot mode is now implemented, so git-annex-shell and other short-lived processes don't bother with committing changes. done --Joey
Update: Now it makes one commit at the very end of such a mass transfer. --Joey