[LLVMdev] SVN dump seed file (was: svnsync of llvm tree)

Fri Feb 27 02:27:07 PST 2015

Hi folks,

in a rather old thread on this list titled "svnsync of llvm tree"
<http://comments.gmane.org/gmane.comp.compilers.llvm.devel/42523> we
noticed that an svnsync would fail due to a few particularly big commits
that apparently caused OOM conditions on the server. The error and the
revision number were consistent for different people.

That seems to be fixed now. I succeeded in pulling a full "clone" of the
SVN repository.

Back then it was noted that an SVN dump seed file could make it easier
for people to start svnsync-ing the LLVM source tree. Now, I'm hoping to
convince you to provide a seed file for another reason than just the
error from back then.

To my knowledge, when using svnsync (just like svn [...]) one actually
transfers approximately as much data as the svndump file of the
resulting repo would occupy. Uncompressed, that happens to be almost 15
Gigabytes, even though the resulting SVN repository is only roughly 3.8
Gigabytes in size. For this experiment I dumped the revision range
0:230000 to a file just to have a "clean" range.

After compressing the dump with the lzma utility I had a file of 419
Megabytes.
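
To put those sizes in relation, here is a small back-of-the-envelope
calculation. It is only a sketch: the three numbers are the ones quoted
above, reading "Gigabytes" and "Megabytes" as decimal units is an
assumption, and the ratio helper is a name made up for this example.

```shell
#!/bin/sh
# Sizes quoted in this mail, in megabytes (decimal units assumed).
dump_mb=15000   # "almost 15 Gigabytes" of uncompressed dump
repo_mb=3800    # "roughly 3.8 Gigabytes" of on-disk repository
seed_mb=419     # lzma-compressed dump

# ratio A B: print A divided by B with one decimal place.
ratio() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "uncompressed dump is $(ratio "$dump_mb" "$seed_mb")x the seed file"
echo "on-disk repository is $(ratio "$repo_mb" "$seed_mb")x the seed file"
```

With these inputs the seed file comes out at roughly a thirty-sixth of
the uncompressed dump and about a ninth of the repository on disk.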

If there is interest, I can upload the compressed dump (or make it
available for download), including a PGP signature on the files, and so
on for you to make available on the official servers. I'll even add
detailed steps on how to get to a repository again from there and how to
keep it up-to-date. And yes, I adjusted the repo UUID to match the
remote one (which is utterly useful to keep a local synchronized repo
and 'svn relocate' between upstream and the synchronized one depending
on availability or mood).
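
As a first idea of what those steps could look like, here is a minimal
sketch of the loading side. The function name, the seed file name and
the target directory are placeholders of mine, not anything agreed on.
It assumes that svnadmin, svnlook and lzma are installed, and it stops
short of resuming svnsync, which additionally depends on the sync
revision properties stored on r0.

```shell
#!/bin/sh
# restore_seed SEED REPO: create an empty repository at REPO and load
# the lzma-compressed dump SEED into it (hypothetical helper name).
restore_seed() {
  seed=$1
  repo=$2

  # Start from a fresh, empty repository.
  svnadmin create "$repo" || return 1

  # Decompress on the fly so the ~15 GB dump never has to sit on disk.
  # --force-uuid takes over the UUID recorded in the dump stream.
  lzma -dc "$seed" | svnadmin load --quiet --force-uuid "$repo" || return 1

  # Print the resulting UUID so it can be compared with the upstream one.
  svnlook uuid "$repo"
}

# Example call (file and directory names are made up):
#   restore_seed llvm.svndump.lzma "$HOME/llvm-mirror"
```

If the printed UUID matches the upstream repository's, a working copy
can be pointed at either one with 'svn relocate'.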

All I really did was:

  svnadmin dump $(pwd) -r 0:230000 > llvm.svndump

from my svnsync "clone", which already had the adjusted UUID. I then ran
"lzma -k9e" on the resulting dump file (the compression took more than 2
hours).

The main point is that anyone trying to start synchronizing now will
have to transfer ~15 GiB of data to get to the current point. That can
be cut to ~420 MiB by providing a seed file, in the described case for
the revision range 0:230000 (and additional chunks such as 230000:250000
could be added later, or the base seed file could be updated accordingly).
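
A follow-up chunk could be produced along these lines. This is a sketch
under assumptions: the helper names and the output file name are made
up, and starting the chunk at the revision after the last one in the
seed (230001 rather than 230000) is my reading of how the pieces would
fit together without repeating a revision. The --incremental option
makes svnadmin write the first revision of the range as a delta against
its predecessor instead of as a full tree, so the chunk can be loaded on
top of the seed.

```shell
#!/bin/sh
# next_chunk_range LAST UPTO: print the revision range that follows a
# seed ending at LAST and reaches up to UPTO (hypothetical helper).
next_chunk_range() {
  last=$1
  upto=$2
  if [ "$upto" -le "$last" ]; then
    echo "nothing to dump: $upto is not newer than $last" >&2
    return 1
  fi
  echo "$((last + 1)):$upto"
}

# dump_chunk REPO LAST UPTO: write an lzma-compressed incremental dump
# of the revisions after LAST up to UPTO into the current directory.
dump_chunk() {
  repo=$1
  range=$(next_chunk_range "$2" "$3") || return 1
  out="llvm-r$(($2 + 1))-r$3.svndump.lzma"
  svnadmin dump --incremental -r "$range" "$repo" | lzma -9e > "$out"
}

# Example call (the repository path is made up):
#   dump_chunk "$HOME/llvm-mirror" 230000 250000
```

Piping straight into lzma avoids writing the uncompressed chunk to disk
first, at the cost of not keeping the plain dump around.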

Hope someone in charge reads and considers this.

With best regards,

Oliver

PS: feel free to contact me off-list about that, too.