[llvm-dev] monorepo: bad performance when using gitk / git log
James Y Knight via llvm-dev
llvm-dev at lists.llvm.org
Tue Apr 2 11:14:08 PDT 2019
Ah -- awesome news! Sounds like it may be fixed soon.
Indeed, the dates in the early history of the svn repository did jump
around a bit, because "clang" was imported from an external repository in
2007, while it had already been under development for a year.
To be precise, the SVN revisions r38537 through r39730, except for r39142,
were imported into the SVN repository, with their original commit dates.
Those dates are thus out of order compared with surrounding commits. E.g.
r38535 has the date 2007-07-11 08:47:55 +0000, while r38537 is from a year
earlier, 2006-06-18 05:42:02 +0000.
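As a quick sanity check of the two dates above, here is a minimal shell sketch. It assumes GNU date (for the -d option), and the variable names are made up for illustration. It only compares the two timestamps quoted in this message and does not touch the repository.

```shell
# Commit dates quoted above, in UTC. GNU date is assumed (for -d).
r38535_epoch=$(date -u -d '2007-07-11 08:47:55' +%s)
r38537_epoch=$(date -u -d '2006-06-18 05:42:02' +%s)

# A later revision number with an earlier timestamp means the dates run backwards.
days_apart=$(( (r38535_epoch - r38537_epoch) / 86400 ))
if [ "$days_apart" -gt 0 ]; then
  echo "r38537 is dated $days_apart days before r38535"
fi
```

A positive result confirms that the higher-numbered revision carries the earlier date, which is the out-of-order pattern described above.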
On Tue, Apr 2, 2019 at 11:16 AM Björn Pettersson A <
bjorn.a.pettersson at ericsson.com> wrote:
> I asked about this on git at vger.kernel.org:
>
>
> https://public-inbox.org/git/20190402132756.GB13141@sigill.intra.peff.net/T/#m1fd5da534d39f967a8ce8b3361bc2e00b9214f31
>
>
>
> I’ve already got an answer: we seem to be unlucky with some access
> patterns when doing “git log --parents” in the monorepo, and we hit some
> quadratic analysis of the commit history. Hopefully that is something they
> can fix (Jeff King already had some ideas).
>
>
>
> *From:* James Y Knight <jyknight at google.com>
> *Sent:* 27 March 2019 20:38
> *To:* Björn Pettersson A <bjorn.a.pettersson at ericsson.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] monorepo: bad performance when using gitk / git log
>
>
>
> The problem here seems to be due to the combination of specifying
> --parents and specifying a pathname to filter by. I can certainly
> reproduce a _remarkable_ slowness in git with that combination.
>
>
>
> On my machine:
>
> $ time git log --parents --oneline origin/master > /dev/null
>
> real 0m4.001s
>
>
>
> $ time git log origin/master -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
>
> real 0m5.332s
>
>
>
> $ time git log --parents --oneline origin/master -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
>
> real 2m48.944s
>
>
>
> That said, I use gitk frequently and had not noticed performance issues.
> But I'd never tried invoking it with a path on the command line, only with
> ref names, so I wasn't hitting the bad case.
>
>
>
> Nor have I noticed issues with git log, but again, I've never run it
> with --parents, so I don't hit this bad case.
>
>
>
> Maybe worth reporting as a possible bug to git? Surely whatever algorithm
> it's using shouldn't be _this_ slow.
>
>
>
> On Wed, Mar 27, 2019 at 9:23 AM Björn Pettersson A via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi!
>
>
>
> Anyone else experiencing performance problems when using the new monorepo?
>
>
>
> My experience is that the performance of gitk (and git log) is sometimes
> really bad when working in the monorepo.
>
>
>
> I’ve mainly seen it when using gitk on specific files/directories, but
> since gitk seems to be using “git log --no-color -z --pretty=raw
> --show-notes --parents --boundary HEAD -- <file>” it is possible to observe
> the same thing when using git log.
>
>
>
>
>
> The problem can be seen when creating a brand new commit (with a new file):
>
>
>
> bash-4.1$ git clone https://github.com/llvm/llvm-project.git llvm-project
>
> bash-4.1$ cd llvm-project
>
> bash-4.1$ touch dummy
>
> bash-4.1$ git add dummy
>
> bash-4.1$ git commit -m "test"
>
> [master 6539b74dd0e] test
>
> 1 file changed, 0 insertions(+), 0 deletions(-)
>
> create mode 100644 llvm/dummy
>
> bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- dummy > /dev/null
>
> 198.37user 0.40system 3:18.67elapsed 100%CPU (0avgtext+0avgdata 696456maxresident)k
>
> 0inputs+0outputs (0major+175765minor)pagefaults 0swaps
>
>
>
>
>
> It also shows up when examining older files. Here are some tests using the
> monorepo:
>
>
>
> bash-4.1$ git clone https://github.com/llvm/llvm-project.git llvm-project
>
> bash-4.1$ cd llvm-project
>
>
>
> bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD > /dev/null
>
> 5.15user 0.26system 0:05.42elapsed 99%CPU (0avgtext+0avgdata 220344maxresident)k
>
> 0inputs+0outputs (0major+56131minor)pagefaults 0swaps
>
>
>
> bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- README.md > /dev/null
>
> 155.20user 0.34system 2:35.45elapsed 100%CPU (0avgtext+0avgdata 636744maxresident)k
>
> 0inputs+0outputs (0major+160862minor)pagefaults 0swaps
>
>
>
> bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- llvm/CODE_OWNERS.TXT > /dev/null
>
> 55.48user 0.34system 0:55.80elapsed 100%CPU (0avgtext+0avgdata 690124maxresident)k
>
> 0inputs+0outputs (0major+174196minor)pagefaults 0swaps
>
>
>
> bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
>
> 192.97user 0.33system 3:13.19elapsed 100%CPU (0avgtext+0avgdata 696496maxresident)k
>
> 0inputs+0outputs (0major+176003minor)pagefaults 0swaps
>
>
>
>
>
> Same tests when using the old llvm repo (there is no README.md so I
> skipped that test here):
>
>
>
> bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD > /dev/null
>
> 2.72user 0.12system 0:02.84elapsed 99%CPU (0avgtext+0avgdata 136628maxresident)k
>
> 0inputs+0outputs (0major+36354minor)pagefaults 0swaps
>
>
>
> bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- CODE_OWNERS.TXT > /dev/null
>
> 2.74user 0.19system 0:02.93elapsed 99%CPU (0avgtext+0avgdata 344756maxresident)k
>
> 0inputs+0outputs (0major+88975minor)pagefaults 0swaps
>
>
>
> bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- test/CodeGen/Generic/bswap.ll > /dev/null
>
> 3.76user 0.19system 0:03.96elapsed 99%CPU (0avgtext+0avgdata 380416maxresident)k
>
> 0inputs+0outputs (0major+98218minor)pagefaults 0swaps
>
>
>
>
>
> The example with test/CodeGen/Generic/bswap.ll indicates that it can take
> roughly 193/4 = 48 times longer to open gitk (or run git log) on a file
> when using the monorepo(!?!?).
>
>
>
> I’m not so familiar with the inner details of git. Could this be a bad
> repack of the llvm-project repo or something?
>
> Or is it just that we now squeeze so many commits into the same repo that
> I should expect the performance to be even worse in the future?
>
>
>
> The figures above are from git 2.14.1, but I’ve also tried 2.20.0
> with similar results.
>
>
>
> Regards,
>
> Björn
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
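The slowdown figures in the quoted report can be rechecked from the elapsed times it lists. Below is a small shell sketch, not part of the original report. The helper names to_seconds and slowdown are invented for this example, and it assumes a POSIX shell with awk available. The input values are copied from the /usr/bin/time output quoted above (monorepo first, old llvm repo second).

```shell
# Convert a GNU time "m:ss.ss" elapsed value to seconds.
# to_seconds is a helper name made up for this sketch.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%.2f", $1 * 60 + $2 }'
}

# slowdown MONOREPO_ELAPSED OLD_REPO_ELAPSED -> ratio with one decimal place.
slowdown() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f", a / b }'
}

# Elapsed times from the quoted runs: monorepo vs. old llvm repo.
full_log=$(slowdown 0:05.42 0:02.84)
code_owners=$(slowdown 0:55.80 0:02.93)
bswap=$(slowdown 3:13.19 0:03.96)

echo "full history:    ${full_log}x"
echo "CODE_OWNERS.TXT: ${code_owners}x"
echo "bswap.ll:        ${bswap}x"
```

The unfiltered log is only about twice as slow in the monorepo, while the path-filtered runs are slower by a much larger factor, which fits the observation that the problem is tied to combining --parents with a pathname.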