[llvm-dev] monorepo: bad performance when using gitk / git log

Björn Pettersson A via llvm-dev llvm-dev at lists.llvm.org
Tue Apr 2 08:16:01 PDT 2019


I asked about this on git at vger.kernel.org:
   https://public-inbox.org/git/20190402132756.GB13141@sigill.intra.peff.net/T/#m1fd5da534d39f967a8ce8b3361bc2e00b9214f31

I’ve already got an answer that we seem to be unlucky with some access patterns when doing “git log --parents” in the monorepo,
and that we hit some quadratic analysis of the commit history. Hopefully something they can fix (Jeff King already had some ideas).

From: James Y Knight <jyknight at google.com>
Sent: den 27 mars 2019 20:38
To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] monorepo: bad performance when using gitk / git log

The problem here seems to be due to the combination of specifying --parents and specifying a pathname to filter by. I can certainly reproduce a _remarkable_ slowness with that combination from git.

On my machine:
$ time git log --parents --oneline origin/master > /dev/null
real    0m4.001s

$ time git log origin/master -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
real    0m5.332s

$ time git log --parents --oneline origin/master -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
real    2m48.944s

That said, I use gitk frequently, and had not noticed performance issues. But, I'd never tried invoking it with a path on the command-line, only with ref names, so it's not hitting the bad case.

Nor have I noted issues with git log, but again, I'd never have run it with --parents, so I don't hit this bad case.

Maybe worth reporting as a possible bug to git? Surely whatever algorithm it's using shouldn't be _this_ slow.
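
For anyone who wants to re-run the three timings above on their own clone, here is a minimal sketch. The helper names `log_variants` and `time_log_variants` are made up for illustration, and the timing relies on `date +%s`, which is widely available but gives only whole-second resolution. It assumes the ref and path contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the three `git log`
# invocations compared above for a given ref and path, one per line.
log_variants() {
    ref=$1
    path=$2
    printf 'git log --parents --oneline %s\n' "$ref"
    printf 'git log %s -- %s\n' "$ref" "$path"
    printf 'git log --parents --oneline %s -- %s\n' "$ref" "$path"
}

# Run each variant in the current repository and report elapsed seconds.
# The command line is deliberately left unquoted so the shell splits it
# into words; this is why whitespace in the ref or path is not supported.
time_log_variants() {
    log_variants "$1" "$2" | while IFS= read -r cmd; do
        start=$(date +%s)
        $cmd > /dev/null
        end=$(date +%s)
        printf '%4ds  %s\n' "$((end - start))" "$cmd"
    done
}
```

From inside a clone of the monorepo, something like `time_log_variants origin/master llvm/test/CodeGen/Generic/bswap.ll` should show whether only the third variant (--parents plus a path) is slow.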

On Wed, Mar 27, 2019 at 9:23 AM Björn Pettersson A via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Hi!

Anyone else experiencing performance problems when using the new monorepo?

My experience is that performance of gitk (and git log) sometimes is really bad when working in the monorepo.

I’ve mainly seen it when using gitk on specific files/directories, but since gitk seems to be using “git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- <file>” it is possible to observe the same thing when using git log.


The problem can be seen when creating a brand new commit (with a new file):

bash-4.1$ git clone https://github.com/llvm/llvm-project.git llvm-project
bash-4.1$ cd llvm-project
bash-4.1$ touch dummy
bash-4.1$ git add dummy
bash-4.1$ git commit -m "test"
[master 6539b74dd0e] test
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 llvm/dummy
bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD  -- dummy > /dev/null
198.37user 0.40system 3:18.67elapsed 100%CPU (0avgtext+0avgdata 696456maxresident)k
0inputs+0outputs (0major+175765minor)pagefaults 0swaps


But also when examining older files, here are some tests using the monorepo:

bash-4.1$ git clone https://github.com/llvm/llvm-project.git llvm-project
bash-4.1$ cd llvm-project

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD > /dev/null
5.15user 0.26system 0:05.42elapsed 99%CPU (0avgtext+0avgdata 220344maxresident)k
0inputs+0outputs (0major+56131minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD  -- README.md > /dev/null
155.20user 0.34system 2:35.45elapsed 100%CPU (0avgtext+0avgdata 636744maxresident)k
0inputs+0outputs (0major+160862minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD  -- llvm/CODE_OWNERS.TXT > /dev/null
55.48user 0.34system 0:55.80elapsed 100%CPU (0avgtext+0avgdata 690124maxresident)k
0inputs+0outputs (0major+174196minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD  -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
192.97user 0.33system 3:13.19elapsed 100%CPU (0avgtext+0avgdata 696496maxresident)k
0inputs+0outputs (0major+176003minor)pagefaults 0swaps


Same tests when using the old llvm repo (there is no README.md so I skipped that test here):

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD > /dev/null
2.72user 0.12system 0:02.84elapsed 99%CPU (0avgtext+0avgdata 136628maxresident)k
0inputs+0outputs (0major+36354minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD  -- CODE_OWNERS.TXT > /dev/null
2.74user 0.19system 0:02.93elapsed 99%CPU (0avgtext+0avgdata 344756maxresident)k
0inputs+0outputs (0major+88975minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD  -- test/CodeGen/Generic/bswap.ll > /dev/null
3.76user 0.19system 0:03.96elapsed 99%CPU (0avgtext+0avgdata 380416maxresident)k
0inputs+0outputs (0major+98218minor)pagefaults 0swaps


The example with test/CodeGen/Generic/bswap.ll indicates that it can take roughly 193/4 ≈ 48 times longer to open gitk (or run git log) on a file when using the monorepo(!?!?).
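
The ratio can be recomputed from the user-time figures reported in the runs above. This is a small sketch; `slowdown` is a hypothetical helper name, and the inputs are simply the "user" seconds copied from the /usr/bin/time output. With the unrounded figures the bswap.ll ratio comes out nearer 51 than 48.

```shell
#!/bin/sh
# Hypothetical helper: print how many times slower the monorepo run was,
# given monorepo user seconds ($1) and old-repo user seconds ($2).
slowdown() {
    LC_ALL=C awk -v mono="$1" -v old="$2" 'BEGIN { printf "%.1f\n", mono / old }'
}

# User times taken from the measurements above.
slowdown 192.97 3.76   # bswap.ll        -> 51.3
slowdown 55.48 2.74    # CODE_OWNERS.TXT -> 20.2
slowdown 5.15 2.72     # no path filter  -> 1.9
```

The unfiltered log is only about twice as slow in the monorepo, while the path-filtered runs are 20 to 50 times slower, so the path filter together with --parents dominates the slowdown.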

I’m not so familiar with the inner details of git. Could this be a bad repack of the llvm-project repo or something?
Or is it just that we now squeeze so many commits into the same repo that I should expect the performance to be even worse in the future?

The figures above are from git 2.14.1, but I’ve also tried 2.20.0 with similar results.

Regards,
Björn