[libcxx-commits] [PATCH] D63063: Bug 42208: speeding up std::merge

Denis Yaroshevskiy via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Mon Jun 10 13:44:16 PDT 2019


dyaroshev added a comment.

In D63063#1536732 <https://reviews.llvm.org/D63063#1536732>, @lebedev.ri wrote:

> In D63063#1536710 <https://reviews.llvm.org/D63063#1536710>, @dyaroshev wrote:
>
> > In D63063#1535924 <https://reviews.llvm.org/D63063#1535924>, @lebedev.ri wrote:
> >
> > > Have you analyzed, how much is this a problem of the actual implementation (in libc++), and how much of the llvm optimization passes?
> > >  I.e. are there some obvious failures of the llvm opts?
> >
> >
> > I do not know how to get those. What do I look at?
>
>
> Well, if "hey tool, give me all the optimizations that could be done here" was that easy :/
>
> - Compile the code to `.s` (`-c -S -o test.s`), and analyze the produced assembly, interpret it "with your mind" and think if some particular assembly instruction sequences can be optimized into other, better/shorter/etc, assembly instruction sequences. This one is cpu architecture, cpu model/version specific, obviously.
> - Compile the code to LLVM IR (`-c -emit-llvm -S -o test.ll`), and do the same at IR level.


Oh, I see what you mean. I thought I could do it optimisation by optimisation or something like that.
I did look at the assembly.
The only thing that looks like an optimiser bug is that it doesn't collapse multiple calls to memmove into one for std::merge, while for my version it does.
Other than that, in this case the generated code corresponds one to one to what is written.

There are two wins here:

1. I only check one boundary per iteration instead of two.
2. I restructure the loop so that it looks like a nice unrolled loop for the case when two elements in a row come from the first range, and like a 'do while' loop for the second range.

It is purely a code layout trick that yields very nice results, but I don't think it is a bug that the optimiser doesn't do this itself.
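
To make the first point concrete, here is a minimal sketch of a merge loop that checks only one boundary per iteration. It is not the code from the patch: the name `merge_one_check` is hypothetical, it uses `operator<` rather than a comparator, and it does not reproduce the unrolled / 'do while' layout from the second point. The idea is that after consuming an element from one range, only that range can have become empty, so only its boundary needs checking.

```cpp
#include <algorithm>

// Hypothetical illustration, not the implementation from D63063.
// Stable merge of two sorted ranges with one boundary check per step.
template <class I1, class I2, class O>
O merge_one_check(I1 f1, I1 l1, I2 f2, I2 l2, O out) {
    // Handle empty inputs up front so the loop can assume both
    // ranges are non-empty on entry.
    if (f1 == l1) return std::copy(f2, l2, out);
    if (f2 == l2) return std::copy(f1, l1, out);
    for (;;) {
        if (*f2 < *f1) {
            *out++ = *f2++;
            // Only the second range shrank, so only it can be exhausted.
            if (f2 == l2) return std::copy(f1, l1, out);
        } else {
            // On ties, take from the first range to keep the merge stable.
            *out++ = *f1++;
            // Only the first range shrank, so only it can be exhausted.
            if (f1 == l1) return std::copy(f2, l2, out);
        }
    }
}
```

Called with two sorted vectors and a `std::back_inserter`, this should produce the same sequence as `std::merge`; whether it is actually faster depends on the compiler and the data, which is what the benchmarks in the review are for.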

I also played a bit with the switch statement. So far it seems the optimiser refuses to transform my switch into jumps and insists on keeping it.
That might be a bug.

There is a Godbolt link in the PR description.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63063/new/

https://reviews.llvm.org/D63063




