[PATCH] D36922: [LibCallSimplifier] try harder to fold memcmp with constant arguments

Sat Aug 19 10:13:18 PDT 2017

spatel added a comment.

In https://reviews.llvm.org/D36922#846492, @davide wrote:

> In https://reviews.llvm.org/D36922#846489, @spatel wrote:
>
> > In https://reviews.llvm.org/D36922#846488, @davide wrote:
> >
> > > In https://reviews.llvm.org/D36922#846477, @spatel wrote:
> > >
> > > > In https://reviews.llvm.org/D36922#846470, @davide wrote:
> > > >
> > > > > What GCC does in this case? (or icc?)
> > > >
> > > >
> > > > https://godbolt.org/g/oKf3s8
> > > >
> > > > So gcc gets it; icc did something terrible; clang also gets this, but in the x86 backend. This is moving the optimization up in the pipeline to hopefully combine with a common prefix optimization suggested by Joerg in https://reviews.llvm.org/D35035.
> > >
> > >
> > > Do you have an example of a common prefix opt that fires if we perform this lowering as part of the instruction combiner?
> >
> >
> > No. AFAIK, no such common prefix optimization exists in llvm.
>
>
> So, maybe we might consider postponing this change until a real use-case shows up?
>  This doesn't seem to be a lot of code to add when/if a real opportunity shows up.

I'm not sure why that's better. If you're advocating that we should remove early memcmp transforms entirely, then the burden of proof for that change is much higher as I understand it. But this patch is just trying to fix a logic hole in the existing transform.

The motivation for fixing this now is that there are cases where earlier analysis/expansion of small memcmp would be a perf win, so I'd like to remove this roadblock from affecting work on that going forward.

A potential example of that is the attachment in PR34032 (that's the optimized IR from an llvm source file - lib/IR/Function.cpp). The IR has ~5500 memcmp calls in it, and even in its current limited form for x86, CGP memcmp expansion will transform over 4K of those calls to inline IR (ie, the majority of memcmps are constant length and less than 16 bytes). If you then run that IR through the normal opt -O2 pipeline, it gets significantly smaller, so I'm taking that as early evidence that there's room for improvement.

https://reviews.llvm.org/D36922