[PATCH] D45733: [DAGCombiner] Unfold scalar masked merge if profitable
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 18 09:30:29 PDT 2018
lebedev.ri added a comment.
In https://reviews.llvm.org/D45733#1071051, @spatel wrote:
> In https://reviews.llvm.org/D45733#1071005, @lebedev.ri wrote:
>
> > > Yeah, that is the question I'm having. I did look at the mca output.
> >
> > Here is what MCA says about that for `-mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75`
> > F5971838: diff.txt <https://reviews.llvm.org/F5971838>
> > Or is this a scheduling info problem?
>
>
> Cool - a chance to poke at llvm-mca! (cc @andreadb and @courbet)
>
> First thing I see is that it's harder to get the sequence we're after on x86 using the basic source premise:
>
> int andandor(int x, int y) {
>   __asm volatile("# LLVM-MCA-BEGIN ands");
>   int r = (x & 42) | (y & ~42);
>   __asm volatile("# LLVM-MCA-END ands");
>   return r;
> }
>
> int xorandxor(int x, int y) {
>   __asm volatile("# LLVM-MCA-BEGIN xors");
>   int r = ((x ^ y) & 42) ^ y;
>   __asm volatile("# LLVM-MCA-END xors");
>   return r;
> }
>
>
>
> ...because the input param register doesn't match the output result register. We'd have to hack that in asm...or put the code in a loop, but subtract the loop overhead somehow. Things work/look alright to me other than that.
I simply stored the LHS and RHS sides of the `// CHECK` lines from AArch64's `@in32_constmask` in two local files,
ran llvm-mca on each of them, and diffed the output; no clang was involved.
> I don't know AArch64 that well, but your example is a special case that may be going wrong. I.e., if we have a bit-string constant like 0xff000000, you could get:
>
> bfxil w0, w1, #0, #24
>
> ...which should certainly be better than:
>
> eor w8, w1, w0
> and w8, w8, #0xff000000
> eor w0, w8, w1
>
>
>
> AArch64 chose to convert to shift + possibly more expensive bfi for the 0x00ffff00 constant though. That's not something that we can account for in generic DAGCombiner, so I'd categorize that as an AArch64-specific bug (either don't use bfi there or fix the scheduling model or fix this up in MI somehow).
Ok, then let's assume until proven otherwise that if the mask is a constant, the unfolded variant is always better.
I'll unfold it in instcombine (it seems https://reviews.llvm.org/D45664 will already match the masked merge pattern, so this would not add much code).
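To be concrete, the unfold itself should be tiny; a minimal sketch (the function name and signature are made up for illustration, not the code that will actually land) of producing the unfolded form with IRBuilder:

  #include "llvm/IR/IRBuilder.h"

  using namespace llvm;

  // Given the operands of the folded form ((X ^ Y) & M) ^ Y, where M is a
  // constant mask, build the unfolded form (X & M) | (Y & ~M).
  static Value *unfoldMaskedMerge(IRBuilder<> &Builder, Value *X, Value *Y,
                                  Constant *Mask) {
    Value *MaskedX = Builder.CreateAnd(X, Mask);
    // CreateNot on a constant operand should fold to the inverted mask.
    Value *MaskedY = Builder.CreateAnd(Y, Builder.CreateNot(Mask));
    return Builder.CreateOr(MaskedX, MaskedY);
  }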
In https://reviews.llvm.org/D45733#1070963, @spatel wrote:
> there's no need to add code bloat to the DAG to handle the pattern unless something in the backend can create this pattern (seems unlikely).
I don't think it will be possible to check that until after the instcombine part has landed, so OK, at least for now I will stop unfolding [constant mask] in DAGCombine.
While we're at it, any hints regarding pattern matchers for this code?
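For what it's worth, the shape I'd expect with the PatternMatch helpers is roughly the following; this is only a sketch with a made-up function name, not the code from D45664:

  #include "llvm/IR/PatternMatch.h"

  using namespace llvm;
  using namespace llvm::PatternMatch;

  // Bind X, Y and M when V has the shape (X & M) | (Y & ~M).
  // m_c_Or / m_c_And are the commutative matchers, and m_Deferred(M)
  // reuses the value captured by the preceding m_Value(M). A real matcher
  // would also have to handle the mask sitting in the other 'and'.
  static bool matchMaskedMerge(Value *V, Value *&X, Value *&Y, Value *&M) {
    return match(V, m_c_Or(m_c_And(m_Value(X), m_Value(M)),
                           m_c_And(m_Value(Y), m_Not(m_Deferred(M)))));
  }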
Repository:
rL LLVM
https://reviews.llvm.org/D45733
More information about the llvm-commits mailing list