[PATCH] D45733: [DAGCombiner] Unfold scalar masked merge if profitable
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 18 09:10:31 PDT 2018
spatel added subscribers: courbet, andreadb.
spatel added a comment.
In https://reviews.llvm.org/D45733#1071005, @lebedev.ri wrote:
> > Yeah, that is the question I'm having. I did look at the MCA output.
>
> Here is what MCA says about that for `-mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75`
> F5971838: diff.txt <https://reviews.llvm.org/F5971838>
> Or is this a scheduling info problem?
Cool - a chance to poke at llvm-mca! (cc @andreadb and @courbet)
The first thing I notice is that it's harder to get the sequence we're after on x86 from this basic source premise:
  int andandor(int x, int y) {
    __asm volatile("# LLVM-MCA-BEGIN ands");
    int r = (x & 42) | (y & ~42);
    __asm volatile("# LLVM-MCA-END ands");
    return r;
  }
  int xorandxor(int x, int y) {
    __asm volatile("# LLVM-MCA-BEGIN xors");
    int r = ((x ^ y) & 42) ^ y;
    __asm volatile("# LLVM-MCA-END xors");
    return r;
  }
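Both functions compute the same masked merge: bits of `x` where the mask is set, bits of `y` elsewhere. That equivalence is what lets the combine pick either form. Below is a small self-contained check of the identity. It is a sketch that assumes 32-bit unsigned arithmetic (`uint32_t` instead of `int`, to stay clear of signed-integer corner cases). The helper names and sample inputs are hypothetical and not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* "and/and/or" form: bits from x where m is 1, bits from y where m is 0. */
static uint32_t merge_andandor(uint32_t x, uint32_t y, uint32_t m) {
  return (x & m) | (y & ~m);
}

/* "xor/and/xor" form: (x ^ y) & m keeps only the bits that differ inside
   the mask; xoring that with y flips exactly those bits of y to match x. */
static uint32_t merge_xorandxor(uint32_t x, uint32_t y, uint32_t m) {
  return ((x ^ y) & m) ^ y;
}

/* Returns 1 if both forms agree on every combination of the sample
   inputs and masks below (hypothetical values), 0 otherwise. */
static int forms_agree(void) {
  const uint32_t samples[] = {0u, 1u, 42u, 0x12345678u, 0xdeadbeefu, 0xffffffffu};
  const uint32_t masks[] = {42u, 0xff000000u, 0x00ffff00u, 0u, 0xffffffffu};
  const size_t ns = sizeof samples / sizeof samples[0];
  const size_t nm = sizeof masks / sizeof masks[0];
  for (size_t i = 0; i < ns; i++)
    for (size_t j = 0; j < ns; j++)
      for (size_t k = 0; k < nm; k++)
        if (merge_andandor(samples[i], samples[j], masks[k]) !=
            merge_xorandxor(samples[i], samples[j], masks[k]))
          return 0;
  return 1;
}
```

Since the two expressions are algebraically identical, `forms_agree()` returns 1. For example, with x = 0x12345678, y = 0xdeadbeef and the mask 42, both forms give 0xdeadbeed.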
...because the register holding the input parameter doesn't match the register that holds the returned result. We'd have to hack around that in asm, or put the code in a loop and somehow subtract the loop overhead. Other than that, things work and look alright to me.
I don't know AArch64 that well, but your example is a special case that may be going wrong. For example, with a mask constant like 0xff000000, you could get:
  bfxil w0, w1, #0, #24
...which should certainly be better than:
  eor w8, w1, w0
  and w8, w8, #0xff000000
  eor w0, w8, w1
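The claim that one `bfxil` computes the same value as the `eor`/`and`/`eor` sequence can be checked with a small C model. This is a sketch that assumes the standard 32-bit BFXIL semantics: copy `width` bits starting at bit `lsb` of the source into the low bits of the destination and leave the destination's other bits unchanged. `model_bfxil` and `model_eor_and_eor` are hypothetical helper names, not LLVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of AArch64 "BFXIL Wd, Wn, #lsb, #width" (32-bit):
   copy `width` bits of wn starting at bit `lsb` into the low `width` bits
   of wd, keeping the remaining bits of wd. */
static uint32_t model_bfxil(uint32_t wd, uint32_t wn, unsigned lsb, unsigned width) {
  assert(width >= 1 && lsb + width <= 32);
  /* Avoid shifting a 32-bit value by 32, which is undefined in C. */
  uint32_t low_mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
  uint32_t field = (wn >> lsb) & low_mask;
  return (wd & ~low_mask) | field;
}

/* Model of the three-instruction sequence (w0 = x, w1 = y, w8 = scratch):
     eor w8, w1, w0
     and w8, w8, #0xff000000
     eor w0, w8, w1                                                    */
static uint32_t model_eor_and_eor(uint32_t w0, uint32_t w1) {
  uint32_t w8 = w1 ^ w0;
  w8 &= 0xff000000u;
  return w8 ^ w1;
}
```

Both keep the top 8 bits of `w0` and the low 24 bits of `w1`. For example, `model_bfxil(0x12345678, 0xdeadbeef, 0, 24)` and `model_eor_and_eor(0x12345678, 0xdeadbeef)` both yield 0x12adbeef.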
For the 0x00ffff00 constant, though, AArch64 chose to convert to a shift plus a possibly more expensive bfi. That's not something we can account for in the generic DAGCombiner, so I'd categorize it as an AArch64-specific bug (either don't use bfi there, fix the scheduling model, or fix this up in MI somehow).
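A shift is needed for the 0x00ffff00 mask because BFI inserts the *low* bits of its source, so the selected field of `x` (bits 8..23) must first be moved down to bit 0. The C model below sketches that shape. It assumes the standard 32-bit BFI semantics. The exact sequence AArch64 emitted is not shown in this message, so the `lsr` + `bfi` pairing in the comments is an assumption, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of AArch64 "BFI Wd, Wn, #lsb, #width" (32-bit):
   copy the low `width` bits of wn into wd starting at bit `lsb`, keeping
   the other bits of wd. */
static uint32_t model_bfi(uint32_t wd, uint32_t wn, unsigned lsb, unsigned width) {
  assert(width >= 1 && lsb + width <= 32);
  uint32_t low_mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
  uint32_t field_mask = low_mask << lsb;
  return (wd & ~field_mask) | ((wn & low_mask) << lsb);
}

/* Reference masked merge with the 0x00ffff00 mask:
   bits 8..23 come from x, all other bits from y. */
static uint32_t merge_00ffff00(uint32_t x, uint32_t y) {
  return ((x ^ y) & 0x00ffff00u) ^ y;
}

/* Assumed shift + bfi shape of the same merge: shift x right so bits
   8..23 land in the low 16 bits, then insert them into y at bit 8. */
static uint32_t merge_00ffff00_shift_bfi(uint32_t x, uint32_t y) {
  uint32_t shifted = x >> 8;           /* like: lsr w8, w0, #8      */
  return model_bfi(y, shifted, 8, 16); /* like: bfi w1, w8, #8, #16 */
}
```

With x = 0x12345678 and y = 0xdeadbeef, both functions yield 0xde3456ef. The model only shows why a shift appears. It says nothing about the relative cost of `bfi`, which is the scheduling-model question raised above.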
Repository:
rL LLVM
https://reviews.llvm.org/D45733