[PATCH] [X86][AVX] Fix wrong lowering of VPERM2X128 nodes.
Andrea Di Biagio
Andrea_DiBiagio at sn.scee.net
Sun Mar 8 07:10:13 PDT 2015
Hi Michael,
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:9090
@@ -9089,2 +9089,3 @@
// FIXME: Detect zero-vector inputs and use the VPERM2X128 to zero that half.
- unsigned PermMask = Mask[0] / 2 | (Mask[2] / 2) << 4;
+ int MaskLO = Mask[0] == SM_SentinelUndef ? Mask[1] : Mask[0];
+ int MaskHI = Mask[2] == SM_SentinelUndef ? Mask[3] : Mask[2];
----------------
mkuper wrote:
> Can we end up with both mask elements being 'u'? E.g. <u, u, 0, 1>?
> Or will all these cases be caught by different code paths?
>
> Not that it really matters in practice, since even if it ends up here we'll just end up with -1 / 2 == 0 as the VPERM mask, but I don't think we want the mask to depend on the numerical value of SentinelUndef.
Yes, we can end up with both mask elements being undef.
That would be legal according to function 'canWidenShuffleElements'.
In practice, as you said, it doesn't matter: we would end up propagating index 0, which is fine because that half of the result is undef.
I added explicit checks against SM_SentinelUndef because of the FIXME at line 9089. At some point we may also want to check for SM_SentinelZero and use a different strategy for that case.
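
To make the undef handling concrete, here is a minimal standalone sketch of the selector computation. It is not the code in the patch: 'laneSelector' and 'computePerm2x128Imm' are hypothetical names, the mask is assumed to be a 4-element mask over two 4 x 64-bit inputs (indices 0..7), and SM_SentinelUndef is assumed to be -1 as implied by the "-1 / 2 == 0" remark above. Unlike the patch, the sketch handles the "both elements undef" case explicitly instead of relying on integer division truncating toward zero.

```cpp
#include <array>
#include <cassert>

// Assumed sentinel value; the review only implies it is -1.
constexpr int SM_SentinelUndef = -1;

// Hypothetical helper: choose the 128-bit lane selector (0..3) for one half
// of the result from the two 64-bit mask elements that cover that half.
unsigned laneSelector(int First, int Second) {
  // Prefer the first element; fall back to the second if the first is undef.
  int Elt = (First != SM_SentinelUndef) ? First : Second;
  // Both undef: any lane is acceptable, so pick lane 0 explicitly rather
  // than depending on the numerical value of the sentinel.
  if (Elt == SM_SentinelUndef)
    return 0;
  // Two 64-bit elements per 128-bit lane.
  return static_cast<unsigned>(Elt) / 2;
}

// Hypothetical helper: bits [1:0] select the low half of the result and
// bits [5:4] select the high half. The zeroing bits are not modelled here.
unsigned computePerm2x128Imm(const std::array<int, 4> &Mask) {
  unsigned Lo = laneSelector(Mask[0], Mask[1]);
  unsigned Hi = laneSelector(Mask[2], Mask[3]);
  return Lo | (Hi << 4);
}
```

With this sketch, <u, u, 0, 1> yields selector 0 for both halves, and <u, 5, 2, u> yields lane 2 for the low half and lane 1 for the high half, which the old 'Mask[0] / 2' expression would have got wrong for the low half.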
http://reviews.llvm.org/D8119