[PATCH] [X86][AVX] Fix wrong lowering of VPERM2X128 nodes.

Andrea Di Biagio Andrea_DiBiagio at sn.scee.net
Sun Mar 8 07:10:13 PDT 2015

Hi Michael

Comment at: lib/Target/X86/X86ISelLowering.cpp:9090
@@ -9089,2 +9089,3 @@
   // FIXME: Detect zero-vector inputs and use the VPERM2X128 to zero that half.
-  unsigned PermMask = Mask[0] / 2 | (Mask[2] / 2) << 4;
+  int MaskLO = Mask[0] == SM_SentinelUndef ? Mask[1] : Mask[0];
+  int MaskHI = Mask[2] == SM_SentinelUndef ? Mask[3] : Mask[2];
mkuper wrote:
> Can we end up with both mask elements being 'u'? E.g. <u, u, 0, 1>?
> Or will all these cases be caught by different code paths?
> Not that it really matters in practice, since even if it ends up here we'll just end up with -1 / 2 == 0 as the VPERM mask, but I don't think we want the mask to depend on the numerical value of SentinelUndef.
Yes, we can end up with both mask elements being undef.
Function 'canWidenShuffleElements' accepts such a mask as legal.
In practice, as you said, it doesn't really matter: we would end up propagating index 0, which is still OK because that half of the result is undef.

I added explicit checks against SentinelUndef because of the FIXME message at line 9089. Basically, at some point, we may want to also check for SM_SentinelZero and use a different strategy for that.
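
To make the discussion above concrete, here is a minimal standalone sketch (not the code from the patch) of how the immediate could be computed without depending on the numerical value of the undef sentinel. The helper names 'selectLane' and 'computePermMask' are hypothetical, the mask is assumed to be a 4 x 64-bit shuffle mask that has already passed the widening check, and the explicit fallback to lane 0 when both elements are undef is just one possible choice.

```cpp
#include <array>
#include <cassert>

// Sentinel for an undefined shuffle-mask element. The value -1 is taken
// from the "-1 / 2 == 0" remark in the review.
constexpr int SM_SentinelUndef = -1;

// Hypothetical helper: choose the 128-bit lane for one half of the result.
// E0 and E1 are the two 64-bit mask elements that describe that half.
unsigned selectLane(int E0, int E1) {
  // Prefer the first element; fall back to the second if the first is undef.
  int Elt = (E0 == SM_SentinelUndef) ? E1 : E0;
  // Both elements undef: any lane is acceptable, so pick 0 explicitly
  // instead of relying on -1 / 2 truncating to 0.
  if (Elt == SM_SentinelUndef)
    return 0;
  assert(Elt >= 0 && Elt < 8 && "mask element out of range");
  return static_cast<unsigned>(Elt / 2);
}

// Hypothetical helper: low lane selector in bits [1:0], high lane selector
// in bits [5:4], mirroring the expression in the diff.
unsigned computePermMask(const std::array<int, 4> &Mask) {
  unsigned Lo = selectLane(Mask[0], Mask[1]);
  unsigned Hi = selectLane(Mask[2], Mask[3]);
  return Lo | (Hi << 4);
}
```

For a mask such as <u, 3, 4, 5>, the old expression 'Mask[0] / 2' evaluates -1 / 2 and selects lane 0 for the low half, whereas the sketch falls back to Mask[1] and selects lane 1.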


