[patch] Two fixes to the vpermilvar optimization

Jim Grosbach grosbach at apple.com
Tue Apr 29 11:28:03 PDT 2014


Generally looks good. One question.

+      unsigned Size = C->getNumElements();
+      assert(Size == 8 || Size == 4 || Size == 2);
+      uint32_t Indexes[8];
+
+      // The intrinsics only read one or two bits, clear the rest.

I don’t understand this. Under what circumstances would these bits come in as non-zero?

+      for (unsigned I = 0; I < Size; ++I) {
+	uint32_t Index = C->getElementAsInteger(I) & 0x3;
+	if (II->getIntrinsicID() == Intrinsic::x86_avx_vpermilvar_pd ||
+	    II->getIntrinsicID() == Intrinsic::x86_avx_vpermilvar_pd_256)
+	  Index >>= 1;
+        Indexes[I] = Index;
+      }

On Apr 28, 2014, at 8:05 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

> When I added instcombine logic to handle vpermilvar's pd and 256
> variants I incorrectly assumed that the Intel architecture was more
> regular than it actually is.
> 
> A coworker pointed out that the _256 variants use indexes into the
> individual 128-bit lanes, and while fixing that I noticed I also had to
> mask out unused bits.
> 
> The attached patch fixes both issues.
> 
> Cheers,
> Rafael
> <t.patch>
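
For reference, here is a minimal standalone sketch of the two fixes described above, in plain C++ rather than against the LLVM API. The name vpermilvarToShuffleMask and the IsPD/Is256 flags are hypothetical stand-ins for the intrinsic-ID checks in the patch. The masking loop mirrors the quoted hunk; the lane adjustment is only my reading of "indexes into the individual 128-bit lanes" and may not match the patch line for line.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM function): converts a constant vpermilvar
// control vector into explicit shuffle indexes.
//   IsPD  - pd variants: 64-bit elements, the selector lives in bit 1
//   Is256 - _256 variants: each 128-bit lane is indexed independently
std::vector<uint32_t> vpermilvarToShuffleMask(const std::vector<uint64_t> &Control,
                                              bool IsPD, bool Is256) {
  const uint32_t Size = static_cast<uint32_t>(Control.size());
  assert(Size == 8 || Size == 4 || Size == 2);

  std::vector<uint32_t> Indexes(Size);

  // Only the low two bits are kept; the pd variants then drop bit 0,
  // leaving a single selector bit.
  for (uint32_t I = 0; I < Size; ++I) {
    uint32_t Index = static_cast<uint32_t>(Control[I] & 0x3);
    if (IsPD)
      Index >>= 1;
    Indexes[I] = Index;
  }

  // Assumption: for the _256 variants the upper half selects from the upper
  // 128-bit lane, so a generic shuffle needs that offset made explicit.
  if (Is256) {
    for (uint32_t I = Size / 2; I < Size; ++I)
      Indexes[I] += Size / 2;
  }

  return Indexes;
}
```

Under these assumptions, a ps_256 control vector of {4, 5, 6, 7, 0, 1, 2, 3} comes out as the identity shuffle {0, 1, 2, 3, 4, 5, 6, 7} rather than a lane swap: bit 2 is discarded and each half can only index its own lane. Likewise a pd control of {3, 1} yields {1, 0}, since only bit 1 of each element is used.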




