[patch] Two fixes to the vpermilvar optimization
Jim Grosbach
grosbach at apple.com
Tue Apr 29 11:28:03 PDT 2014
Generally looks good. One question.
+ unsigned Size = C->getNumElements();
+ assert(Size == 8 || Size == 4 || Size == 2);
+ uint32_t Indexes[8];
+
+ // The intrinsics only read one or two bits, clear the rest.
I don’t understand this. Under what circumstances would these bits come in as non-zero?
+ for (unsigned I = 0; I < Size; ++I) {
+   uint32_t Index = C->getElementAsInteger(I) & 0x3;
+   if (II->getIntrinsicID() == Intrinsic::x86_avx_vpermilvar_pd ||
+       II->getIntrinsicID() == Intrinsic::x86_avx_vpermilvar_pd_256)
+     Index >>= 1;
+   Indexes[I] = Index;
+ }
On Apr 28, 2014, at 8:05 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> When I added instcombine logic to handle vpermilvar's pd and 256
> variants I incorrectly assumed that the Intel architecture was more
> regular than it actually is.
>
> A coworker pointed out that the _256 variants use indexes into the
> individual 128-bit lanes, and while fixing that I noticed I also had to
> mask out the unused bits.
>
> The attached patch fixes both issues.
>
> Cheers,
> Rafael
> <t.patch>
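
For readers following the thread, here is a minimal standalone sketch of the two fixes described above: masking the control bits and offsetting indexes by 128-bit lane. It is not the code from the patch. The helper name decodeVPermilVarMask and its IsPD parameter are hypothetical, and plain vectors stand in for the LLVM constant and intrinsic types. It assumes the ps forms read bits [1:0] of each control element and the pd forms read only bit 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the patch: converts the raw constant
// control vector of a vpermilvar intrinsic into shuffle-style indexes
// that address the whole source vector.
std::vector<uint32_t> decodeVPermilVarMask(const std::vector<uint64_t> &Raw,
                                           bool IsPD) {
  const unsigned Size = static_cast<unsigned>(Raw.size());
  assert(Size == 8 || Size == 4 || Size == 2);

  // A 128-bit lane holds four floats (ps) or two doubles (pd).
  const unsigned EltsPerLane = IsPD ? 2 : 4;

  std::vector<uint32_t> Indexes(Size);
  for (unsigned I = 0; I < Size; ++I) {
    // Keep only the bits the instruction reads: bits [1:0] for ps,
    // bit 1 alone for pd. Everything else is ignored.
    const uint32_t Index = IsPD ? static_cast<uint32_t>((Raw[I] >> 1) & 0x1)
                                : static_cast<uint32_t>(Raw[I] & 0x3);
    // Selection never crosses a 128-bit lane, so the index is relative
    // to the first element of the lane that element I lives in.
    const unsigned LaneBase = (I / EltsPerLane) * EltsPerLane;
    Indexes[I] = LaneBase + Index;
  }
  return Indexes;
}
```

Under those assumptions, a 256-bit ps control vector of {3, 2, 1, 0, 3, 2, 1, 0} decodes to {3, 2, 1, 0, 7, 6, 5, 4}: the upper four elements pick from the upper lane rather than the lower one. A control value of 7 decodes the same as 3, since the high bits are dropped.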