[patch] Two fixes to the vpermilvar optimization

Tue Apr 29 13:35:36 PDT 2014

On 29 April 2014 14:28, Jim Grosbach <grosbach at apple.com> wrote:
> Generally looks good. One question.
>
> +      unsigned Size = C->getNumElements();
> +      assert(Size == 8 || Size == 4 || Size == 2);
> +      uint32_t Indexes[8];
> +
> +      // The intrinsics only read one or two bits, clear the rest.
>
> I don’t understand this. Under what circumstances would these bits come in as non-zero?

Something like this:

declare <2 x double> @llvm.x86.avx.vpermilvar.pd(<2 x double>, <2 x i64>)
define <2 x double> @test_vpermilvar_pd(<2 x double> %v) {
  %a = tail call <2 x double> @llvm.x86.avx.vpermilvar.pd(<2 x double> %v, <2 x i64> <i64 42, i64 0>)
  ret <2 x double> %a
}

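Using 42 as a control element here has well-defined behaviour according
to the Intel manual: the instruction only reads the selector bit(s) of
each element and ignores the rest.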
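
To spell out the arithmetic, here is a minimal standalone C++ sketch of
how one control element maps to an in-lane source index. This is not the
code from the patch; permilvarIndex is a made-up name used only for
illustration. It assumes the encoding described in the Intel manual: the
ps form reads bits 1:0 of each 32-bit control element and the pd form
reads only bit 1 of each 64-bit control element.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): returns the in-lane source
// element selected by a single vpermilvar control element.
inline uint32_t permilvarIndex(uint64_t Control, bool IsPD) {
  // Only the low two bits can matter; everything above them is ignored.
  uint32_t Index = static_cast<uint32_t>(Control & 0x3);
  // The pd form uses bit 1 as its single selector bit, so drop bit 0.
  if (IsPD)
    Index >>= 1;
  // ps selects one of 4 elements per lane, pd one of 2.
  assert(Index < (IsPD ? 2u : 4u));
  return Index;
}
```

With the control vector from the test above, 42 is 0b101010, so bit 1 is
set and the first result element reads %v[1]; the second control element
is 0, so the second result element reads %v[0].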

Cheers,
Rafael