[PATCH] D19198: [X86][AVX] Generalized matching for target shuffle combines

Tue May 10 06:46:37 PDT 2016

delena added a comment.

> Elena - I've added a test demonstrating the MOVDDUP combine affecting a masked v16f32 shuffle. As you said this causes the shuffle and the mask move to be split apart. When is this beneficial? Is there a minimum combine depth that you think we should include for combines of EVEX shuffles where the number of vector elements changes? I can add such a depth check if you think it worthwhile.

I'm not sure it is beneficial at depth 1, because the sequence "vmovdqa32 + vpermt2ps" allows vmovdqa32 to be hoisted out of a loop, for example, whereas "vmovddup + vmovaps" are two dependent instructions. Depth 2 and higher would need to be measured; I don't know how to estimate the benefit otherwise.
I talked with our people about this optimization, and I'd suggest refraining from optimizing masked shuffles on AVX-512 at this stage.

As for non-masked instructions, an additional issue may be load folding. Could you please check what happens to folded loads when you change the VT?

> if (SrcVT.is256BitVector()) {

The target may be skylake-avx512 (HasVLX), where 256-bit shuffles can also be masked, so we still have to check masks.
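To illustrate why a 256-bit type alone does not rule out masking, here is a minimal sketch under a simplified feature model. `TargetFeatures`, `mayBeMaskedShuffle` and `shouldCombineTargetShuffle` are hypothetical names, not LLVM's Subtarget/MVT API. It treats 512-bit shuffles as maskable on any AVX-512 target, 128/256-bit shuffles as maskable only with VLX, and applies the conservative policy of skipping masked shuffles.

```cpp
#include <cassert>

// Hypothetical, simplified model for illustration; not LLVM's actual API.
struct TargetFeatures {
  bool HasAVX512; // AVX512F: EVEX write-masks on 512-bit vectors
  bool HasVLX;    // AVX512VL: EVEX write-masks on 128/256-bit vectors too
};

// Can a shuffle of this width be an EVEX write-masked operation?
bool mayBeMaskedShuffle(unsigned VectorBits, const TargetFeatures &TF) {
  if (!TF.HasAVX512)
    return false;
  if (VectorBits == 512)
    return true;
  // With VLX (e.g. skylake-avx512), 128- and 256-bit shuffles can be masked
  // as well, so checking for a 256-bit type is not enough on its own.
  return TF.HasVLX && (VectorBits == 128 || VectorBits == 256);
}

// Conservative policy: do not combine when the root shuffle may be masked
// and actually feeds a masked operation.
bool shouldCombineTargetShuffle(unsigned VectorBits, bool RootHasMaskUser,
                                const TargetFeatures &TF) {
  return !(mayBeMaskedShuffle(VectorBits, TF) && RootHasMaskUser);
}
```

On an AVX-512 target without VLX, a 256-bit shuffle is never masked; on a VLX target the same shuffle must have its mask users checked before the combine changes its type.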

> If you prefer I can change the order of this work - reduce this patch to just pulling out the existing shuffle matching code, then future patches will deal with support matching permute (and broadcast / insertps / etc.) and 256/512 bit vectors?

That would make the review easier, of course.

Repository:
  rL LLVM

http://reviews.llvm.org/D19198