[PATCH] D28744: [X86][AVX] Remove "OptForSize" condition from some memory foldings.

Mon Feb 6 02:20:12 PST 2017

aymanmus added a comment.

After consulting an architect about the general problem, this is the answer I got:

For the following sequence:

  vmovss (%rax), %xmm0
  vsqrtss %xmm0, %xmm0, %xmm0

memory folding should be avoided: the folded form (vsqrtss (%rax), %xmm0, %xmm0) would read the previous value of %xmm0 and so create a new read dependency.
But for the following sequence:

  vmovss (%rax), %xmm1
  vsqrtss %xmm1, %xmm0, %xmm1

the memory-folded sequence performs better, since the read dependency on %xmm0 is already there.

This applies to all AVX/AVX512 scalar instructions that accept an extra input operand and copy its upper part to the upper part of the output operand.
Adding OptForSize in this case disables the folding of these specific instructions only, and does so in all cases.
The ideal way of dealing with this would be:

1. Distinguish between the two cases listed above, and then decide whether to fold or not.
2. Apply this to all AVX/AVX512 scalar instructions with this behavior.
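
To make step 1 concrete, here is a minimal sketch of the check in C++. It is not LLVM code: the Xmm enum, the ScalarUnaryAfterLoad struct and foldKeepsDependencies are hypothetical names made up for illustration, and the sketch assumes the only thing that matters is whether the pass-through operand is the same register the load wrote.

```cpp
#include <cassert>

// Hypothetical register ids for illustration; not LLVM's register numbering.
enum class Xmm : unsigned { XMM0, XMM1, XMM2 };

// Models the two-instruction pattern (AT&T operand order):
//   vmovss  (mem), LoadDst
//   vsqrtss LoadDst, PassThru, Dst
struct ScalarUnaryAfterLoad {
  Xmm LoadDst;  // register written by the vmovss load
  Xmm PassThru; // operand whose upper part is copied into Dst
  Xmm Dst;      // result register
};

// Returns true when folding the load adds no new register read.
// If PassThru is the register the load just wrote, the unfolded code depends
// only on memory, while the folded code would read the old value of PassThru.
// Otherwise the read of PassThru exists before and after folding.
constexpr bool foldKeepsDependencies(const ScalarUnaryAfterLoad &S) {
  return S.PassThru != S.LoadDst;
}

// First sequence above: vmovss (%rax), %xmm0 ; vsqrtss %xmm0, %xmm0, %xmm0
static_assert(!foldKeepsDependencies({Xmm::XMM0, Xmm::XMM0, Xmm::XMM0}),
              "folding would add a read of the old %xmm0");

// Second sequence above: vmovss (%rax), %xmm1 ; vsqrtss %xmm1, %xmm0, %xmm1
static_assert(foldKeepsDependencies({Xmm::XMM1, Xmm::XMM0, Xmm::XMM1}),
              "the read of %xmm0 is already there");
```

A real implementation would have to make this decision on the actual machine instructions and apply it to every instruction covered by step 2; the sketch only captures the comparison itself.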

Other instructions with the same behavior (like vscalefss and vreducess) do not have any folding patterns and are not included in the folding tables, which means folding is not allowed at all (we can improve that).
So what I suggest is not committing this patch (even though the OptForSize is there for the wrong reason), in order to avoid performance degradation, and opening a bug on this issue.

Do you agree with that?

https://reviews.llvm.org/D28744