[PATCH] D28744: [X86][AVX] Remove "OptForSize" condition from some memory foldings.
Ayman Musa via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 6 02:20:12 PST 2017
aymanmus added a comment.
After consulting an architect about the general problem, this is the answer I got:
For the following sequence:
vmovss (%rax), %xmm0
vsqrtss %xmm0, %xmm0, %xmm0
memory folding should be avoided, because the folded form would introduce a new read dependency (on the previous value of %xmm0).
But for the following sequence:
vmovss (%rax), %xmm1
vsqrtss %xmm1, %xmm0, %xmm1
the memory-folded sequence is better performance-wise (the read dependency on %xmm0 is already there).
This applies to all AVX/AVX512 scalar instructions which accept an extra input operand and copy its upper part to the upper part of the output operand.
Adding OptForSize in this case disables the folding of these specific instructions only, in all cases.
The ideal way of dealing with this, however, is:
1. Distinguish between the two cases listed above, and then decide whether to fold or not.
2. Apply this to all AVX/AVX512 scalar instructions with this behavior.
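To make step 1 concrete, here is a minimal sketch of the distinction as I read it. It is not LLVM code: `ScalarSeq` and `foldingAddsNoNewDependency` are hypothetical names, and registers are modeled as plain strings. The assumption is that folding is harmless exactly when the pass-through register (the middle operand in AT&T order) is not the register the load writes. In that case the unfolded op already reads it, so the folded form, e.g. `vsqrtss (%rax), %xmm0, %xmm1`, reads nothing new.

```cpp
#include <string>

// Hypothetical model of "vmovss (mem), loadDst" followed by an AVX scalar op
// written in AT&T order as "op opSrc, opPassThru, dst".
struct ScalarSeq {
    std::string loadDst;     // register written by the load
    std::string opSrc;       // register supplying the low (scalar) element
    std::string opPassThru;  // register whose upper part is copied to dst
};

// Returns true when folding the load into the op adds no register read that
// the unfolded sequence did not already have.
bool foldingAddsNoNewDependency(const ScalarSeq &s) {
    // Folding is only possible when the op consumes the loaded value.
    if (s.opSrc != s.loadDst)
        return false;
    // If the pass-through register is the loaded register, the unfolded op
    // depends only on the load; once folded, it would read that register's
    // previous value instead, which is a new dependency.
    return s.opPassThru != s.loadDst;
}
```

With this model, the first sequence above (all operands %xmm0) is rejected and the second one (%xmm0 as pass-through, %xmm1 loaded) is accepted. A real implementation would have to make this decision on machine operands in the backend rather than on register names.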
Other instructions with the same behavior (like vscalefss and vreducess) do not have any folding patterns and are not included in the folding tables, which means folding is not allowed at all (we can improve that).
So what I suggest is not committing this patch (even though the OptForSize is there for the wrong reason), in order to avoid performance degradation, and opening a bug on this issue.
Do you agree with that?
https://reviews.llvm.org/D28744