[PATCH] D63246: [X86][SSE] Prevent misaligned non-temporal vector load/store combines

Mon Jun 17 06:09:33 PDT 2019

andreadb accepted this revision.
andreadb added a comment.
This revision is now accepted and ready to land.

Looks good to me.

================
Comment at: test/CodeGen/X86/nontemporal-3.ll:388-393
 ; SSE-NEXT:    xorps %xmm0, %xmm0
-; SSE-NEXT:    movups %xmm0, 16(%rdi)
-; SSE-NEXT:    movups %xmm0, (%rdi)
+; SSE-NEXT:    movaps %xmm0, -{{[0-9]+}}(%rsp)
+; SSE-NEXT:    movq -{{[0-9]+}}(%rsp), %rax
+; SSE-NEXT:    movq -{{[0-9]+}}(%rsp), %rcx
+; SSE-NEXT:    movntiq %rcx, 24(%rdi)
+; SSE-NEXT:    movntiq %rax, 16(%rdi)
----------------
This SSE sequence is clearly sub-optimal.

That said, I am not too worried about it, given how unlikely this scenario is in practice.

If possible, it would be nice to have it fixed in a follow-up patch.
Basically, there is no reason to zero XMM0, spill it to the stack, and then reload its elements into GPRs. We should just zero a GPR and have both MOVNTIs store it. I suspect this has to do with how we lower certain nodes on SSE.
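For illustration only, here is a hypothetical sketch of the sequence suggested above, limited to the two stores visible in this excerpt (the rest of the test's expected output is not shown here and is not assumed). A 32-bit `xorl` clears all of `%rax`, because writes to a 32-bit register zero-extend into the full 64-bit register. Both `movntiq` stores then reuse that register, so the XMM zeroing and the stack round trip disappear. These are not CHECK lines the patch currently produces.

```
; Hypothetical expected output for the two stores shown above:
; zero one GPR and have both non-temporal stores reuse it.
; SSE-NEXT:    xorl %eax, %eax
; SSE-NEXT:    movntiq %rax, 24(%rdi)
; SSE-NEXT:    movntiq %rax, 16(%rdi)
```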

Repository:
  rL LLVM


https://reviews.llvm.org/D63246