[PATCH] D33728: [X86][SSE] Improve handling of non-temporal aligned loads

Mon Jun 5 09:13:46 PDT 2017

craig.topper added inline comments.

================
Comment at: test/CodeGen/X86/nontemporal-loads.ll:642
 ; SSE:       # BB#0:
 ; SSE-NEXT:    addps (%rdi), %xmm0
 ; SSE-NEXT:    retq
----------------
RKSimon wrote:
> craig.topper wrote:
> > Why is sse4.1 still folding here? Is this because the sse4.1 patterns use memopv4f32 and not loadv4f32?
> Ah - missed that one - yes, it's because it's using the 'SSE-only' memory fragments.
I've wondered for a while whether we should have a converged memop that lifts the alignment restriction when AVX is enabled. I think that would shrink the DAG isel table, because the AVX and SSE patterns would use the same predicate. The only issue I know of is that the SHA1 instructions use memop but don't have a documented VEX equivalent yet, so they must always obey the alignment requirement, even when AVX is enabled.
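
To make the idea concrete, here is a rough sketch of the predicate logic in plain C++ rather than TableGen. All names here (SubtargetSketch, LoadSketch, convergedMemop, legacyOnlyMemop) are hypothetical and are not existing LLVM fragments or APIs. It assumes the only inputs that matter are whether AVX is enabled and how the load's known alignment compares to its store size.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget features that matter here.
struct SubtargetSketch {
  bool HasAVX;
};

// Hypothetical description of a load considered for folding.
struct LoadSketch {
  unsigned AlignBytes;     // known alignment of the address
  unsigned StoreSizeBytes; // in-memory size of the vector, e.g. 16 for v4f32
};

// SSE-only style predicate: fold only when the load is fully aligned.
inline bool alignedMemop(const LoadSketch &Ld) {
  assert(Ld.StoreSizeBytes != 0 && "load must have a size");
  return Ld.AlignBytes >= Ld.StoreSizeBytes;
}

// Converged predicate (assumption): with AVX the alignment restriction is
// lifted, otherwise it behaves like the SSE-only predicate. One predicate
// could then be shared by the SSE and AVX patterns.
inline bool convergedMemop(const SubtargetSketch &ST, const LoadSketch &Ld) {
  return ST.HasAVX || alignedMemop(Ld);
}

// Instructions with no VEX form (the SHA1 case above) would need a predicate
// that ignores AVX and always enforces alignment.
inline bool legacyOnlyMemop(const SubtargetSketch &, const LoadSketch &Ld) {
  return alignedMemop(Ld);
}
```

In this sketch, an unaligned 16-byte load folds under convergedMemop only when HasAVX is set, while legacyOnlyMemop rejects it regardless of the subtarget.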

Repository:
  rL LLVM

https://reviews.llvm.org/D33728