[PATCH] D20965: [X86][SSE] Add general lowering of nontemporal vector loads

Tue Jun 7 06:28:47 PDT 2016

RKSimon added inline comments.

================
Comment at: lib/Target/X86/X86InstrAVX512.td:3378
@@ -3340,1 +3377,3 @@
+  def : Pat<(v16i8 (alignednontemporalload addr:$src)),
+            (VMOVNTDQAZ128rm addr:$src)>;
 }
----------------
craig.topper wrote:
> Aren't 128/256 integer loads still promoted to v2i64 and v4i64 even when AVX512 is enabled?
No - if I remove the i32/i16/i8 patterns then the NT loads are no longer selected - I haven't been able to work out why.

================
Comment at: test/CodeGen/X86/fast-isel-nontemporal.ll:599
@@ +598,3 @@
+; AVX1:       # BB#0: # %entry
+; AVX1-NEXT:    vmovdqa (%rdi), %ymm0
+; AVX1-NEXT:    retq
----------------
mkuper wrote:
> I wonder if this is better or worse, in practice, than 2 * vmovntdqa %xmm.
It's worse - if you want to use NT loads you must have a good reason. I'll look at ways to split this in a future patch.
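
For context, a minimal sketch of the kind of IR that hits this case. The function name is hypothetical and not taken from the test file, and the typed-pointer syntax assumes LLVM of this era. The 256-bit form of vmovntdqa requires AVX2, so on AVX1 an aligned 256-bit nontemporal load like this one currently falls back to a regular vmovdqa, as the CHECK lines above show:

```llvm
; Hypothetical function name, for illustration only.
; Typed-pointer syntax is assumed (LLVM circa 2016).
define <4 x i64> @nt_load_v4i64(<4 x i64>* %src) {
entry:
  ; The !nontemporal metadata marks the load as a streaming (NT) load.
  ; align 32 matches the full 256-bit vector width.
  %v = load <4 x i64>, <4 x i64>* %src, align 32, !nontemporal !0
  ret <4 x i64> %v
}

; The metadata node attached via !nontemporal must be a single i32 1.
!0 = !{i32 1}
```

Splitting would presumably mean lowering this as two 128-bit NT loads (one at %src, one at %src + 16) and recombining the halves, but that is left for the follow-up patch.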


Repository:
  rL LLVM

http://reviews.llvm.org/D20965