[PATCH] D20965: [X86][SSE] Add general lowering of nontemporal vector loads
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 7 06:28:47 PDT 2016
RKSimon added inline comments.
================
Comment at: lib/Target/X86/X86InstrAVX512.td:3378
@@ -3340,1 +3377,3 @@
+ def : Pat<(v16i8 (alignednontemporalload addr:$src)),
+ (VMOVNTDQAZ128rm addr:$src)>;
}
----------------
craig.topper wrote:
> Aren't 128/256 integer loads still promoted to v2i64 and v4i64 even when AVX512 is enabled?
No - if I remove the i32/i16/i8 patterns then the NT loads are no longer selected. I haven't been able to work out why.
================
Comment at: test/CodeGen/X86/fast-isel-nontemporal.ll:599
@@ +598,3 @@
+; AVX1: # BB#0: # %entry
+; AVX1-NEXT: vmovdqa (%rdi), %ymm0
+; AVX1-NEXT: retq
----------------
mkuper wrote:
> I wonder if this is better or worse, in practice, than 2 * vmovntdqa %xmm.
It's worse - if you want to use NT loads you must have a good reason. I'll look at ways to split this in a future patch.
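
For illustration only, here is a minimal IR-level sketch of what "2 * vmovntdqa %xmm" would correspond to: one 256-bit nontemporal load rewritten by hand as two 128-bit nontemporal loads that are then concatenated. The function name and value names are hypothetical, the typed-pointer syntax assumes an LLVM release from around the time of this review, and the pointer is assumed to be 32-byte aligned. This is not the lowering from this patch or from any follow-up; whether each half actually selects to vmovntdqa depends on the target features and has not been verified here.

```llvm
; Hypothetical example: a 256-bit NT load expressed as two 128-bit NT loads.
define <4 x i64> @nt_load_v4i64_split(<4 x i64>* %src) {
entry:
  ; Reinterpret the 256-bit pointer as a pointer to its low 128-bit half.
  %lo.ptr = bitcast <4 x i64>* %src to <2 x i64>*
  ; The high half starts 16 bytes (one <2 x i64>) further on.
  %hi.ptr = getelementptr inbounds <2 x i64>, <2 x i64>* %lo.ptr, i64 1
  ; Both halves keep the !nontemporal hint; the high half is only 16-byte aligned.
  %lo = load <2 x i64>, <2 x i64>* %lo.ptr, align 32, !nontemporal !0
  %hi = load <2 x i64>, <2 x i64>* %hi.ptr, align 16, !nontemporal !0
  ; Concatenate the two halves back into a single 256-bit vector.
  %res = shufflevector <2 x i64> %lo, <2 x i64> %hi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x i64> %res
}

!0 = !{i32 1}
```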
Repository:
rL LLVM
http://reviews.llvm.org/D20965