[PATCH] D33728: [X86][SSE] Improve handling of non-temporal aligned loads

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 5 07:48:44 PDT 2017


RKSimon added inline comments.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:32303
   // For chips with slow 32-byte unaligned loads, break the 32-byte operation
-  // into two 16-byte operations.
+  // into two 16-byte operations. Also split non-temporal aligned loads on AVX1
+  // targets as 32-byte loads will lower to regular temporal loads.
----------------
filcab wrote:
> "pre-AVX2" (or "targets without AVX2"), no? I'd expect this to also happen on SSE4.1 (which also has 128-bit NT loads).
Yes, only the comment needs fixing: the !hasInt256() check already covers all SSE/AVX1 CPUs.
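
For illustration, the check under discussion can be modelled roughly as below. This is a standalone sketch, not the code from the patch: `SubtargetModel`, `LoadModel` and `shouldSplitNonTemporalLoad` are hypothetical names, the 16-byte alignment requirement is an assumption, and the separate "slow 32-byte unaligned load" case is left out.

```cpp
// Hypothetical stand-in for the subtarget feature query; only the
// AVX2 (256-bit integer) bit matters for this check.
struct SubtargetModel {
  bool HasInt256; // true on AVX2 and later
  bool hasInt256() const { return HasInt256; }
};

// Hypothetical description of the load being combined.
struct LoadModel {
  unsigned SizeInBytes;
  unsigned Alignment; // in bytes
  bool IsNonTemporal;
};

// Returns true if a 32-byte non-temporal load should be split into two
// 16-byte loads. Without AVX2 there is no 256-bit NT load, so keeping the
// 32-byte load would lower it to a regular temporal load.
// Assumption: each half must be at least 16-byte aligned.
bool shouldSplitNonTemporalLoad(const SubtargetModel &ST,
                                const LoadModel &Ld) {
  return Ld.SizeInBytes == 32 && Ld.IsNonTemporal && Ld.Alignment >= 16 &&
         !ST.hasInt256();
}
```

In this model, an aligned 32-byte NT load is split on any target where hasInt256() is false (SSE and AVX1 alike) and kept whole on AVX2, which is why only the comment wording needs to change.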


Repository:
  rL LLVM

https://reviews.llvm.org/D33728




