[PATCH] D33728: [X86][SSE] Improve handling of non-temporal aligned loads

Mon Jun 5 07:07:58 PDT 2017

filcab added a comment.

LGTM with a minor comment nit (if I'm right).
The code expansion is annoying, but the result is closer to the source semantics.
Thanks!
Filipe

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:32303
   // For chips with slow 32-byte unaligned loads, break the 32-byte operation
-  // into two 16-byte operations.
+  // into two 16-byte operations. Also split non-temporal aligned loads on AVX1
+  // targets as 32-byte loads will lower to regular temporal loads.
----------------
Shouldn't this say "pre-AVX2" (or "targets without AVX2")? I'd expect this to also happen on SSE4.1, which also has 128-bit NT loads.
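
To make the nit concrete, here is a minimal sketch of the condition I have in mind. It is not the code in X86ISelLowering.cpp: `TargetFeatures` and `shouldSplitNonTemporalLoad` are hypothetical names used only for illustration, and the sketch assumes that 128-bit NT loads (MOVNTDQA) arrive with SSE4.1 while 256-bit NT loads need AVX2.

```cpp
#include <cassert>

// Hypothetical feature flags for illustration; this is NOT the real
// X86Subtarget interface.
struct TargetFeatures {
  bool HasSSE41; // assumed: 128-bit non-temporal loads (MOVNTDQA) available
  bool HasAVX2;  // assumed: 256-bit non-temporal loads available
};

// Hypothetical helper: returns true when a 32-byte aligned non-temporal load
// should be split into two 16-byte non-temporal loads so that the
// non-temporal hint is not lost.
bool shouldSplitNonTemporalLoad(const TargetFeatures &T, unsigned SizeInBytes,
                                unsigned AlignInBytes, bool IsNonTemporal) {
  assert(SizeInBytes > 0 && "load size must be non-zero");

  // Only 32-byte non-temporal loads are of interest here.
  if (!IsNonTemporal || SizeInBytes != 32)
    return false;

  // With AVX2 the 32-byte non-temporal load can be kept whole.
  if (T.HasAVX2)
    return false;

  // Without AVX2, split only if each 16-byte half can still be an aligned
  // non-temporal load, which needs SSE4.1 and at least 16-byte alignment.
  return T.HasSSE41 && AlignInBytes >= 16;
}
```

The check is phrased as "no AVX2" rather than "AVX1", which is why the comment wording matters: in this sketch an SSE4.1-only target takes the same path as an AVX1 target.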

Repository:
  rL LLVM

https://reviews.llvm.org/D33728