[PATCH] D33728: [X86][SSE] Improve handling of non-temporal aligned loads
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 5 07:48:44 PDT 2017
RKSimon added inline comments.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:32303
// For chips with slow 32-byte unaligned loads, break the 32-byte operation
- // into two 16-byte operations.
+ // into two 16-byte operations. Also split non-temporal aligned loads on AVX1
+ // targets as 32-byte loads will lower to regular temporal loads.
----------------
filcab wrote:
> "pre-AVX2" (or "targets without AVX2"), no? I'd expect this to also happen on SSE4.1 (also has 128bit NT loads).
>
Yes, only the comment needs fixing: the !hasInt256() check already covers all SSE and AVX1 CPUs.
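For readers following the thread, here is a minimal, self-contained sketch of the kind of predicate being discussed. It is not the code from the patch: `SubtargetSketch`, `LoadSketch`, and `shouldSplitNonTemporalLoad` are hypothetical names, and the 16-byte alignment requirement on each half is an assumption of this sketch. It only illustrates why testing !hasInt256() covers both SSE and AVX1 targets: 128-bit non-temporal loads (MOVNTDQA) exist from SSE4.1, while the 256-bit form requires AVX2.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget feature query; this is NOT LLVM's
// X86Subtarget, only a model of the single predicate named in the review.
struct SubtargetSketch {
  bool HasAVX2;
  // Modelled here as "has AVX2", i.e. 256-bit integer vector support.
  bool hasInt256() const { return HasAVX2; }
};

// Hypothetical description of a vector load.
struct LoadSketch {
  unsigned SizeInBytes;
  unsigned AlignInBytes;
  bool IsNonTemporal;
};

// Returns true when a 32-byte non-temporal load should be split into two
// 16-byte halves so that each half can still use a 128-bit non-temporal
// load. Assumption of this sketch: each half must be at least 16-byte
// aligned. The !hasInt256() test is true for every pre-AVX2 target, so it
// covers SSE-only and AVX1 CPUs alike.
bool shouldSplitNonTemporalLoad(const SubtargetSketch &ST,
                                const LoadSketch &LD) {
  return LD.IsNonTemporal && LD.SizeInBytes == 32 &&
         LD.AlignInBytes >= 16 && !ST.hasInt256();
}
```

Under these assumptions, the predicate holds for a pre-AVX2 target and fails for an AVX2 target, which can keep the load as a single 32-byte non-temporal load.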
Repository:
rL LLVM
https://reviews.llvm.org/D33728