[PATCH] D12288: make fast unaligned memory accesses implicit with SSE4.2 or SSE4a
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 24 09:06:34 PDT 2015
spatel created this revision.
spatel added reviewers: zansari, chandlerc, qcolombet, RKSimon, silvas.
spatel added a subscriber: llvm-commits.
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has generally fast unaligned memory ops.
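As a rough illustration of the rule this change introduces, here is a minimal, self-contained C++ sketch. It is not the LLVM code: `SubtargetModel` and `applyImpliedFastUnaligned` are hypothetical names, and the pessimistic default for the flag is an assumption made only for this sketch. Only the flag name and the SSE4.2/SSE4a condition come from the patch itself.

```cpp
// Hypothetical, simplified stand-in for the subtarget state touched by the patch.
struct SubtargetModel {
  bool HasSSE42 = false;
  bool HasSSE4A = false;
  // Assumption for this sketch: start pessimistic, as if no CPU had been named.
  bool IsUAMemUnder32Slow = true;
};

// Mirrors the check added by the patch: either feature implies that
// unaligned accesses are reasonably fast, so the "slow" flag is cleared.
inline void applyImpliedFastUnaligned(SubtargetModel &ST) {
  if (ST.HasSSE42 || ST.HasSSE4A)
    ST.IsUAMemUnder32Slow = false;
}
```

In this model, a subtarget with neither feature keeps the flag set, which corresponds to the conservative lowering.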
A motivating use case for this change is a clang invocation that doesn't explicitly set the CPU, but does target a feature that we know only exists on a CPU that supports fast unaligned memops. For example:
$ clang -O1 foo.c -mavx
This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449
Currently, the store types we emit differ depending on whether the example can be lowered as a memset.
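To make the memset-versus-stores distinction concrete, the following C++ sketch shows two ways of zeroing the same buffer. It is a hypothetical illustration, not the test case from PR24449: the function names and the 32-byte size are assumptions, and the compiler is free to lower either form however it sees fit.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical buffer size chosen only for illustration.
constexpr std::size_t kBytes = 32;

// Zeroing expressed as a library call, which the backend may lower as a memset.
void zeroWithMemset(unsigned char *p) {
  std::memset(p, 0, kBytes);
}

// Zeroing expressed as individual stores through a pointer of unknown alignment.
void zeroWithStores(unsigned char *p) {
  for (std::size_t i = 0; i < kBytes; ++i)
    p[i] = 0;
}
```

Both functions leave the buffer in the same state; the interest is only in which store instructions the backend picks for each.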
http://reviews.llvm.org/D12288
Files:
lib/Target/X86/X86Subtarget.cpp
test/CodeGen/X86/slow-unaligned-mem.ll
Index: test/CodeGen/X86/slow-unaligned-mem.ll
===================================================================
--- test/CodeGen/X86/slow-unaligned-mem.ll
+++ test/CodeGen/X86/slow-unaligned-mem.ll
@@ -55,6 +55,11 @@
; Slow chips use 4-byte stores. Fast chips with SSE or later use something other than 4-byte stores.
; Chips that don't have SSE use 4-byte stores either way, so they're not tested.
+; Also verify that SSE4.2 or SSE4a imply fast unaligned accesses.
+
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=sse4.2 2>&1 | FileCheck %s --check-prefix=FAST
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=sse4a 2>&1 | FileCheck %s --check-prefix=FAST
+
define void @store_zeros(i8* %a) {
; SLOW-NOT: not a recognized processor
; SLOW-LABEL: store_zeros:
Index: lib/Target/X86/X86Subtarget.cpp
===================================================================
--- lib/Target/X86/X86Subtarget.cpp
+++ lib/Target/X86/X86Subtarget.cpp
@@ -192,6 +192,13 @@
// Parse features string and set the CPU.
ParseSubtargetFeatures(CPUName, FullFS);
+ // All CPUs that implement SSE4.2 or SSE4A support unaligned accesses of
+ // 16-bytes and under that are reasonably fast. These features were
+ // introduced with Intel's Nehalem and AMD's Family10h micro-architectures
+ // respectively.
+ if (hasSSE42() || hasSSE4A())
+   IsUAMemUnder32Slow = false;
+
InstrItins = getInstrItineraryForCPU(CPUName);
// It's important to keep the MCSubtargetInfo feature bits in sync with