[llvm] r245950 - make fast unaligned memory accesses implicit with SSE4.2 or SSE4a

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 25 09:29:22 PDT 2015


Author: spatel
Date: Tue Aug 25 11:29:21 2015
New Revision: 245950

URL: http://llvm.org/viewvc/llvm-project?rev=245950&view=rev
Log:
make fast unaligned memory accesses implicit with SSE4.2 or SSE4a

This is a follow-on from the discussion in http://reviews.llvm.org/D12154.

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.

A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx
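
For concreteness, foo.c could look something like the sketch below. This is an
assumed example (the review doesn't include foo.c), but it shows the kind of code
that benefits: with -mavx and no explicit -mcpu, the subtarget now reports fast
unaligned accesses, so these calls can be expanded with 16- or 32-byte vector
stores rather than 4-byte stores.

  // Hypothetical foo.c -- an assumed example, not taken from the review.
  #include <string.h>

  void zero_buffer(char *p) {
    // Expanded inline by the backend through the memset lowering.
    memset(p, 0, 64);
  }

  void copy_buffer(char *dst, const char *src) {
    // Expanded inline through the memcpy lowering; the same store-width
    // decision applies here.
    memcpy(dst, src, 64);
  }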

This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449

Before this patch, we used different store types depending on whether the example could be
lowered as a memset or not.
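
The actual test case is in the PR; purely as an illustration (an assumption on my
part, not the PR's code), the same 16-byte zeroing can reach the backend either
as a memset or as ordinary stores, and before this patch the two paths could pick
different store types on the same subtarget:

  // Illustrative only -- not the reproducer from PR24449.
  #include <string.h>
  #include <stdint.h>

  void zero16_memset(char *p) {
    // Reaches the backend as a memset, so the memset lowering picks the
    // store type.
    memset(p, 0, 16);
  }

  void zero16_stores(uint64_t *p) {
    // Two explicit 8-byte stores; there is no memset here, so any widening
    // happens on the ordinary store path instead.
    p[0] = 0;
    p[1] = 0;
  }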

Differential Revision: http://reviews.llvm.org/D12288


Modified:
    llvm/trunk/lib/Target/X86/X86Subtarget.cpp
    llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll

Modified: llvm/trunk/lib/Target/X86/X86Subtarget.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.cpp?rev=245950&r1=245949&r2=245950&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86Subtarget.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86Subtarget.cpp Tue Aug 25 11:29:21 2015
@@ -192,6 +192,13 @@ void X86Subtarget::initSubtargetFeatures
   // Parse features string and set the CPU.
   ParseSubtargetFeatures(CPUName, FullFS);
 
+  // All CPUs that implement SSE4.2 or SSE4A support unaligned accesses of
+  // 16-bytes and under that are reasonably fast. These features were
+  // introduced with Intel's Nehalem/Silvermont and AMD's Family10h
+  // micro-architectures respectively.
+  if (hasSSE42() || hasSSE4A())
+    IsUAMemUnder32Slow = false;
+  
   InstrItins = getInstrItineraryForCPU(CPUName);
 
   // It's important to keep the MCSubtargetInfo feature bits in sync with
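
To make the new implication easy to see in isolation, here is a tiny standalone
model (not LLVM code; the names mirror the field and predicates used in the hunk
above): once the feature string enables SSE4.2 or SSE4a, the "unaligned accesses
under 32 bytes are slow" flag is cleared regardless of which CPU string was used.

  // Standalone C++ sketch, not LLVM code; names mirror the hunk above.
  #include <cstdio>

  struct SubtargetModel {
    bool HasSSE42 = false;
    bool HasSSE4A = false;
    bool IsUAMemUnder32Slow = true;   // conservative default

    void applyFeatureImplications() {
      // SSE4.2 (Nehalem/Silvermont and later) or SSE4a (Family 10h and later)
      // implies reasonably fast unaligned accesses of 16 bytes and under.
      if (HasSSE42 || HasSSE4A)
        IsUAMemUnder32Slow = false;
    }
  };

  int main() {
    SubtargetModel ST;
    ST.HasSSE42 = true;               // e.g. -mattr=sse4.2 with no -mcpu
    ST.applyFeatureImplications();
    std::printf("unaligned <=16B accesses slow: %s\n",
                ST.IsUAMemUnder32Slow ? "yes" : "no");
    return 0;
  }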

Modified: llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll?rev=245950&r1=245949&r2=245950&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll (original)
+++ llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll Tue Aug 25 11:29:21 2015
@@ -55,6 +55,11 @@
 ; Slow chips use 4-byte stores. Fast chips with SSE or later use something other than 4-byte stores.
 ; Chips that don't have SSE use 4-byte stores either way, so they're not tested.
 
+; Also verify that SSE4.2 or SSE4a imply fast unaligned accesses.
+
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=sse4.2       2>&1 | FileCheck %s --check-prefix=FAST
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=sse4a        2>&1 | FileCheck %s --check-prefix=FAST
+
 define void @store_zeros(i8* %a) {
 ; SLOW-NOT: not a recognized processor
 ; SLOW-LABEL: store_zeros:



