[PATCH] D12288: make fast unaligned memory accesses implicit with SSE4.2 or SSE4a

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 25 09:30:41 PDT 2015


This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.
Closed by commit rL245950: make fast unaligned memory accesses implicit with SSE4.2 or SSE4a (authored by spatel).

Changed prior to commit:
  http://reviews.llvm.org/D12288?vs=32957&id=33084#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D12288

Files:
  llvm/trunk/lib/Target/X86/X86Subtarget.cpp
  llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll

Index: llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll
===================================================================
--- llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll
+++ llvm/trunk/test/CodeGen/X86/slow-unaligned-mem.ll
@@ -55,6 +55,11 @@
 ; Slow chips use 4-byte stores. Fast chips with SSE or later use something other than 4-byte stores.
 ; Chips that don't have SSE use 4-byte stores either way, so they're not tested.
 
+; Also verify that SSE4.2 or SSE4a imply fast unaligned accesses.
+
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=sse4.2       2>&1 | FileCheck %s --check-prefix=FAST
+; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=sse4a        2>&1 | FileCheck %s --check-prefix=FAST
+
 define void @store_zeros(i8* %a) {
 ; SLOW-NOT: not a recognized processor
 ; SLOW-LABEL: store_zeros:
Index: llvm/trunk/lib/Target/X86/X86Subtarget.cpp
===================================================================
--- llvm/trunk/lib/Target/X86/X86Subtarget.cpp
+++ llvm/trunk/lib/Target/X86/X86Subtarget.cpp
@@ -192,6 +192,13 @@
   // Parse features string and set the CPU.
   ParseSubtargetFeatures(CPUName, FullFS);
 
+  // All CPUs that implement SSE4.2 or SSE4A support unaligned accesses of
+  // 16-bytes and under that are reasonably fast. These features were
+  // introduced with Intel's Nehalem/Silvermont and AMD's Family10h
+  // micro-architectures respectively.
+  if (hasSSE42() || hasSSE4A())
+    IsUAMemUnder32Slow = false;
+  
   InstrItins = getInstrItineraryForCPU(CPUName);
 
   // It's important to keep the MCSubtargetInfo feature bits in sync with
