[llvm] r218734 - [x86] Teach the new vector shuffle lowering to be even more aggressive

Tue Sep 30 20:19:44 PDT 2014

Author: chandlerc
Date: Tue Sep 30 22:19:43 2014
New Revision: 218734

URL: http://llvm.org/viewvc/llvm-project?rev=218734&view=rev
Log:
[x86] Teach the new vector shuffle lowering to be even more aggressive
in exposing the scalar value to the broadcast DAG fragment so that we
can catch even reloads and fold them into the broadcast.

This is somewhat magical, I'm afraid, but seems to work. It is also what
the old lowering did, and I've switched an old test to run both
lowerings demonstrating that we get the same result.

Unlike the old code, I'm not lowering f32 or f64 scalars through this
path when we only have AVX1. The target patterns include pretty heinous
code to re-cast those as shuffles when the scalar happens to not be
spilled because AVX1 provides no broadcast mechanism from registers
whatsoever. This is terribly brittle. I'd much rather go through our
generic lowering code to get this. If needed, we can add a peephole to
get even more opportunities to broadcast-from-spill-slots that are
exposed post-RA, but my suspicion is this just doesn't matter that much.
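
As a rough illustration of the rule described above, the decision of when the
new lowering emits a broadcast can be modeled as a small standalone predicate.
This is only a sketch: SourceKind, canLowerAsBroadcast and its parameters are
hypothetical names invented for the example and are not part of LLVM. The real
code operates on SDValue nodes and checks ISD::isNON_EXTLoad and
Subtarget->hasAVX2().

```cpp
// Hypothetical stand-in for the opcode of the shuffle input (not an LLVM type).
enum class SourceKind { BuildVector, ScalarToVector, OtherVector };

// Returns true when the modeled lowering would emit a broadcast node.
// scalarIsPlainLoad is only consulted when a scalar operand is exposed.
constexpr bool canLowerAsBroadcast(SourceKind kind, int broadcastIdx,
                                   bool hasAVX2, bool scalarIsPlainLoad) {
  // A scalar is exposed for any BUILD_VECTOR element, or for element zero
  // of a SCALAR_TO_VECTOR.
  const bool exposesScalar =
      kind == SourceKind::BuildVector ||
      (kind == SourceKind::ScalarToVector && broadcastIdx == 0);
  if (exposesScalar) {
    // Without AVX2, only a scalar that comes straight from a load can be
    // broadcast; with AVX2, any scalar can.
    return hasAVX2 || scalarIsPlainLoad;
  }
  // Otherwise the source is a vector register: this needs AVX2 and only
  // element zero can be broadcast.
  return broadcastIdx == 0 && hasAVX2;
}

// Compile-time examples of the three interesting cases.
static_assert(canLowerAsBroadcast(SourceKind::BuildVector, 2, false, true),
              "AVX1 can broadcast a loaded scalar");
static_assert(!canLowerAsBroadcast(SourceKind::BuildVector, 2, false, false),
              "AVX1 cannot broadcast a scalar held in a register");
static_assert(!canLowerAsBroadcast(SourceKind::OtherVector, 1, true, false),
              "a vector register source must broadcast element zero");
```

The sketch folds the two rejection paths of the patch into one boolean result,
so the AVX1 restriction on non-load scalars is visible in a single expression.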

Modified:
    llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
    llvm/trunk/test/CodeGen/X86/2012-07-15-broadcastfold.ll

Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=218734&r1=218733&r2=218734&view=diff
==============================================================================

--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Sep 30 22:19:43 2014
@@ -7850,27 +7850,21 @@ static SDValue lowerVectorShuffleAsBroad
                                             "a sorted mask where the broadcast "
                                             "comes from V1.");
 
-  // Check if this is a broadcast of a scalar load -- those are more widely
-  // supported than broadcasting in-register values.
+  // Check if this is a broadcast of a scalar. We special case lowering for
+  // scalars so that we can more effectively fold with loads.
   if (V.getOpcode() == ISD::BUILD_VECTOR ||
         (V.getOpcode() == ISD::SCALAR_TO_VECTOR && BroadcastIdx == 0)) {
-    SDValue BroadcastV = V.getOperand(BroadcastIdx);
-    if (ISD::isNON_EXTLoad(BroadcastV.getNode())) {
-      // We can directly broadcast from memory.
-      return DAG.getNode(X86ISD::VBROADCAST, DL, VT, BroadcastV);
-    }
-  }
-
-  // We can't broadcast from a register w/o AVX2.
-  if (!Subtarget->hasAVX2())
-    return SDValue();
+    V = V.getOperand(BroadcastIdx);
 
-  // Check if this is a broadcast of a BUILD_VECTOR which we can always handle,
-  // or is a broadcast of the zero element.
-  if (V.getOpcode() == ISD::BUILD_VECTOR)
-    V = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VT, V.getOperand(BroadcastIdx));
-  else if (BroadcastIdx != 0)
+    // If the scalar isn't a load we can't broadcast from it in AVX1, only with
+    // AVX2.
+    if (!Subtarget->hasAVX2() && !ISD::isNON_EXTLoad(V.getNode()))
+      return SDValue();
+  } else if (BroadcastIdx != 0 || !Subtarget->hasAVX2()) {
+    // We can't broadcast from a vector register w/o AVX2, and we can only
+    // broadcast from the zero-element of a vector register.
     return SDValue();
+  }
 
   return DAG.getNode(X86ISD::VBROADCAST, DL, VT, V);
 }

Modified: llvm/trunk/test/CodeGen/X86/2012-07-15-broadcastfold.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/2012-07-15-broadcastfold.ll?rev=218734&r1=218733&r2=218734&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/2012-07-15-broadcastfold.ll (original)
+++ llvm/trunk/test/CodeGen/X86/2012-07-15-broadcastfold.ll Tue Sep 30 22:19:43 2014
@@ -1,4 +1,5 @@
 ; RUN: llc < %s -march=x86 -mcpu=corei7 -mattr=+avx2 | FileCheck %s
+; RUN: llc < %s -march=x86 -mcpu=corei7 -mattr=+avx2 -x86-experimental-vector-shuffle-lowering | FileCheck %s
 
 declare x86_fastcallcc i64 @barrier()