[PATCH] ARM: do not emit lsrh/ashr for NEON shifts

Thu Oct 3 08:43:52 PDT 2013

  Hi Tim, comments inline.


================
Comment at: lib/CodeGen/CGBuiltin.cpp:3089-3092
@@ -3077,7 +3088,6 @@
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vshll", 1);
   case ARM::BI__builtin_neon_vshl_n_v:
   case ARM::BI__builtin_neon_vshlq_n_v:
-    Ops[1] = EmitNeonShiftVector(Ops[1], Ty, false);
-    return Builder.CreateShl(Builder.CreateBitCast(Ops[0],Ty), Ops[1],
-                             "vshl_n");
+    Int = usgn ? Intrinsic::arm_neon_vshiftu : Intrinsic::arm_neon_vshifts;
+    return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vshl_n", 1, false);
   case ARM::BI__builtin_neon_vshrn_n_v:
----------------
Tim Northover wrote:
> All the documentation I can find says that vshl_n intrinsics take an integer from [0,B-1], which is in range for shift.
You're right, but for consistency, I eliminated all shifts instruction on vectors. I can change that though.

================
Comment at: lib/CodeGen/CGBuiltin.cpp:3096-3099
@@ -3085,10 +3095,6 @@
                         Ops, "vshrn_n", 1, true);
   case ARM::BI__builtin_neon_vshr_n_v:
   case ARM::BI__builtin_neon_vshrq_n_v:
-    Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
-    Ops[1] = EmitNeonShiftVector(Ops[1], Ty, false);
-    if (usgn)
-      return Builder.CreateLShr(Ops[0], Ops[1], "vshr_n");
-    else
-      return Builder.CreateAShr(Ops[0], Ops[1], "vshr_n");
+    Int = usgn ? Intrinsic::arm_neon_vshiftu : Intrinsic::arm_neon_vshifts;
+    return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vshr_n", 1, true);
   case ARM::BI__builtin_neon_vsri_n_v:
----------------
Tim Northover wrote:
> The right shifts are more problematic. But the intrinsics again seem to be limited to [1, B] (and the value is known at compile-time).
> 
> Couldn't we detect the maximum value and emit 0 for an unsigned shift and "icmp lt %val, 0" for a signed shift?
That's something I considered, but then how can we deal with VSRA?
With this solution, we can match `add (vshr ...)` for VSRA. It seems awkward to add patterns to match the icmp/0 form, and I suppose we still want a VSRA instruction to be emitted instead of a comparison and a vadd, don't we?
Of course we can always add an intrinsic for VSRA/VRSRA (which was already emitting vshift instead of {a,l}shr btw).


http://llvm-reviews.chandlerc.com/D1819