[PATCH] ARM: do not emit lsrh/ashr for NEON shifts
Amaury de la Vieuville
amaury.delavieuville at arm.com
Thu Oct 3 08:43:52 PDT 2013
Hi Tim, comments inline.
================
Comment at: lib/CodeGen/CGBuiltin.cpp:3089-3092
@@ -3077,7 +3088,6 @@
return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vshll", 1);
case ARM::BI__builtin_neon_vshl_n_v:
case ARM::BI__builtin_neon_vshlq_n_v:
- Ops[1] = EmitNeonShiftVector(Ops[1], Ty, false);
- return Builder.CreateShl(Builder.CreateBitCast(Ops[0],Ty), Ops[1],
- "vshl_n");
+ Int = usgn ? Intrinsic::arm_neon_vshiftu : Intrinsic::arm_neon_vshifts;
+ return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vshl_n", 1, false);
case ARM::BI__builtin_neon_vshrn_n_v:
----------------
Tim Northover wrote:
> All the documentation I can find says that vshl_n intrinsics take an integer from [0,B-1], which is in range for shift.
You're right, but for consistency, I eliminated all shifts instruction on vectors. I can change that though.
================
Comment at: lib/CodeGen/CGBuiltin.cpp:3096-3099
@@ -3085,10 +3095,6 @@
Ops, "vshrn_n", 1, true);
case ARM::BI__builtin_neon_vshr_n_v:
case ARM::BI__builtin_neon_vshrq_n_v:
- Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
- Ops[1] = EmitNeonShiftVector(Ops[1], Ty, false);
- if (usgn)
- return Builder.CreateLShr(Ops[0], Ops[1], "vshr_n");
- else
- return Builder.CreateAShr(Ops[0], Ops[1], "vshr_n");
+ Int = usgn ? Intrinsic::arm_neon_vshiftu : Intrinsic::arm_neon_vshifts;
+ return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vshr_n", 1, true);
case ARM::BI__builtin_neon_vsri_n_v:
----------------
Tim Northover wrote:
> The right shifts are more problematic. But the intrinsics again seem to be limited to [1, B] (and the value is known at compile-time).
>
> Couldn't we detect the maximum value and emit 0 for an unsigned shift and "icmp lt %val, 0" for a signed shift?
That's something I considered, but then how can we deal with VSRA?
With this solution, we can match `add (vshr ...)` for VSRA. It seems awkward to add patterns to match the icmp/0 form, and I suppose we still want a VSRA instruction to be emitted instead of a comparison and a vadd, don't we?
Of course we can always add an intrinsic for VSRA/VRSRA (which was already emitting vshift instead of {a,l}shr btw).
http://llvm-reviews.chandlerc.com/D1819
More information about the cfe-commits
mailing list