[llvm-commits] [llvm] r155468 - in /llvm/trunk: lib/Transforms/InstCombine/InstCombineCalls.cpp test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll

Chandler Carruth chandlerc at google.com
Tue Apr 24 13:04:14 PDT 2012


On Tue, Apr 24, 2012 at 11:58 AM, Lang Hames <lhames at gmail.com> wrote:

> Author: lhames
> Date: Tue Apr 24 13:58:36 2012
> New Revision: 155468
>
> URL: http://llvm.org/viewvc/llvm-project?rev=155468&view=rev
> Log:
> Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. This fixes
> <rdar://problem/11291436>.
>

Sorry I hadn't yet replied, but I really don't think this is the right
fix...

+  case Intrinsic::arm_neon_vmulls:
> +  case Intrinsic::arm_neon_vmullu: {
> +    // Zext/sext intrinsic operands according to the intrinsic type, then
> try to
> +    // simplify them. This lets us try a SimplifyMulInst on the extended
> +    // operands. If the zext/sext instructions are unused when we're done
> then
> +    // delete them from the block.
> +    Value* Arg0 = II->getArgOperand(0);
> +    Value* Arg1 = II->getArgOperand(1);
> +    bool Zext = (II->getIntrinsicID() == Intrinsic::arm_neon_vmullu);
> +    Instruction *Arg0W =
> +      Zext ? CastInst::CreateZExtOrBitCast(Arg0, II->getType(), "", II) :
> +             CastInst::CreateSExtOrBitCast(Arg0, II->getType(), "", II);
> +    Value* Arg0WS = SimplifyInstruction(Arg0W);
> +    if (Arg0WS == 0) // If simplification fails just pass through the
> ext'd val.
> +      Arg0WS = Arg0W;
>

Why are you simplifying these incrementally? SimplifyInstruction is not at
all free, it can recurse deeply and add compile time with little benefit.

I would just check if they are the right size, and if not, extend them.
We'll simplify everything if there are good ways to do so after-the-fact.

+    Instruction *Arg1W =
> +      Zext ? CastInst::CreateZExtOrBitCast(Arg1, II->getType(), "", II) :
> +             CastInst::CreateSExtOrBitCast(Arg1, II->getType(), "", II);
> +    Value* Arg1WS = SimplifyInstruction(Arg1W);
> +    if (Arg1WS == 0)
> +      Arg1WS = Arg1W;
>

Ditto.


> +    Instruction *SimplifiedInst = 0;
> +    if (Value* V = SimplifyMulInst(Arg0WS, Arg1WS, TD)) {
>

I think this is an unnecessary gymnastic just to re-use the InstSimplify
logic.

InstSimplify's constraints (as you pointed out in the review thread) are
that it cannot form new instructions. That doesn't mean that the right
approach in InstCombine is to form additional instructions that we think
might be "cheap", and then see if InstSimplify can simplify through them.
Is this a pattern already established elsewhere in InstCombine? I haven't
seen it, but there is a lot of InstCombine I've not looked at...

Instead, I'd rather write the actual analysis your interested in performing
on these intrinsics in InstCombine, with the knowledge (and taking the
action) of extending as necessary.

The specific advantages I see are:

- Don't thrash the use lists by adding no-op instructions
- Don't allocate, insert, use, then delete instructions in a common case
(no simplification).
- Able to extend the analysis in other ways that require forming new
instructions.

Is there some particularly complex logic in SimplifyMulInst that you're
trying to re-use? Maybe there is a way to factor that into something shared
by both InstCombine and InstSimplify?

+      SimplifiedInst = ReplaceInstUsesWith(CI, V);
> +    }
> +    if (Arg0W->use_empty())
> +      Arg0W->eraseFromParent();
> +    if (Arg1W->use_empty())
> +      Arg1W->eraseFromParent();
> +    if (SimplifiedInst != 0)
> +      return SimplifiedInst;
> +    break;
> +  }
> +
>   case Intrinsic::stackrestore: {
>     // If the save is right next to the restore, remove the restore.  This
> can
>     // happen when variable allocas are DCE'd.
>
> Added: llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll?rev=155468&view=auto
>
> ==============================================================================
> --- llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll
> (added)
> +++ llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll
> Tue Apr 24 13:58:36 2012
> @@ -0,0 +1,68 @@
> +target datalayout =
> "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"
> +target triple = "thumbv7-apple-ios0"
> +
> +; RUN: opt -S -instcombine < %s | FileCheck %s
> +
> +define <4 x i32> @mulByZero(<4 x i16> %x) nounwind readnone ssp {
> +entry:
> +  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %x, <4 x
> i16> zeroinitializer) nounwind
> +  ret <4 x i32> %a
> +; CHECK: entry:
> +; CHECK-NEXT: ret <4 x i32> zeroinitializer
> +}
> +
> +define <4 x i32> @mulByOne(<4 x i16> %x) nounwind readnone ssp {
> +entry:
> +  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %x, <4 x
> i16> <i16 1, i16 1, i16 1, i16 1>) nounwind
> +  ret <4 x i32> %a
> +; CHECK: entry:
> +; CHECK-NEXT: %0 = sext <4 x i16> %x to <4 x i32>
> +; CHECK-NEXT: ret <4 x i32> %0
> +}
> +
> +define <4 x i32> @constantMul() nounwind readnone ssp {
> +entry:
> +  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 3,
> i16 3, i16 3, i16 3>, <4 x i16> <i16 2, i16 2, i16 2, i16 2>) nounwind
> +  ret <4 x i32> %a
> +; CHECK: entry:
> +; CHECK-NEXT: ret <4 x i32> <i32 6, i32 6, i32 6, i32 6>
> +}
> +
> +define <4 x i32> @constantMulS() nounwind readnone ssp {
> +entry:
> +  %b = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 -1,
> i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1>) nounwind
> +  ret <4 x i32> %b
> +; CHECK: entry:
> +; CHECK-NEXT: ret <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
> +}
> +
> +define <4 x i32> @constantMulU() nounwind readnone ssp {
> +entry:
> +  %b = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> <i16 -1,
> i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1>) nounwind
> +  ret <4 x i32> %b
> +; CHECK: entry:
> +; CHECK-NEXT: ret <4 x i32> <i32 65535, i32 65535, i32 65535, i32 65535>
> +}
> +
> +define <4 x i32> @complex1(<4 x i16> %x) nounwind readnone ssp {
> +entry:
> +  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 2,
> i16 2, i16 2, i16 2>, <4 x i16> %x) nounwind
> +  %b = add <4 x i32> zeroinitializer, %a
> +  ret <4 x i32> %b
> +; CHECK: entry:
> +; CHECK-NEXT: %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x
> i16> <i16 2, i16 2, i16 2, i16 2>, <4 x i16> %x) nounwind
> +; CHECK-NEXT: ret <4 x i32> %a
> +}
> +
> +define <4 x i32> @complex2(<4 x i32> %x) nounwind readnone ssp {
> +entry:
> +  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 3,
> i16 3, i16 3, i16 3>, <4 x i16> <i16 2, i16 2, i16 2, i16 2>) nounwind
> +  %b = add <4 x i32> %x, %a
> +  ret <4 x i32> %b
> +; CHECK: entry:
> +; CHECK-NEXT: %b = add <4 x i32> %x, <i32 6, i32 6, i32 6, i32 6>
> +; CHECK-NEXT: ret <4 x i32> %b
> +}
> +
> +declare <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16>, <4 x i16>)
> nounwind readnone
> +declare <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16>, <4 x i16>)
> nounwind readnone
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120424/d03d4d92/attachment.html>


More information about the llvm-commits mailing list