[llvm-commits] [llvm] r155866 - in /llvm/trunk: lib/Transforms/InstCombine/InstCombineCalls.cpp test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll

Lang Hames lhames at gmail.com
Tue May 1 11:05:24 PDT 2012


Hi Duncan,

Looks like Eli and Chandler beat me to it. Short answer: if we generated an
ext + mul, the ext could be hoisted, preventing our BB-at-a-time isel from
forming the neon instr. This patch only expands where it's safe.

Cheers,
Lang.

On Tue, May 1, 2012 at 1:28 AM, Chandler Carruth <chandlerc at google.com>wrote:

> On Tue, May 1, 2012 at 1:07 AM, Duncan Sands <baldrick at free.fr> wrote:
>
>> Hi Lang,
>>
>> On 01/05/12 02:20, Lang Hames wrote:
>> > Author: lhames
>> > Date: Mon Apr 30 19:20:38 2012
>> > New Revision: 155866
>> >
>> > URL: http://llvm.org/viewvc/llvm-project?rev=155866&view=rev
>> > Log:
>> > Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. Fixes
>> > <rdar://problem/11291436>.
>> >
>> > This is a second attempt at a fix for this, the first was r155468.
>> Thanks
>> > to Chandler, Bob and others for the feedback that helped me improve
>> this.
>>
>> are these intrinsics actually useful?  Couldn't you get the same effect
>> using
>> generic IR, eg: first extend <4 x i16> to <4 x i32> then do the
>> multiplication
>> in <4 x i32>; and have the ARM code generators recognize this pattern and
>> create
>> the appropriate vmull instruction?
>>
>
> See the lengthy thread about this.
>
> Short summary: it *should* work, and it's theoretically cleaner, but it
> doesn't today. Maybe in the future it will, and we can switch.
>
> The problem is that the DAG and isel machinery runs BB-at-a-time, and the
> *ext instructions can get hoisted out of a basic block. Once that happens,
> the code generator can't match this back down to the widening instructions.
>
> The end-game solution is whole-function isel. We get that, this problem
> vanishes.
>
> I suggested one compromise of adding target-specific hooks to the DAG
> builder so that it could look across the BBs of the IR, and directly form
> the target-specific DAG node for the widening mul, but it doesn't seem
> likely to be worth doing just for this intrinsic....
>
>
>>
>> Ciao, Duncan.
>>
>> >
>> > Added:
>> >
>>  llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll
>> > Modified:
>> >      llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
>> >
>> > Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
>> > URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp?rev=155866&r1=155865&r2=155866&view=diff
>> >
>> ==============================================================================
>> > --- llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
>> (original)
>> > +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp Mon Apr
>> 30 19:20:38 2012
>> > @@ -694,6 +694,57 @@
>> >       break;
>> >     }
>> >
>> > +  case Intrinsic::arm_neon_vmulls:
>> > +  case Intrinsic::arm_neon_vmullu: {
>> > +    Value *Arg0 = II->getArgOperand(0);
>> > +    Value *Arg1 = II->getArgOperand(1);
>> > +
>> > +    // Handle mul by zero first:
>> > +    if (isa<ConstantAggregateZero>(Arg0) ||
>> isa<ConstantAggregateZero>(Arg1)) {
>> > +      return ReplaceInstUsesWith(CI,
>> ConstantAggregateZero::get(II->getType()));
>> > +    }
>> > +
>> > +    // Check for constant LHS&  RHS - in this case we just simplify.
>> > +    bool Zext = (II->getIntrinsicID() == Intrinsic::arm_neon_vmullu);
>> > +    VectorType *NewVT = cast<VectorType>(II->getType());
>> > +    unsigned NewWidth = NewVT->getElementType()->getIntegerBitWidth();
>> > +    if (ConstantDataVector *CV0 = dyn_cast<ConstantDataVector>(Arg0)) {
>> > +      if (ConstantDataVector *CV1 =
>> dyn_cast<ConstantDataVector>(Arg1)) {
>> > +        VectorType* VT = cast<VectorType>(CV0->getType());
>> > +        SmallVector<Constant*, 4>  NewElems;
>> > +        for (unsigned i = 0; i<  VT->getNumElements(); ++i) {
>> > +          APInt CV0E =
>> > +
>>  (cast<ConstantInt>(CV0->getAggregateElement(i)))->getValue();
>> > +          CV0E = Zext ? CV0E.zext(NewWidth) : CV0E.sext(NewWidth);
>> > +          APInt CV1E =
>> > +
>>  (cast<ConstantInt>(CV1->getAggregateElement(i)))->getValue();
>> > +          CV1E = Zext ? CV1E.zext(NewWidth) : CV1E.sext(NewWidth);
>> > +          NewElems.push_back(
>> > +            ConstantInt::get(NewVT->getElementType(), CV0E * CV1E));
>> > +        }
>> > +        return ReplaceInstUsesWith(CI, ConstantVector::get(NewElems));
>> > +      }
>> > +
>> > +      // Couldn't simplify - cannonicalize constant to the RHS.
>> > +      std::swap(Arg0, Arg1);
>> > +    }
>> > +
>> > +    // Handle mul by one:
>> > +    if (ConstantDataVector *CV1 = dyn_cast<ConstantDataVector>(Arg1)) {
>> > +      if (ConstantInt *Splat =
>> > +            dyn_cast_or_null<ConstantInt>(CV1->getSplatValue())) {
>> > +        if (Splat->isOne()) {
>> > +          if (Zext)
>> > +            return CastInst::CreateZExtOrBitCast(Arg0, II->getType());
>> > +          // else
>> > +          return CastInst::CreateSExtOrBitCast(Arg0, II->getType());
>> > +        }
>> > +      }
>> > +    }
>> > +
>> > +    break;
>> > +  }
>> > +
>> >     case Intrinsic::stackrestore: {
>> >       // If the save is right next to the restore, remove the restore.
>>  This can
>> >       // happen when variable allocas are DCE'd.
>> >
>> > Added:
>> llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll
>> > URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll?rev=155866&view=auto
>> >
>> ==============================================================================
>> > ---
>> llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll (added)
>> > +++
>> llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll Mon
>> Apr 30 19:20:38 2012
>> > @@ -0,0 +1,68 @@
>> > +target datalayout =
>> "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"
>> > +target triple = "thumbv7-apple-ios0"
>> > +
>> > +; RUN: opt -S -instcombine<  %s | FileCheck %s
>> > +
>> > +define<4 x i32>  @mulByZero(<4 x i16>  %x) nounwind readnone ssp {
>> > +entry:
>> > +  %a = tail call<4 x i32>  @llvm.arm.neon.vmulls.v4i32(<4 x i16>
>>  %x,<4 x i16>  zeroinitializer) nounwind
>> > +  ret<4 x i32>  %a
>> > +; CHECK: entry:
>> > +; CHECK-NEXT: ret<4 x i32>  zeroinitializer
>> > +}
>> > +
>> > +define<4 x i32>  @mulByOne(<4 x i16>  %x) nounwind readnone ssp {
>> > +entry:
>> > +  %a = tail call<4 x i32>  @llvm.arm.neon.vmulls.v4i32(<4 x i16>
>>  %x,<4 x i16>  <i16 1, i16 1, i16 1, i16 1>) nounwind
>> > +  ret<4 x i32>  %a
>> > +; CHECK: entry:
>> > +; CHECK-NEXT: %a = sext<4 x i16>  %x to<4 x i32>
>> > +; CHECK-NEXT: ret<4 x i32>  %a
>> > +}
>> > +
>> > +define<4 x i32>  @constantMul() nounwind readnone ssp {
>> > +entry:
>> > +  %a = tail call<4 x i32>  @llvm.arm.neon.vmulls.v4i32(<4 x i16>  <i16
>> 3, i16 3, i16 3, i16 3>,<4 x i16>  <i16 2, i16 2, i16 2, i16 2>) nounwind
>> > +  ret<4 x i32>  %a
>> > +; CHECK: entry:
>> > +; CHECK-NEXT: ret<4 x i32>  <i32 6, i32 6, i32 6, i32 6>
>> > +}
>> > +
>> > +define<4 x i32>  @constantMulS() nounwind readnone ssp {
>> > +entry:
>> > +  %b = tail call<4 x i32>  @llvm.arm.neon.vmulls.v4i32(<4 x i16>  <i16
>> -1, i16 -1, i16 -1, i16 -1>,<4 x i16>  <i16 1, i16 1, i16 1, i16 1>)
>> nounwind
>> > +  ret<4 x i32>  %b
>> > +; CHECK: entry:
>> > +; CHECK-NEXT: ret<4 x i32>  <i32 -1, i32 -1, i32 -1, i32 -1>
>> > +}
>> > +
>> > +define<4 x i32>  @constantMulU() nounwind readnone ssp {
>> > +entry:
>> > +  %b = tail call<4 x i32>  @llvm.arm.neon.vmullu.v4i32(<4 x i16>  <i16
>> -1, i16 -1, i16 -1, i16 -1>,<4 x i16>  <i16 1, i16 1, i16 1, i16 1>)
>> nounwind
>> > +  ret<4 x i32>  %b
>> > +; CHECK: entry:
>> > +; CHECK-NEXT: ret<4 x i32>  <i32 65535, i32 65535, i32 65535, i32
>> 65535>
>> > +}
>> > +
>> > +define<4 x i32>  @complex1(<4 x i16>  %x) nounwind readnone ssp {
>> > +entry:
>> > +  %a = tail call<4 x i32>  @llvm.arm.neon.vmulls.v4i32(<4 x i16>  <i16
>> 2, i16 2, i16 2, i16 2>,<4 x i16>  %x) nounwind
>> > +  %b = add<4 x i32>  zeroinitializer, %a
>> > +  ret<4 x i32>  %b
>> > +; CHECK: entry:
>> > +; CHECK-NEXT: %a = tail call<4 x i32>  @llvm.arm.neon.vmulls.v4i32(<4
>> x i16>  <i16 2, i16 2, i16 2, i16 2>,<4 x i16>  %x) nounwind
>> > +; CHECK-NEXT: ret<4 x i32>  %a
>> > +}
>> > +
>> > +define<4 x i32>  @complex2(<4 x i32>  %x) nounwind readnone ssp {
>> > +entry:
>> > +  %a = tail call<4 x i32>  @llvm.arm.neon.vmulls.v4i32(<4 x i16>  <i16
>> 3, i16 3, i16 3, i16 3>,<4 x i16>  <i16 2, i16 2, i16 2, i16 2>) nounwind
>> > +  %b = add<4 x i32>  %x, %a
>> > +  ret<4 x i32>  %b
>> > +; CHECK: entry:
>> > +; CHECK-NEXT: %b = add<4 x i32>  %x,<i32 6, i32 6, i32 6, i32 6>
>> > +; CHECK-NEXT: ret<4 x i32>  %b
>> > +}
>> > +
>> > +declare<4 x i32>  @llvm.arm.neon.vmulls.v4i32(<4 x i16>,<4 x i16>)
>> nounwind readnone
>> > +declare<4 x i32>  @llvm.arm.neon.vmullu.v4i32(<4 x i16>,<4 x i16>)
>> nounwind readnone
>> >
>> >
>> > _______________________________________________
>> > llvm-commits mailing list
>> > llvm-commits at cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120501/07a59e29/attachment.html>


More information about the llvm-commits mailing list