[PATCH] D35192: [ARM] Use ADDCARRY / SUBCARRY

Roger Ferrer Ibanez via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 14 09:46:08 PST 2017


rogfer01 added a comment.

Hi @efriedma ,

I'm trying to understand what went wrong in the following input

  define i32 @foo(i32 %var1, i32 %var2, i32 %var3, i32 %var4, i32 %var5) local_unnamed_addr #0 {
  entry:
    %conv1 = zext i32 %var3 to i64
    %conv2 = zext i32 %var1 to i64
    %add3 = add nuw nsw i64 %conv1, %conv2
    %shr = lshr i64 %add3, 32
    %conv5 = trunc i64 %shr to i32
    %conv7 = and i64 %add3, 4294967295
    %add10 = add nuw nsw i64 %conv7, %conv2
    %shr12 = lshr i64 %add10, 32
    %conv13 = trunc i64 %shr12 to i32
    %add14 = add nuw nsw i32 %conv13, %conv5
    %conv16 = zext i32 %var4 to i64
    %conv18 = zext i32 %var2 to i64
    %add19 = add nuw nsw i64 %conv16, %conv18
    %shr21 = lshr i64 %add19, 32
    %conv24 = zext i32 %var5 to i64
    %add25 = add nuw nsw i64 %shr21, %conv24
    %shr28 = lshr i64 %add25, 32
    %conv29 = trunc i64 %shr28 to i32
    %add30 = add nuw nsw i32 %add14, %conv29
    ret i32 %add30
  }

which (with ADDCARRY / SUBCARRY nodes enabled) generates the following selection using

  ./bin/llc -mtriple armv8 -O3 -o - bad.ll



  SelectionDAG has 28 nodes:
    t0: ch = EntryToken
    t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
      t6: i32,ch = CopyFromReg t0, Register:i32 %vreg2
    t75: i32,i32 = ADDSrr t6, t2, TargetConstant:i32<14>, Register:i32 %noreg
            t58: i32 = MOVi TargetConstant:i32<0>, TargetConstant:i32<14>, Register:i32 %noreg, Register:i32 %noreg
              t69: i32,i32 = ADDSrr t75, t2, TargetConstant:i32<14>, Register:i32 %noreg
            t84: ch,glue = CopyToReg t0, Register:i32 %CPSR, t69:1
          t70: i32,i32 = ADCri t58, TargetConstant:i32<0>, TargetConstant:i32<14>, Register:i32 %noreg, Register:i32 %noreg, t84:1
          t83: ch,glue = CopyToReg t0, Register:i32 %CPSR, t75:1
        t66: i32,i32 = ADCri t70, TargetConstant:i32<0>, TargetConstant:i32<14>, Register:i32 %noreg, Register:i32 %noreg, t83:1
        t47: i32,ch = LDRi12<Mem:LD4[FixedStack-1](align=8)> TargetFrameIndex:i32<-1>, TargetConstant:i32<0>, TargetConstant:i32<14>, Register:i32 %noreg, t0
            t8: i32,ch = CopyFromReg t0, Register:i32 %vreg3
            t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1
          t72: i32,i32 = ADDSrr t8, t4, TargetConstant:i32<14>, Register:i32 %noreg
        t81: ch,glue = CopyToReg t0, Register:i32 %CPSR, t72:1
      t62: i32,i32 = ADCrr t66, t47, TargetConstant:i32<14>, Register:i32 %noreg, Register:i32 %noreg, t81:1
    t35: ch,glue = CopyToReg t0, Register:i32 %R0, t62
    t36: ch = BX_RET TargetConstant:i32<14>, Register:i32 %noreg, Register:i32 %R0, t35, t35:1

Where `t75` second value is consumed by `t83` CopyToReg which is glued to `t66` ADCri. Similarly `t69` second value is consumed by `t84` CopytoReg which is glued to `t70`. I think this is not constraining enough the schedule which decides to do

  BB#0: derived from LLVM BB %entry
      Live Ins: %R0 %R1 %R2 %R3
  	%vreg3<def> = COPY %R3; GPR:%vreg3
  	%vreg2<def> = COPY %R2; GPR:%vreg2
  	%vreg1<def> = COPY %R1; GPR:%vreg1
  	%vreg0<def> = COPY %R0; GPR:%vreg0
  	%vreg4<def> = MOVi 0, pred:14, pred:%noreg, opt:%noreg; GPR:%vreg4
  	%vreg5<def> = ADDrr %vreg2, %vreg0, pred:14, pred:%noreg, opt:%CPSR<def>; GPR:%vreg5,%vreg2,%vreg0
  	%vreg6<def> = ADDrr %vreg5<kill>, %vreg0, pred:14, pred:%noreg, opt:%CPSR<def>; GPR:%vreg6,%vreg5,%vreg0
  	%vreg7<def> = ADCri %vreg4<kill>, 0, pred:14, pred:%noreg, opt:%noreg, %CPSR<imp-use>; GPR:%vreg7,%vreg4
    ...

This is, the CPSR computed in `%vreg5` is clobbered by the CPSR computed in `%vreg6`.

Looks like (after we have lowered addcarry/subcarry into `ARMISD::ADDC`/`ARMISD::ADDE` and removed the redundant nodes to make sure the carry flag is correctly propagated)  we have ended with something that cannot be scheduled sensibly without storing the carry flag somewhere. Given that it is found inside CPSR this will mean MRS / MSR which will move more bits than the ones I need from/to the flags.

Before addcarry/subcarry, the code can be scheduled without having to preserve the flags anywhere so I must be missing something very obvious here. Perhaps the combiners we do in the `ARMISelLowering` have to be more conservative?

Kind regards,


Repository:
  rL LLVM

https://reviews.llvm.org/D35192





More information about the llvm-commits mailing list