[PATCH] D39848: [RISCV] Support lowering FrameIndex

Wed Nov 22 12:56:17 PST 2017

asb added a comment.

I've had a good look at the alternative approach, whereby a ComplexPattern is used that _just_ matches the FrameIndex and extra load and store tablegen patterns are written using that. This also requires lowering FrameIndex in RISCVDAGToDAGISel::Select. If going that way, it probably makes sense to move HexagonDAGToDAGISel::isOrEquivalentToAdd to target-independent code rather than copying it in to RISCVDagToDAGISel.

I've managed to achieve identical codegen apart from one regression, where a simple varargs test case starts with:

  t0: ch = EntryToken
        t39: i32 = add nuw FrameIndex:i32<-1>, Constant:i32<4>
      t55: ch = store<ST4[%va]> t0, t39, FrameIndex:i32<0>, undef:i32
        t9: i32,ch = CopyFromReg t0, Register:i32 %vreg2
      t11: ch = store<ST4[FixedStack-4](align=8)> t0, t9, FrameIndex:i32<-4>, undef:i32
        t13: i32,ch = CopyFromReg t0, Register:i32 %vreg3
      t15: ch = store<ST4[<unknown>]> t0, t13, FrameIndex:i32<-5>, undef:i32
        t17: i32,ch = CopyFromReg t0, Register:i32 %vreg4
      t19: ch = store<ST4[FixedStack-6](align=16)> t0, t17, FrameIndex:i32<-6>, undef:i32
        t21: i32,ch = CopyFromReg t0, Register:i32 %vreg5
      t23: ch = store<ST4[<unknown>]> t0, t21, FrameIndex:i32<-7>, undef:i32
        t25: i32,ch = CopyFromReg t0, Register:i32 %vreg6
      t27: ch = store<ST4[FixedStack-8](align=8)> t0, t25, FrameIndex:i32<-8>, undef:i32
        t29: i32,ch = CopyFromReg t0, Register:i32 %vreg7
      t31: ch = store<ST4[<unknown>]> t0, t29, FrameIndex:i32<-9>, undef:i32
    t60: ch = TokenFactor t58:1, t55, t11, t15, t19, t23, t27, t31
  t44: ch,glue = CopyToReg t60, Register:i32 %X10, t58
      t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1
    t7: ch = store<ST4[<unknown>]> t0, t4, FrameIndex:i32<-3>, undef:i32
  t58: i32,ch = load<LD4[%2]> t7, FrameIndex:i32<-1>, undef:i32
  t45: ch = RISCVISD::RET_FLAG t44, Register:i32 %X10, t44:1

ends up as the following (two addi when one would suffice):

  	%vreg7<def> = COPY %X17; GPR:%vreg7
  	%vreg6<def> = COPY %X16; GPR:%vreg6
  	%vreg5<def> = COPY %X15; GPR:%vreg5
  	%vreg4<def> = COPY %X14; GPR:%vreg4
  	%vreg3<def> = COPY %X13; GPR:%vreg3
  	%vreg2<def> = COPY %X12; GPR:%vreg2
  	%vreg1<def> = COPY %X11; GPR:%vreg1
  	SW %vreg1, <fi#-3>, 0; mem:ST4[<unknown>] GPR:%vreg1
  	SW %vreg7, <fi#-9>, 0; mem:ST4[<unknown>] GPR:%vreg7
  	SW %vreg6, <fi#-8>, 0; mem:ST4[FixedStack-8](align=8) GPR:%vreg6
  	SW %vreg5, <fi#-7>, 0; mem:ST4[<unknown>] GPR:%vreg5
  	SW %vreg4, <fi#-6>, 0; mem:ST4[FixedStack-6](align=16) GPR:%vreg4
  	SW %vreg3, <fi#-5>, 0; mem:ST4[<unknown>] GPR:%vreg3
  	SW %vreg2, <fi#-4>, 0; mem:ST4[FixedStack-4](align=8) GPR:%vreg2
  	%vreg8<def> = ADDI <fi#-1>, 0; GPR:%vreg8
  	%vreg9<def> = ADDI %vreg8<kill>, 4; GPR:%vreg9,%vreg8
  	SW %vreg9<kill>, <fi#0>, 0; mem:ST4[%va] GPR:%vreg9
  	%vreg10<def> = LW <fi#-1>, 0; mem:LD4[%2] GPR:%vreg10
  	%X10<def> = COPY %vreg10; GPR:%vreg10
  	PseudoRET %X10<imp-use>

One the one hand, the RISC-V backend is starting by focusing on correct and well tested output, with performance work coming later. This means sub-optimal codegen can be tolerated initially. On the other hand, it seems a shame to regress code generation in this way.

I'll have a look if there's a solution that doesn't result in too much additional complexity.

https://reviews.llvm.org/D39848