[PATCH] D157373: [RISCV] add a compress optimization for stack inst.
lcvon via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Aug 13 19:41:03 PDT 2023
lcvon007 marked 5 inline comments as done.
lcvon007 added a comment.
In D157373#4579136 <https://reviews.llvm.org/D157373#4579136>, @wangpc wrote:
> I just found a regression:
>
> --- a/llvm/test/CodeGen/RISCV/stack-realignment.ll
> +++ b/llvm/test/CodeGen/RISCV/stack-realignment.ll
> @@ -1,7 +1,7 @@
> ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
> -; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
> +; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \
> ; RUN: | FileCheck %s -check-prefix=RV32I
> -; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
> +; RUN: llc -mtriple=riscv64 -mattr=+c -verify-machineinstrs < %s \
> ; RUN: | FileCheck %s -check-prefix=RV64I
>
> declare void @callee(ptr)
> @@ -529,56 +529,58 @@ define void @caller_no_realign2048() "no-realign-stack" {
> define void @caller4096() {
> ; RV32I-LABEL: caller4096:
> ; RV32I: # %bb.0:
> -; RV32I-NEXT: addi sp, sp, -2032
> -; RV32I-NEXT: .cfi_def_cfa_offset 2032
> -; RV32I-NEXT: sw ra, 2028(sp) # 4-byte Folded Spill
> -; RV32I-NEXT: sw s0, 2024(sp) # 4-byte Folded Spill
> +; RV32I-NEXT: addi sp, sp, -256
> +; RV32I-NEXT: .cfi_def_cfa_offset 256
> +; RV32I-NEXT: sw ra, 252(sp) # 4-byte Folded Spill
> +; RV32I-NEXT: sw s0, 248(sp) # 4-byte Folded Spill
> ; RV32I-NEXT: .cfi_offset ra, -4
> ; RV32I-NEXT: .cfi_offset s0, -8
> -; RV32I-NEXT: addi s0, sp, 2032
> +; RV32I-NEXT: addi s0, sp, 256
> ; RV32I-NEXT: .cfi_def_cfa s0, 0
> -; RV32I-NEXT: lui a0, 2
> -; RV32I-NEXT: addi a0, a0, -2032
> +; RV32I-NEXT: li a0, 31
> +; RV32I-NEXT: slli a0, a0, 8
> ; RV32I-NEXT: sub sp, sp, a0
> ; RV32I-NEXT: srli a0, sp, 12
> ; RV32I-NEXT: slli sp, a0, 12
> ; RV32I-NEXT: lui a0, 1
> -; RV32I-NEXT: add a0, sp, a0
> +; RV32I-NEXT: add a0, a0, sp
> ; RV32I-NEXT: call callee@plt
> ; RV32I-NEXT: lui a0, 2
> ; RV32I-NEXT: sub sp, s0, a0
> -; RV32I-NEXT: addi a0, a0, -2032
> +; RV32I-NEXT: li a0, 31
> +; RV32I-NEXT: slli a0, a0, 8
> ; RV32I-NEXT: add sp, sp, a0
> -; RV32I-NEXT: lw ra, 2028(sp) # 4-byte Folded Reload
> -; RV32I-NEXT: lw s0, 2024(sp) # 4-byte Folded Reload
> -; RV32I-NEXT: addi sp, sp, 2032
> +; RV32I-NEXT: lw ra, 252(sp) # 4-byte Folded Reload
> +; RV32I-NEXT: lw s0, 248(sp) # 4-byte Folded Reload
> +; RV32I-NEXT: addi sp, sp, 256
> ; RV32I-NEXT: ret
> ;
> ; RV64I-LABEL: caller4096:
> ; RV64I: # %bb.0:
> -; RV64I-NEXT: addi sp, sp, -2032
> -; RV64I-NEXT: .cfi_def_cfa_offset 2032
> -; RV64I-NEXT: sd ra, 2024(sp) # 8-byte Folded Spill
> -; RV64I-NEXT: sd s0, 2016(sp) # 8-byte Folded Spill
> +; RV64I-NEXT: addi sp, sp, -512
> +; RV64I-NEXT: .cfi_def_cfa_offset 512
> +; RV64I-NEXT: sd ra, 504(sp) # 8-byte Folded Spill
> +; RV64I-NEXT: sd s0, 496(sp) # 8-byte Folded Spill
> ; RV64I-NEXT: .cfi_offset ra, -8
> ; RV64I-NEXT: .cfi_offset s0, -16
> -; RV64I-NEXT: addi s0, sp, 2032
> +; RV64I-NEXT: addi s0, sp, 512
> ; RV64I-NEXT: .cfi_def_cfa s0, 0
> -; RV64I-NEXT: lui a0, 2
> -; RV64I-NEXT: addiw a0, a0, -2032
> +; RV64I-NEXT: li a0, 15
> +; RV64I-NEXT: slli a0, a0, 9
> ; RV64I-NEXT: sub sp, sp, a0
> ; RV64I-NEXT: srli a0, sp, 12
> ; RV64I-NEXT: slli sp, a0, 12
> ; RV64I-NEXT: lui a0, 1
> -; RV64I-NEXT: add a0, sp, a0
> +; RV64I-NEXT: add a0, a0, sp
> ; RV64I-NEXT: call callee@plt
> ; RV64I-NEXT: lui a0, 2
> ; RV64I-NEXT: sub sp, s0, a0
> -; RV64I-NEXT: addiw a0, a0, -2032
> +; RV64I-NEXT: li a0, 15
> +; RV64I-NEXT: slli a0, a0, 9
> ; RV64I-NEXT: add sp, sp, a0
> -; RV64I-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
> -; RV64I-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
> -; RV64I-NEXT: addi sp, sp, 2032
> +; RV64I-NEXT: ld ra, 504(sp) # 8-byte Folded Reload
> +; RV64I-NEXT: ld s0, 496(sp) # 8-byte Folded Reload
> +; RV64I-NEXT: addi sp, sp, 512
> ; RV64I-NEXT: ret
> %1 = alloca i8, align 4096
> call void @callee(ptr %1)
>
> I think that is because we need to do a larger second stack adjustment if we do a small first stack adjustment.
> So the impact on performance should be evaluated so that we can decide whether this optimization should be enabled under `-Os/-Oz` only. :-)
It seems that the compiler doesn't generate the best code here: it could materialize the constant as `lui a0, 2; addi a0, a0, -256`, mirroring the old `lui a0, 2; addi a0, a0, -2032` sequence, but it instead emits `li a0, 31; slli a0, a0, 8` (31 << 8 = 2 * 4096 - 256). @wangpc
I found that the reason my optimization adds one extra instruction is the difference in how the large immediates 8192 - 2032 and 8192 - 512 are materialized:
`lui a0, 2; addiw a0, a0, -2032` is generated for 8192 - 2032, while
`li a0, 15; slli a0, a0, 9` is generated for 8192 - 512,
and the `lui a0, 2` can be optimized away later. So if 8192 - 512 were instead built as `lui a0, 2; addiw a0, a0, -512`, the result would be similar and this regression could be avoided.
I'll show the code-size data here first and provide the performance data later:
codesize:
F28684538: image.png <https://reviews.llvm.org/F28684538>
================
Comment at: llvm/lib/Target/RISCV/RISCVFrameLowering.cpp:1315
+ // riscv32: c.lwsp rd, offset[7:2] => 2^(6+2)
+ const uint64_t RV32CompressLen = 256;
+ // riscv64: c.lwsp rd, offset[8:3] => 2^(6+3)
----------------
craig.topper wrote:
> Can we compute CompressLen for both RV32 and RV64 as XLen * 8? And then merge the 2 if statements?
Done, thanks for the nice suggestion.
================
Comment at: llvm/lib/Target/RISCV/RISCVFrameLowering.cpp:1316
+ const uint64_t RV32CompressLen = 256;
+ // riscv64: c.lwsp rd, offset[8:3] => 2^(6+3)
+ const uint64_t RV64CompressLen = 512;
----------------
craig.topper wrote:
> c.ldsp?
Done; I also added the cases for the other related instructions.
================
Comment at: llvm/lib/Target/RISCV/RISCVFrameLowering.cpp:1326
+ }
+ return FirstSPAmount;
}
----------------
craig.topper wrote:
> return 2048 - StackAlign
done
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D157373/new/
https://reviews.llvm.org/D157373