[PATCH] D157373: [RISCV] add a compress optimization for stack inst.
lcvon via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Aug 13 19:41:03 PDT 2023
lcvon007 marked 5 inline comments as done.
lcvon007 added a comment.
In D157373#4579136 <https://reviews.llvm.org/D157373#4579136>, @wangpc wrote:
> I just found a regression:
>
> --- a/llvm/test/CodeGen/RISCV/stack-realignment.ll
> +++ b/llvm/test/CodeGen/RISCV/stack-realignment.ll
> @@ -1,7 +1,7 @@
> ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
> -; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
> +; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \
> ; RUN: | FileCheck %s -check-prefix=RV32I
> -; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
> +; RUN: llc -mtriple=riscv64 -mattr=+c -verify-machineinstrs < %s \
> ; RUN: | FileCheck %s -check-prefix=RV64I
>
> declare void @callee(ptr)
> @@ -529,56 +529,58 @@ define void @caller_no_realign2048() "no-realign-stack" {
> define void @caller4096() {
> ; RV32I-LABEL: caller4096:
> ; RV32I: # %bb.0:
> -; RV32I-NEXT: addi sp, sp, -2032
> -; RV32I-NEXT: .cfi_def_cfa_offset 2032
> -; RV32I-NEXT: sw ra, 2028(sp) # 4-byte Folded Spill
> -; RV32I-NEXT: sw s0, 2024(sp) # 4-byte Folded Spill
> +; RV32I-NEXT: addi sp, sp, -256
> +; RV32I-NEXT: .cfi_def_cfa_offset 256
> +; RV32I-NEXT: sw ra, 252(sp) # 4-byte Folded Spill
> +; RV32I-NEXT: sw s0, 248(sp) # 4-byte Folded Spill
> ; RV32I-NEXT: .cfi_offset ra, -4
> ; RV32I-NEXT: .cfi_offset s0, -8
> -; RV32I-NEXT: addi s0, sp, 2032
> +; RV32I-NEXT: addi s0, sp, 256
> ; RV32I-NEXT: .cfi_def_cfa s0, 0
> -; RV32I-NEXT: lui a0, 2
> -; RV32I-NEXT: addi a0, a0, -2032
> +; RV32I-NEXT: li a0, 31
> +; RV32I-NEXT: slli a0, a0, 8
> ; RV32I-NEXT: sub sp, sp, a0
> ; RV32I-NEXT: srli a0, sp, 12
> ; RV32I-NEXT: slli sp, a0, 12
> ; RV32I-NEXT: lui a0, 1
> -; RV32I-NEXT: add a0, sp, a0
> +; RV32I-NEXT: add a0, a0, sp
> ; RV32I-NEXT: call callee@plt
> ; RV32I-NEXT: lui a0, 2
> ; RV32I-NEXT: sub sp, s0, a0
> -; RV32I-NEXT: addi a0, a0, -2032
> +; RV32I-NEXT: li a0, 31
> +; RV32I-NEXT: slli a0, a0, 8
> ; RV32I-NEXT: add sp, sp, a0
> -; RV32I-NEXT: lw ra, 2028(sp) # 4-byte Folded Reload
> -; RV32I-NEXT: lw s0, 2024(sp) # 4-byte Folded Reload
> -; RV32I-NEXT: addi sp, sp, 2032
> +; RV32I-NEXT: lw ra, 252(sp) # 4-byte Folded Reload
> +; RV32I-NEXT: lw s0, 248(sp) # 4-byte Folded Reload
> +; RV32I-NEXT: addi sp, sp, 256
> ; RV32I-NEXT: ret
> ;
> ; RV64I-LABEL: caller4096:
> ; RV64I: # %bb.0:
> -; RV64I-NEXT: addi sp, sp, -2032
> -; RV64I-NEXT: .cfi_def_cfa_offset 2032
> -; RV64I-NEXT: sd ra, 2024(sp) # 8-byte Folded Spill
> -; RV64I-NEXT: sd s0, 2016(sp) # 8-byte Folded Spill
> +; RV64I-NEXT: addi sp, sp, -512
> +; RV64I-NEXT: .cfi_def_cfa_offset 512
> +; RV64I-NEXT: sd ra, 504(sp) # 8-byte Folded Spill
> +; RV64I-NEXT: sd s0, 496(sp) # 8-byte Folded Spill
> ; RV64I-NEXT: .cfi_offset ra, -8
> ; RV64I-NEXT: .cfi_offset s0, -16
> -; RV64I-NEXT: addi s0, sp, 2032
> +; RV64I-NEXT: addi s0, sp, 512
> ; RV64I-NEXT: .cfi_def_cfa s0, 0
> -; RV64I-NEXT: lui a0, 2
> -; RV64I-NEXT: addiw a0, a0, -2032
> +; RV64I-NEXT: li a0, 15
> +; RV64I-NEXT: slli a0, a0, 9
> ; RV64I-NEXT: sub sp, sp, a0
> ; RV64I-NEXT: srli a0, sp, 12
> ; RV64I-NEXT: slli sp, a0, 12
> ; RV64I-NEXT: lui a0, 1
> -; RV64I-NEXT: add a0, sp, a0
> +; RV64I-NEXT: add a0, a0, sp
> ; RV64I-NEXT: call callee@plt
> ; RV64I-NEXT: lui a0, 2
> ; RV64I-NEXT: sub sp, s0, a0
> -; RV64I-NEXT: addiw a0, a0, -2032
> +; RV64I-NEXT: li a0, 15
> +; RV64I-NEXT: slli a0, a0, 9
> ; RV64I-NEXT: add sp, sp, a0
> -; RV64I-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
> -; RV64I-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
> -; RV64I-NEXT: addi sp, sp, 2032
> +; RV64I-NEXT: ld ra, 504(sp) # 8-byte Folded Reload
> +; RV64I-NEXT: ld s0, 496(sp) # 8-byte Folded Reload
> +; RV64I-NEXT: addi sp, sp, 512
> ; RV64I-NEXT: ret
> %1 = alloca i8, align 4096
> call void @callee(ptr %1)
>
> I think that is because we need to do a larger second stack adjustment if we do a small first stack adjustment.
> So the impact on performance should be evaluated so that we can decide whether this optimization should be enabled under `-Os/-Oz` only. :-)
It seems that the compiler doesn't generate the best code here: it could materialize the constant as `lui a0, 2; addi a0, a0, -256`, mirroring the old `lui a0, 2; addi a0, a0, -2032` sequence, but it instead emits `li a0, 31; slli a0, a0, 8` (31 << 8 = 2 * 4096 - 256). @wangpc
I found that the reason my optimization adds one extra instruction is the difference in how the large immediates 8192 - 2032 and 8192 - 512 are materialized:
`lui a0, 2; addiw a0, a0, -2032` is generated for 8192 - 2032, while
`li a0, 15; slli a0, a0, 9` is generated for 8192 - 512,
and the `lui a0, 2` can be optimized away later. So if 8192 - 512 were instead built as `lui a0, 2; addiw a0, a0, -512`, the result would be similar and this regression could be avoided.
I'll show the code-size data here first and provide the performance data later:
codesize:
F28684538: image.png <https://reviews.llvm.org/F28684538>
================
Comment at: llvm/lib/Target/RISCV/RISCVFrameLowering.cpp:1315
+ // riscv32: c.lwsp rd, offset[7:2] => 2^(6+2)
+ const uint64_t RV32CompressLen = 256;
+ // riscv64: c.lwsp rd, offset[8:3] => 2^(6+3)
----------------
craig.topper wrote:
> Can we compute CompressLen for both RV32 and RV64 as XLen * 8? And then merge the 2 if statements?
Done, thanks for the nice suggestion.
================
Comment at: llvm/lib/Target/RISCV/RISCVFrameLowering.cpp:1316
+ const uint64_t RV32CompressLen = 256;
+ // riscv64: c.lwsp rd, offset[8:3] => 2^(6+3)
+ const uint64_t RV64CompressLen = 512;
----------------
craig.topper wrote:
> c.ldsp?
Done; I also added the cases for the other related instructions.
================
Comment at: llvm/lib/Target/RISCV/RISCVFrameLowering.cpp:1326
+ }
+ return FirstSPAmount;
}
----------------
craig.topper wrote:
> return 2048 - StackAlign
done
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D157373/new/
https://reviews.llvm.org/D157373