[PATCH] D135488: [codegen] Add a remarks based Stack Layout Analysis pass
Nick Desaulniers via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Jan 12 11:57:53 PST 2023
nickdesaulniers added a comment.
It would be really nice if we could limit this to a specific function somehow.
Quick feedback from giving Diff 488727 a spin on the Linux kernel:
via `ARCH=arm make LLVM=1 -j128 -s allyesconfig all` I found:
CC drivers/net/ethernet/mellanox/mlx5/core/en_main.o
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3597:12: error: stack frame size (1256) exceeds limit (1024) in 'mlx5e_setup_tc' [-Werror,-Wframe-larger-than]
static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
^
When I rebuild with:
ARCH=arm make LLVM=1 -j128 drivers/net/ethernet/mellanox/mlx5/core/en_main.o KCFLAGS=-Rpass-analysis=stack-frame-layout
I get a printout of the stack usage of every function in this TU. That's 1929 lines of text output...I only care about `mlx5e_setup_tc`. Is there a way to limit that?
Filtering through the logs though, I do see:
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark:
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336
which is good! If I flip on debug info and rebuild, this looks like:
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark:
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
old_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2934
old_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2958
new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3539
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
new_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3000
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336
new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3424
Which is pretty awesome!
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3424 for the last one, which shows we should probably be heap allocating instances of `struct mlx5e_params` rather than stack allocating them!
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3000 shows the same for `struct mlx5e_channels`. Cool stuff! Now we can finally debug `-Wframe-larger-than=`!!!
================
Comment at: clang/test/Frontend/stack-layout-remark.c:8
+// RUN: mkdir -p %t
+// RUN: %clang_cc1 %s -emit-codegen-only -triple x86_64-unknown-linux-gnu -target-cpu corei7 -Rpass-analysis=stack-frame-layout -o /dev/null -O0 2>&1 | FileCheck %s --check-prefix=O0-NODEBUG
+// RUN: %clang_cc1 %s -emit-codegen-only -triple x86_64-unknown-linux-gnu -target-cpu corei7 -Rpass-analysis=stack-frame-layout -o /dev/null -O0 -debug-info-kind=constructor -dwarf-version=5 -debugger-tuning=gdb 2>&1 | FileCheck %s --check-prefix=O0-DEBUG
----------------
Please update:
1. the patch description/commit message
2. clang/docs/ReleaseNotes.rst
to mention this new flag. I kind of wish that `-Wstack-frame-larger-than=` alluded to this somehow.
================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:191
+ // initialize slot info
+ for (int Idx = ObjBeg, EndIdx = ObjEnd; Idx != EndIdx; ++Idx) {
+ // This is a dead slot, so skip it
----------------
do we need `EndIdx`, or can we simply use `Idx != ObjEnd` as the loop terminating condition?
Doesn't look like we use `ObjBeg` afterwards either; is that necessary? Seems like the call to `MFI.getObjectIndexBegin();` could happen in the initialization list of this `for `loop?
================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:200-202
+ sort(SlotInfo, [](const SlotData &A, const SlotData &B) {
+ return A.Offset > B.Offset;
+ });
----------------
Consider implementing `operator<` on `SlotData`; then I think you can simply call std::sort on SlotInfo and drop this lamda.
================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:206-208
+ for (const DILocalVariable *N : SlotMap[Info.Slot]) {
+ emitSourceLocRemark(MF, N, Rem);
+ }
----------------
remove braces
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:216
+ SlotDbgMap genSlotDbgMapping(MachineFunction &MF) {
+ SlotDbgMap SlotDebugMap;
+ SmallVector<MachineInstr *> Dbg;
----------------
If you sink this def into the for loop, then you don't need to clear it.
================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:220-222
+ for (MachineFunction::VariableDbgInfo &DI : MF.getVariableDbgInfo()) {
+ SlotDebugMap[DI.Slot].insert(DI.Var);
+ }
----------------
remove braces
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:227-228
+ for (MachineInstr &MI : MBB) {
+ if (MI.getNumMemOperands() == 0)
+ continue;
+ for (MachineMemOperand *MO : MI.memoperands()) {
----------------
I guess this can be removed since the `for` loop below wont do anything if there's 0 memoperands?
================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:65
+
+ SlotData(const MachineFrameInfo &MFI, const int ValOffset, const int Idx) {
+ Slot = Idx;
----------------
thegameg wrote:
>
@paulkirth make sure to mark these code review comments as done when implemented, please!
================
Comment at: llvm/test/CodeGen/AArch64/O0-pipeline.ll:76-77
; CHECK-NEXT: Insert CFI remember/restore state instructions
+; CHECK-NEXT: Lazy Machine Block Frequency Analysis
+; CHECK-NEXT: Machine Optimization Remark Emitter
+; CHECK-NEXT: Stack Frame Layout Analysis
----------------
Dang, this adds a bunch of passes to O0 pipelines...any creative ideas on how to not do that?
================
Comment at: llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll:43-46
+; HOTNESS: Executing Pass 'Stack Frame Layout Analysis'
+; HOTNESS-NEXT: Freeing Pass 'Machine Optimization Remark Emitter'
+; HOTNESS-NEXT: Freeing Pass 'Lazy Machine Block Frequency Analysis'
+; HOTNESS-NEXT: Freeing Pass 'Stack Frame Layout Analysis'
----------------
what's going on in this test? Looks like the pass is being run twice or something?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135488/new/
https://reviews.llvm.org/D135488
More information about the cfe-commits
mailing list