[PATCH] D135488: [codegen] Add a remarks based Stack Layout Analysis pass

Thu Jan 12 11:57:53 PST 2023

nickdesaulniers added a comment.

It would be really nice if we could limit this to a specific function somehow.

Quick feedback from giving Diff 488727 a spin on the Linux kernel:

via `ARCH=arm make LLVM=1 -j128 -s allyesconfig all` I found:

    CC      drivers/net/ethernet/mellanox/mlx5/core/en_main.o
  drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3597:12: error: stack frame size (1256) exceeds limit (1024) in 'mlx5e_setup_tc' [-Werror,-Wframe-larger-than]
  static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
             ^

When I rebuild with:

  ARCH=arm make LLVM=1 -j128 drivers/net/ethernet/mellanox/mlx5/core/en_main.o KCFLAGS=-Rpass-analysis=stack-frame-layout

I get a printout of the stack usage of every function in this TU. That's 1929 lines of text output...I only care about `mlx5e_setup_tc`.  Is there a way to limit that?

Filtering through the logs though, I do see:

  drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
  Function: mlx5e_setup_tc
  ...
  Offset: [SP-400], Type: Variable, Align: 8, Size: 352
  Offset: [SP-752], Type: Variable, Align: 8, Size: 352
  Offset: [SP-1088], Type: Variable, Align: 8, Size: 336

which is good!  If I flip on debug info and rebuild, this looks like:

  drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
  Function: mlx5e_setup_tc
  ...
  Offset: [SP-400], Type: Variable, Align: 8, Size: 352
      old_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2934
      old_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2958
      new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3539
  Offset: [SP-752], Type: Variable, Align: 8, Size: 352
      new_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3000
  Offset: [SP-1088], Type: Variable, Align: 8, Size: 336
      new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3424

Which is pretty awesome!
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3424 for the last one, which shows we should probably be heap allocating instances of `struct mlx5e_params` rather than stack allocating them!

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3000 shows the same for `struct mlx5e_channels`. Cool stuff! Now we can finally debug `-Wframe-larger-than=`!!!

================
Comment at: clang/test/Frontend/stack-layout-remark.c:8
+// RUN: mkdir -p %t
+// RUN: %clang_cc1 %s -emit-codegen-only -triple x86_64-unknown-linux-gnu -target-cpu corei7 -Rpass-analysis=stack-frame-layout -o /dev/null  -O0  2>&1 | FileCheck %s --check-prefix=O0-NODEBUG
+// RUN: %clang_cc1 %s -emit-codegen-only -triple x86_64-unknown-linux-gnu -target-cpu corei7 -Rpass-analysis=stack-frame-layout -o /dev/null  -O0  -debug-info-kind=constructor  -dwarf-version=5 -debugger-tuning=gdb 2>&1 | FileCheck %s --check-prefix=O0-DEBUG
----------------
Please update:
1. the patch description/commit message
2. clang/docs/ReleaseNotes.rst

 to mention this new flag. I kind of wish that `-Wstack-frame-larger-than=` alluded to this somehow.

================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:191
+    // initialize slot info
+    for (int Idx = ObjBeg, EndIdx = ObjEnd; Idx != EndIdx; ++Idx) {
+      // This is a dead slot, so skip it
----------------
do we need `EndIdx`, or can we simply use `Idx != ObjEnd` as the loop terminating condition?

Doesn't look like we use `ObjBeg` afterwards either; is that necessary?  Seems like the call to `MFI.getObjectIndexBegin();` could happen in the initialization list of this `for `loop?

================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:200-202
+    sort(SlotInfo, [](const SlotData &A, const SlotData &B) {
+      return A.Offset > B.Offset;
+    });
----------------
Consider implementing `operator<` on `SlotData`; then I think you can simply call std::sort on SlotInfo and drop this lamda.

================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:206-208
+      for (const DILocalVariable *N : SlotMap[Info.Slot]) {
+        emitSourceLocRemark(MF, N, Rem);
+      }
----------------
remove braces
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:216
+  SlotDbgMap genSlotDbgMapping(MachineFunction &MF) {
+    SlotDbgMap SlotDebugMap;
+    SmallVector<MachineInstr *> Dbg;
----------------
If you sink this def into the for loop, then you don't need to clear it.

================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:220-222
+    for (MachineFunction::VariableDbgInfo &DI : MF.getVariableDbgInfo()) {
+      SlotDebugMap[DI.Slot].insert(DI.Var);
+    }
----------------
remove braces
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:227-228
+      for (MachineInstr &MI : MBB) {
+        if (MI.getNumMemOperands() == 0)
+          continue;
+        for (MachineMemOperand *MO : MI.memoperands()) {
----------------
I guess this can be removed since the `for` loop below wont do anything if there's 0 memoperands?

================
Comment at: llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp:65
+
+    SlotData(const MachineFrameInfo &MFI, const int ValOffset, const int Idx) {
+      Slot = Idx;
----------------
thegameg wrote:
> 
@paulkirth make sure to mark these code review comments as done when implemented, please!

================
Comment at: llvm/test/CodeGen/AArch64/O0-pipeline.ll:76-77
 ; CHECK-NEXT:       Insert CFI remember/restore state instructions
+; CHECK-NEXT:       Lazy Machine Block Frequency Analysis
+; CHECK-NEXT:       Machine Optimization Remark Emitter
+; CHECK-NEXT:       Stack Frame Layout Analysis
----------------
Dang, this adds a bunch of passes to O0 pipelines...any creative ideas on how to not do that?

================
Comment at: llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll:43-46
+; HOTNESS:      Executing Pass 'Stack Frame Layout Analysis'
+; HOTNESS-NEXT: Freeing Pass 'Machine Optimization Remark Emitter'
+; HOTNESS-NEXT: Freeing Pass 'Lazy Machine Block Frequency Analysis'
+; HOTNESS-NEXT: Freeing Pass 'Stack Frame Layout Analysis'
----------------
what's going on in this test? Looks like the pass is being run twice or something?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135488/new/

https://reviews.llvm.org/D135488