[PATCH] D116779: [llvm-mca] [LSUnit] Proposal for declaring memory-barrier instructions explicitly rather than making conservative assumptions.

Fri Jan 7 03:40:53 PST 2022

andreadb added a comment.

  TLDR: Currently, llvm-mca makes a very conservative assumption about which instructions are and aren't memory-barrier instructions. This leads to quite a few false positives which can result in inaccurate simulations. With this patch, I am proposing that we get rid of the assumptions and instead give the targets and developers a convenient way to specify exactly which instructions should be treated as LoadBarriers and/or StoreBarriers.

Thanks for working on this!

  First off, I'd really like to apologize for how long this patch has taken me to post. Andrea and I discussed this in late August and I had almost all of it completed by early September. Due to a mixture of personal and technical issues, I was unable to finalize it until now. As far as I can tell, the patch is still relevant, but let me know if not.

Don't need to apologise! I wish there were more contributors like you :)
This patch is definitely still relevant. Thanks for contributing it!

  As mentioned in the TLDR, this patch aims to be more explicit about which instructions are memory-barriers. Currently, the LSUnit uses the MayLoad, MayStore, and HasUnmodeledSideEffects flags to decide which instructions should behave as barriers. This is a conservative assumption and leads to some instructions causing stalls in the pipeline that shouldn't exist.

  What I've done is add two flags to the InstructionBase class (IsALoadBarrier and IsAStoreBarrier). These flags can be modified and evaluated with the isAStoreBarrier(), isALoadBarrier(), setStoreBarrier(), and setLoadBarrier() methods from the same class. In my mind, it is most natural to set these flags within a target's InstrPostProcess::postProcessInstruction() method. An example of this is included in the patch within the newly added X86InstrPostProcess class (within the X86CustomBehaviour.cpp file).

Sounds good to me!

  I also added a new command-line argument to llvm-mca (--show-barriers) which adds two additional columns to the InstructionInfoView to show which instructions are load and/or store barriers. This argument defaults to false so the default behaviour of this view is unchanged.

Thanks for adding that flag!

I had a look at the patch and it looks good modulo a couple of nits.

The changes to the x86 tests make sense. I only had a couple of nits (see my other comments below).
I trust you that the CMake changes work fine. :)

================
Comment at: llvm/test/tools/llvm-mca/X86/barrier_output.s:1-16
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -iterations=1 -resource-pressure=false -summary-view=false -show-barriers < %s | FileCheck %s
+
+addpd       %xmm0, %xmm2
+addpd       (%rax), %xmm2
+
+addsd       %xmm0, %xmm2
----------------
You also need to test SFENCE here.
You can also get rid of most instructions here. Personally, I would only leave SFENCE MFENCE LFENCE and any other instructions with "unmodeled side effects". So, CLFLUSH, LFENCE, MFENCE and SFENCE are fine. All other SSE instructions can be removed in my opinion

================
Comment at: llvm/tools/llvm-mca/Views/InstructionInfoView.h:58
+  bool PrintBarriers;
+  const std::vector<std::unique_ptr<mca::Instruction>> &LoweredInsts;

----------------
Can this be something like this?

```
   using UniqueInst = std::unique_ptr<mca::Instruction>;
   ArrayRef<UniqueInst> LoweredInsts;
```

================
Comment at: llvm/tools/llvm-mca/llvm-mca.cpp:555-557
         Printer.addView(std::make_unique<mca::InstructionInfoView>(
-            *STI, *MCII, CE, ShowEncoding, Insts, *IP));
+            *STI, *MCII, CE, ShowEncoding, Insts, *IP, LoweredSequence,
+            ShowBarriers));
----------------
I wonder if we should use SmallVector for the LoweredSequence (instead of a std::vector).
The average code snippet size tends to be very small. So, a SmallVector might perform a bit better.

The constructor of InstructionInfoView should probably use an ArrayRef for the LoweredSequence. This would be similar to what we already do for the SourceMgr.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116779/new/

https://reviews.llvm.org/D116779