[PATCH] D116779: [llvm-mca] [LSUnit] Proposal for declaring memory-barrier instructions explicitly rather than making conservative assumptions.

Patrick Holland via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 6 17:33:06 PST 2022


holland11 created this revision.
holland11 added reviewers: andreadb, qcolombet.
Herald added subscribers: kerbowa, pengfei, lebedev.ri, gbedwell, hiraditya, kristof.beyls, tpr, mgorny, nhaehnle, jvesely.
Herald added a reviewer: lebedev.ri.
holland11 requested review of this revision.
Herald added a project: LLVM.

**TLDR**: Currently, llvm-mca makes a very conservative assumption about which instructions are and aren't memory-barrier instructions. This leads to quite a few false positives which results in inaccurate simulations. With this patch, I am proposing that we get rid of the assumptions and instead give the targets and developers a convenient way to specify exactly which instructions should be treated as LoadBarriers and/or StoreBarriers.

First off, I'd really like to apologize for how long this patch has taken me to post. Andrea and I discussed this in late August and I had almost all of it completed by early September. Due to a mixture of personal and technical issues, I was unable to finalize it until now. As far as I can tell, the patch is still relevant, but let me know if not.

As mentioned in the TLDR, this patch aims to be more explicit about which instructions are memory-barriers. Currently, the `LSUnit` uses the `MayLoad`, `MayStore`, and `HasUnmodeledSideEffects` flags to decide which instructions should behave as barriers. This is a conservative assumption and leads to some instructions causing stalls in the pipeline that shouldn't exist.

What I've done is add two flags to the `InstructionBase` class (IsALoadBarrier and IsAStoreBarrier). These flags can be modified and evaluated with the `isAStoreBarrier()`, `isALoadBarrier()`, `setStoreBarrier()`, and `setLoadBarrier()` methods from the same class. In my mind, it is most natural to set these flags within a target's `InstrPostProcess::postProcessInstruction()` method. An example of this is included in the patch within the newly added `X86InstrPostProcess` class (within the X86CustomBehaviour.cpp file).

I also added a new command-line argument to llvm-mca (`--show-barriers`) which adds two additional columns to the `InstructionInfoView` to show which instructions are load and/or store barriers. This argument defaults to false so the default behaviour of this view is unchanged.

I expected this patch to be quite disruptive to the mca test cases, but I only had to update 10 of the test cases (and then I added a new test case that shows the `--show-barriers` argument in action). I would really appreciate if some people could look through how those test cases were altered to make sure that they haven't been made inaccurate (or at least not more inaccurate). Feel free to add more reviewers if you'd like target specific developers to take a look at the updated tests.

Each of those 10 test cases have 1 or more instructions that the LSUnit used to conservatively assume were load and/or store barriers. I've listed those instructions below. For some of these test cases, this patch affected their resource pressure statistics. This is not something that I am very familiar with so I'd also really appreciate if someone could make sure that these changes to the resource pressure stats aren't moving in the wrong direction.

`AArch64/Cortex/A55-load-store-noalias.s`: 
`nop`

`AMDGPU/gfx9-retireooo.s`: 
`flat_load_dword` (This is the test case that motivated this patch so I'm fairly confident this one shouldn't be treated as a barrier.)

`X86/Barcelona/store-throughput.s `: 
`movd`

`X86/BdVer2/load-store-throughput.s`: 
`movd`

`X86/BdVer2/pr37790.s`: 
`int3` and `stmxcsr`

`X86/BdVer2/store-throughput.s`: 
`movd`

`X86/BtVer2/pr37790.s`: 
`int3` and `stmxcsr`

`X86/BtVer2/stmxcsr-ldmxcsr.s`: 
`stmxcsr` and `ldmxcsr`

`X86/Haswell/reserved-resources.s`: 
`fxrstor`

`X86/Haswell/stmxcsr-ldmxcsr.s`: 
`stmxcsr` and `ldmxcsr`

If any of these instructions //should// be load and/or store barriers, let me know and I can update this patch to set the appropriate flag(s) for those instructions.

Thank you for your time and thanks in advance for any of your input, suggestions, criticisms, or questions. And sorry again for having this patch take so long.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D116779

Files:
  llvm/docs/CommandGuide/llvm-mca.rst
  llvm/include/llvm/MCA/CustomBehaviour.h
  llvm/include/llvm/MCA/Instruction.h
  llvm/lib/MCA/HardwareUnits/LSUnit.cpp
  llvm/lib/Target/X86/CMakeLists.txt
  llvm/lib/Target/X86/MCA/CMakeLists.txt
  llvm/lib/Target/X86/MCA/X86CustomBehaviour.cpp
  llvm/lib/Target/X86/MCA/X86CustomBehaviour.h
  llvm/test/tools/llvm-mca/AArch64/Cortex/A55-load-store-noalias.s
  llvm/test/tools/llvm-mca/AMDGPU/gfx9-retireooo.s
  llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s
  llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s
  llvm/test/tools/llvm-mca/X86/BdVer2/pr37790.s
  llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s
  llvm/test/tools/llvm-mca/X86/BtVer2/pr37790.s
  llvm/test/tools/llvm-mca/X86/BtVer2/stmxcsr-ldmxcsr.s
  llvm/test/tools/llvm-mca/X86/Haswell/reserved-resources.s
  llvm/test/tools/llvm-mca/X86/Haswell/stmxcsr-ldmxcsr.s
  llvm/test/tools/llvm-mca/X86/barrier_output.s
  llvm/tools/llvm-mca/Views/InstructionInfoView.cpp
  llvm/tools/llvm-mca/Views/InstructionInfoView.h
  llvm/tools/llvm-mca/llvm-mca.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D116779.398017.patch
Type: text/x-patch
Size: 76350 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220107/ab215dc9/attachment.bin>


More information about the llvm-commits mailing list