[llvm] [X86] Skip AMX type lowering when AMX is not used (PR #92910)
via llvm-commits
llvm-commits at lists.llvm.org
Wed May 22 04:31:12 PDT 2024
================
@@ -1230,6 +1238,14 @@ class X86LowerAMXTypeLegacyPass : public FunctionPass {
}
bool runOnFunction(Function &F) override {
+ // Performance optimization: most code doesn't use AMX, so return early if
+ // there are no instructions that produce AMX values. This is sufficient, as
+ // AMX arguments and constants are not allowed -- so any producer of an AMX
+ // value must be an instruction.
+ // TODO: find a cheaper way for this, without looking at all instructions.
+ if (!containsAMXCode(F))
----------------
aengelke wrote:
From what I see of the AMX-related passes:
- IR LowerAMXTypes, always iterates over IR a few times (thus this PR)
- IR LowerAMXIntrinsics, off by default (technically it's in the pipeline, but it returns early if `-enable-x86-scalar-amx` is not set) => not relevant.
- MIR FastPreTileConfig (O0), iterates over all virtual regs, exits early if no tile reg exists (NB: iterating over registers is much faster than iterating over instructions)
- MIR PreTileConfig (non-O0), iterates over all machine instrs, returns early if no tile reg def or use exists (not benchmarked)
- MIR FastTileConfig (O0), iterates over all machine instrs, returns early if no tile reg def or use exists (quite expensive for doing nothing)
- MIR TileConfig (non-O0), iterates over all machine instrs, returns early if VirtRegMap shape map is empty (not benchmarked, but probably ok)
- MIR LowerTileCopy, iterates over all MBB live ins, returns early if no live in is a tile reg
So there is one IR pass and three (five) MIR passes that would benefit from such information. In MIR, X86MachineFunctionInfo seems like a good place, and we could trivially set a "uses AMX" flag during ISel when selecting intrinsics. (X86MFI already has the AMX-related HasVirtualTileReg, so adding something like UsesAMX would fit.) MFI is (AIUI) not available before ISel and therefore not in the IR pass. I can (and probably will) implement such a change in a separate PR.
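To make the flag idea concrete, here is a rough standalone sketch. It is a toy model, not LLVM code: `ToyFunctionInfo`, `ToyMachineFunction`, `selectInstr` and `runToyTileConfig` are hypothetical names invented for illustration, and only the name `UsesAMX` is taken from the suggestion above. It assumes the flag is set once at selection time and then checked by later passes before they walk any instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for per-function target info (hypothetical, not X86MachineFunctionInfo).
struct ToyFunctionInfo {
  bool UsesAMX = false; // one bit per function, set during selection
};

enum class ToyOpcode { Add, Load, TileLoad, TileDotProduct };

struct ToyMachineFunction {
  ToyFunctionInfo Info;
  std::vector<ToyOpcode> Instrs;
};

bool isTileOpcode(ToyOpcode Op) {
  return Op == ToyOpcode::TileLoad || Op == ToyOpcode::TileDotProduct;
}

// Models instruction selection: the flag is recorded at the moment a tile
// operation is selected, so no later pass has to rediscover it.
void selectInstr(ToyMachineFunction &MF, ToyOpcode Op) {
  if (isTileOpcode(Op))
    MF.Info.UsesAMX = true;
  MF.Instrs.push_back(Op);
}

// Models a later tile-config style pass. Returns the number of instructions
// it had to visit, so the cost of the early exit is observable.
std::size_t runToyTileConfig(const ToyMachineFunction &MF) {
  if (!MF.Info.UsesAMX) {
    // Debug-only sanity check: the flag must never miss a tile instruction.
    assert(std::none_of(MF.Instrs.begin(), MF.Instrs.end(), isTileOpcode));
    return 0; // constant-time exit for functions without AMX
  }
  std::size_t Visited = 0;
  for (ToyOpcode Op : MF.Instrs) {
    (void)Op; // real work on tile instructions would go here
    ++Visited;
  }
  return Visited;
}
```

In this model a function without tile operations costs one flag check per pass in release builds, instead of one walk over all instructions per pass.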
Adding a new analysis pass just for the IR pass feels like overkill for 1 bit of information per function, and among the existing immutable passes and the passes preserved by MachineFunctionPass, nothing seems like a good fit.
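For reference, the shape of the early return in the hunk above can be modelled outside of LLVM as follows. This is only a sketch under assumptions: the `Toy*` types and functions are hypothetical stand-ins rather than the LLVM IR classes, and the scan simply looks for any instruction whose result type is the AMX type. As the code comment in the hunk argues, that is sufficient because AMX arguments and constants are not allowed.

```cpp
#include <vector>

// Toy IR model (hypothetical; not llvm::Function / llvm::Instruction).
enum class ToyType { Int32, Float, Pointer, Amx };

struct ToyInstruction {
  ToyType ResultType;
};

using ToyBlock = std::vector<ToyInstruction>;
using ToyFunction = std::vector<ToyBlock>;

// Linear scan that stops at the first instruction producing an AMX value.
// Worst case (no AMX at all) still visits every instruction once.
bool toyContainsAMXCode(const ToyFunction &F) {
  for (const ToyBlock &BB : F)
    for (const ToyInstruction &I : BB)
      if (I.ResultType == ToyType::Amx)
        return true;
  return false;
}

// Models the pass entry point: returns true only if the function was changed.
bool runToyLowerAMXType(const ToyFunction &F) {
  if (!toyContainsAMXCode(F))
    return false; // nothing to lower, function left untouched
  // The actual lowering, which iterates over the IR several times, would run here.
  return true;
}
```

The trade-off is one cheap walk over the instructions in place of the several more expensive walks the lowering would otherwise do, which is why the TODO in the hunk still asks for something cheaper.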
https://github.com/llvm/llvm-project/pull/92910