[llvm] [AMDGPU] MCExpr printing helper with KnownBits support (PR #95951)
Scott Linder via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 24 14:40:34 PDT 2024
================
@@ -314,3 +315,142 @@ AMDGPUVariadicMCExpr::createOccupancy(unsigned InitOcc, const MCExpr *NumSGPRs,
CreateExpr(InitOcc), NumSGPRs, NumVGPRs},
Ctx);
}
+
+static KnownBits AMDGPUMCExprKnownBits(const MCExpr *Expr, raw_ostream &OS,
----------------
slinder1 wrote:
First, thank you so much for working on this, @JanekvO! It is great to see the expressions disappear in so many cases :)
I don't agree that it is too fancy, but a lot of it is generic. Would it be a step too far to actually incorporate the `KnownBits` handling into the `MCExpr`/`MCTargetExpr` APIs?
It can be optional, if other targets don't want to pay the cost or want to see their expressions verbatim, but some simplification seems nice even in the case of user-generated expressions.
I think the remaining cases can also be improved by applying the same algorithm you have here at every level of subexpression, rather than once for the whole expression. For example, if `AMDGPUMCExprKnownBits` also constructed a new `MCExpr *` in which every subexpression whose bits are all known is replaced with a constant, it would improve readability further IMO.
I hacked in some debug statements to see the evolution of the KnownBits of subexpressions for one example from `hsa-sym-exprs-gfx90a.s`, and added notes (prefixed with `//`) where the evaluation could detect that a fully-known-bits subexpression was used in an expression which is not fully-known.
If we also handled a few identity operations (such as a shift by 0 or an OR with 0), the result becomes pretty manageable.
```
.amdhsa_exception_int_div_zero
Original full expr: (((((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1))&1073741824)>>30
AMDGPUMCExprKnownBits(depth=8) for: 0
KnownBits=0000000000000000000000000000000000000000000000000000000000000000
AMDGPUMCExprKnownBits(depth=7) for: 128
KnownBits=0000000000000000000000000000000000000000000000000000000010000000
AMDGPUMCExprKnownBits(depth=8) for: ~128
KnownBits=1111111111111111111111111111111111111111111111111111111101111111
AMDGPUMCExprKnownBits(depth=9) for: 0&(~128)
KnownBits=0000000000000000000000000000000000000000000000000000000000000000
AMDGPUMCExprKnownBits(depth=8) for: 1
KnownBits=0000000000000000000000000000000000000000000000000000000000000001
AMDGPUMCExprKnownBits(depth=8) for: 7
KnownBits=0000000000000000000000000000000000000000000000000000000000000111
AMDGPUMCExprKnownBits(depth=9) for: 1<<7
KnownBits=0000000000000000000000000000000000000000000000000000000010000000
AMDGPUMCExprKnownBits(depth=10) for: (0&(~128))|(1<<7)
KnownBits=0000000000000000000000000000000000000000000000000000000010000000
AMDGPUMCExprKnownBits(depth=9) for: 1
KnownBits=0000000000000000000000000000000000000000000000000000000000000001
AMDGPUMCExprKnownBits(depth=10) for: ~1
KnownBits=1111111111111111111111111111111111111111111111111111111111111110
AMDGPUMCExprKnownBits(depth=11) for: ((0&(~128))|(1<<7))&(~1)
KnownBits=0000000000000000000000000000000000000000000000000000000010000000
AMDGPUMCExprKnownBits(depth=10) for: defined_boolean
KnownBits=????????????????????????????????????????????????????????????????
AMDGPUMCExprKnownBits(depth=10) for: 0
KnownBits=0000000000000000000000000000000000000000000000000000000000000000
// To simplify further we can also check for identity operations like this one and eliminate them
AMDGPUMCExprKnownBits(depth=11) for: defined_boolean<<0
KnownBits=????????????????????????????????????????????????????????????????
// Here is the first instance of an operation across a fully known subexpr and a partially known subexpr.
// We know the fully known subexpr is maximal here so we can lazily produce a constant MCExpr for it.
AMDGPUMCExprKnownBits(depth=12) for: (((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0)
KnownBits=????????????????????????????????????????????????????????1???????
AMDGPUMCExprKnownBits(depth=11) for: 62
KnownBits=0000000000000000000000000000000000000000000000000000000000111110
AMDGPUMCExprKnownBits(depth=12) for: ~62
KnownBits=1111111111111111111111111111111111111111111111111111111111000001
// Here again we can make a single constant out of the known subexpr, although it might even impair readability
// in this case?
AMDGPUMCExprKnownBits(depth=13) for: ((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62)
KnownBits=????????????????????????????????????????????????????????1?00000?
AMDGPUMCExprKnownBits(depth=12) for: 0
KnownBits=0000000000000000000000000000000000000000000000000000000000000000
AMDGPUMCExprKnownBits(depth=12) for: 1
KnownBits=0000000000000000000000000000000000000000000000000000000000000001
AMDGPUMCExprKnownBits(depth=13) for: 0<<1
KnownBits=0000000000000000000000000000000000000000000000000000000000000000
// Another identity case
AMDGPUMCExprKnownBits(depth=14) for: (((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1)
KnownBits=????????????????????????????????????????????????????????1?00000?
AMDGPUMCExprKnownBits(depth=14) for: 1073741824
KnownBits=0000000000000000000000000000000001000000000000000000000000000000
AMDGPUMCExprKnownBits(depth=15) for: ((((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1))&1073741824
KnownBits=000000000000000000000000000000000?000000000000000000000000000000
AMDGPUMCExprKnownBits(depth=15) for: 30
KnownBits=0000000000000000000000000000000000000000000000000000000000011110
AMDGPUMCExprKnownBits(depth=16) for: (((((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1))&1073741824)>>30
KnownBits=000000000000000000000000000000000000000000000000000000000000000?
```
For this example we could turn this:
```
(((((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1))&1073741824)>>30
```
Into this:
```
(((128|defined_boolean)&0xffffffffffffffc1)&1073741824)>>30
```
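A quick sanity check of that rewrite, as a minimal sketch: it transcribes both expressions into plain C++ with `defined_boolean` as a parameter and compares them. It assumes 64-bit unsigned arithmetic and a logical `>>`, which is a simplification of how MC evaluates expressions; since the value is masked with `1073741824` before the shift, arithmetic versus logical shift makes no difference here. The function names are made up for illustration.

```cpp
#include <cstdint>

// The original expression from the log, with defined_boolean passed in as B.
constexpr uint64_t originalExpr(uint64_t B) {
  return (((((((0ULL & (~128ULL)) | (1ULL << 7)) & (~1ULL)) | (B << 0)) &
            (~62ULL)) |
           (0ULL << 1)) &
          1073741824ULL) >>
         30;
}

// The proposed simplified form: constants folded, identities removed.
constexpr uint64_t simplifiedExpr(uint64_t B) {
  return (((128ULL | B) & 0xffffffffffffffc1ULL) & 1073741824ULL) >> 30;
}

// Spot checks at compile time: both forms agree on a few representative inputs.
static_assert(originalExpr(0) == simplifiedExpr(0), "mismatch for 0");
static_assert(originalExpr(1) == simplifiedExpr(1), "mismatch for 1");
static_assert(originalExpr(1ULL << 30) == simplifiedExpr(1ULL << 30),
              "mismatch when bit 30 is set");
static_assert(originalExpr(~0ULL) == simplifiedExpr(~0ULL),
              "mismatch for all-ones");
```

Both forms reduce to "bit 30 of `defined_boolean`", which matches the single unknown bit in the final KnownBits line of the log.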
I guess the question is where to draw the line if we go down this path. One could always add heuristics for other cases based on operator associativity, commutativity, etc., but I would think we could just implement folding of maximally-known-bits subexpressions plus some simple identities (`(+, 0)`, `(*, 1)`, `(>>, 0)`, ...) and get the majority of the benefit without doing any more heavy lifting than is already present in this patch.
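
To make the suggestion concrete, here is a rough, self-contained sketch of the two rules (fold any subexpression whose bits are all known, and drop simple identities) on a toy expression tree. Everything in it (`Expr`, `Known`, `knownBits`, `simplify`, `toString`) is a hypothetical stand-in written for illustration; it is not the `MCExpr` or `KnownBits` API, it only covers `~`, `&`, `|`, `<<` and a logical `>>`, and it is not what the patch does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree standing in for MCExpr; none of this is LLVM API.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Expr {
  enum Kind { Const, Sym, Not, And, Or, Shl, LShr } K;
  uint64_t Val;     // payload for Const
  std::string Name; // payload for Sym
  ExprPtr L, R;     // operands; R is null for Not
};
ExprPtr make(Expr::Kind K, uint64_t V, std::string N, ExprPtr L, ExprPtr R) {
  return std::make_shared<Expr>(Expr{K, V, std::move(N), std::move(L), std::move(R)});
}
ExprPtr constant(uint64_t V) { return make(Expr::Const, V, "", nullptr, nullptr); }
ExprPtr symbol(std::string N) { return make(Expr::Sym, 0, std::move(N), nullptr, nullptr); }
ExprPtr op(Expr::Kind K, ExprPtr L, ExprPtr R = nullptr) {
  return make(K, 0, "", std::move(L), std::move(R));
}

// Toy known-bits pair: a set bit in Zero/One means that bit is known 0/1.
struct Known {
  uint64_t Zero, One;
  bool isConstant() const { return (Zero | One) == ~0ULL; }
};

// Conservative known-bits computation; shifts are only tracked when the
// shift amount is a fully known value below 64.
Known knownBits(const ExprPtr &E) {
  if (E->K == Expr::Const) return {~E->Val, E->Val};
  if (E->K == Expr::Sym) return {0, 0};
  Known A = knownBits(E->L);
  if (E->K == Expr::Not) return {A.One, A.Zero};
  Known B = knownBits(E->R);
  switch (E->K) {
  case Expr::And: return {A.Zero | B.Zero, A.One & B.One};
  case Expr::Or:  return {A.Zero & B.Zero, A.One | B.One};
  case Expr::Shl:
  case Expr::LShr: {
    if (!B.isConstant() || B.One >= 64) return {0, 0};
    unsigned S = B.One;
    if (E->K == Expr::Shl) return {(A.Zero << S) | ((1ULL << S) - 1), A.One << S};
    return {(A.Zero >> S) | ~(~0ULL >> S), A.One >> S};
  }
  default: return {0, 0};
  }
}

bool isConst(const ExprPtr &E, uint64_t V) { return E->K == Expr::Const && E->Val == V; }

// Bottom-up rewrite: fold fully known subexpressions, then drop identities.
ExprPtr simplify(const ExprPtr &E) {
  if (E->K == Expr::Const || E->K == Expr::Sym) return E;
  ExprPtr L = simplify(E->L);
  ExprPtr R = E->R ? simplify(E->R) : ExprPtr();
  ExprPtr N = op(E->K, L, R);
  Known KB = knownBits(N);
  assert((KB.Zero & KB.One) == 0 && "a bit cannot be known 0 and known 1");
  if (KB.isConstant()) return constant(KB.One); // every bit known: fold
  switch (E->K) {
  case Expr::Shl:
  case Expr::LShr: if (isConst(R, 0)) return L; break; // x << 0, x >> 0
  case Expr::Or:
    if (isConst(R, 0)) return L; // x | 0
    if (isConst(L, 0)) return R; // 0 | x
    break;
  case Expr::And:
    if (isConst(R, ~0ULL)) return L; // x & ~0
    if (isConst(L, ~0ULL)) return R; // ~0 & x
    break;
  default: break;
  }
  return N;
}

// Print with parentheses around nested binary operations; constants wider
// than 32 bits are printed in hex.
std::string toString(const ExprPtr &E, bool Top = true) {
  if (E->K == Expr::Sym) return E->Name;
  if (E->K == Expr::Const) {
    if (E->Val <= 0xffffffffULL) return std::to_string(E->Val);
    char Buf[32];
    std::snprintf(Buf, sizeof(Buf), "0x%llx", static_cast<unsigned long long>(E->Val));
    return Buf;
  }
  if (E->K == Expr::Not) return "~" + toString(E->L, false);
  const char *Op = E->K == Expr::And ? "&" : E->K == Expr::Or ? "|"
                 : E->K == Expr::Shl ? "<<" : ">>";
  std::string S = toString(E->L, false) + Op + toString(E->R, false);
  return Top ? S : "(" + S + ")";
}
```

Building the `.amdhsa_exception_int_div_zero` expression from the log with `op`, `constant` and `symbol("defined_boolean")` and printing `toString(simplify(E))` gives `(((128|defined_boolean)&0xffffffffffffffc1)&1073741824)>>30`, the same shape as the hand-simplified form above. The sketch recomputes known bits at every level for brevity; a real version would presumably return the rewritten expression and its known bits together from one traversal.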
https://github.com/llvm/llvm-project/pull/95951