[llvm] [AMDGPU] MCExpr printing helper with KnownBits support (PR #95951)

Mon Jun 24 14:40:34 PDT 2024

================
@@ -314,3 +315,142 @@ AMDGPUVariadicMCExpr::createOccupancy(unsigned InitOcc, const MCExpr *NumSGPRs,
                  CreateExpr(InitOcc), NumSGPRs, NumVGPRs},
                 Ctx);
 }
+
+static KnownBits AMDGPUMCExprKnownBits(const MCExpr *Expr, raw_ostream &OS,
----------------
slinder1 wrote:

First, thank you so much for working on this, @JanekvO! It is great to see the expressions disappear in so many cases :)

I don't agree that it is too fancy, but a lot of it is generic. Would it be a step too far to actually incorporate the `KnownBits` handling into the `MCExpr`/`MCTargetExpr` APIs?

It can be optional, if other targets don't want to pay the cost or want to see their expressions verbatim, but some simplification seems nice even in the case of user-generated expressions.

I think the remaining cases can also be improved by just applying the same algorithm you have here at every level of subexpression, rather than once for the whole expression. For example, if `AMDGPUMCExprKnownBits` also constructed a new `MCExpr *` in which any subexpression with fully known bits is replaced with a constant, it would improve readability further IMO.

I hacked in some debug statements to see the evolution of the KnownBits of subexpressions for one example from `hsa-sym-exprs-gfx90a.s`, and added notes (prefixed with `//`) where the evaluation could detect that a fully-known-bits subexpression was used in an expression which is not fully-known.

If we also handled some identity cases for operations, the result would get pretty manageable.

```
                .amdhsa_exception_int_div_zero


Original full expr: (((((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1))&1073741824)>>30

AMDGPUMCExprKnownBits(depth=8) for: 0
        KnownBits=0000000000000000000000000000000000000000000000000000000000000000

AMDGPUMCExprKnownBits(depth=7) for: 128
        KnownBits=0000000000000000000000000000000000000000000000000000000010000000

AMDGPUMCExprKnownBits(depth=8) for: ~128
        KnownBits=1111111111111111111111111111111111111111111111111111111101111111

AMDGPUMCExprKnownBits(depth=9) for: 0&(~128)
        KnownBits=0000000000000000000000000000000000000000000000000000000000000000

AMDGPUMCExprKnownBits(depth=8) for: 1
        KnownBits=0000000000000000000000000000000000000000000000000000000000000001

AMDGPUMCExprKnownBits(depth=8) for: 7
        KnownBits=0000000000000000000000000000000000000000000000000000000000000111

AMDGPUMCExprKnownBits(depth=9) for: 1<<7
        KnownBits=0000000000000000000000000000000000000000000000000000000010000000

AMDGPUMCExprKnownBits(depth=10) for: (0&(~128))|(1<<7)
        KnownBits=0000000000000000000000000000000000000000000000000000000010000000

AMDGPUMCExprKnownBits(depth=9) for: 1
        KnownBits=0000000000000000000000000000000000000000000000000000000000000001

AMDGPUMCExprKnownBits(depth=10) for: ~1
        KnownBits=1111111111111111111111111111111111111111111111111111111111111110

AMDGPUMCExprKnownBits(depth=11) for: ((0&(~128))|(1<<7))&(~1)
        KnownBits=0000000000000000000000000000000000000000000000000000000010000000

AMDGPUMCExprKnownBits(depth=10) for: defined_boolean
        KnownBits=????????????????????????????????????????????????????????????????

AMDGPUMCExprKnownBits(depth=10) for: 0
        KnownBits=0000000000000000000000000000000000000000000000000000000000000000

// To simplify further we can also check for identity operations like this one and eliminate them
AMDGPUMCExprKnownBits(depth=11) for: defined_boolean<<0
        KnownBits=????????????????????????????????????????????????????????????????

// Here is the first instance of an operation across a fully known subexpr and a partially known subexpr.
// We know the fully known subexpr is maximal here so we can lazily produce a constant MCExpr for it.
AMDGPUMCExprKnownBits(depth=12) for: (((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0)
        KnownBits=????????????????????????????????????????????????????????1???????

AMDGPUMCExprKnownBits(depth=11) for: 62
        KnownBits=0000000000000000000000000000000000000000000000000000000000111110

AMDGPUMCExprKnownBits(depth=12) for: ~62
        KnownBits=1111111111111111111111111111111111111111111111111111111111000001

// Here again we can make a single constant out of the known subexpr, although it might even impair readability
// in this case?
AMDGPUMCExprKnownBits(depth=13) for: ((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62)
        KnownBits=????????????????????????????????????????????????????????1?00000?

AMDGPUMCExprKnownBits(depth=12) for: 0
        KnownBits=0000000000000000000000000000000000000000000000000000000000000000

AMDGPUMCExprKnownBits(depth=12) for: 1
        KnownBits=0000000000000000000000000000000000000000000000000000000000000001

AMDGPUMCExprKnownBits(depth=13) for: 0<<1
        KnownBits=0000000000000000000000000000000000000000000000000000000000000000

// Another identity case
AMDGPUMCExprKnownBits(depth=14) for: (((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1)
        KnownBits=????????????????????????????????????????????????????????1?00000?

AMDGPUMCExprKnownBits(depth=14) for: 1073741824
        KnownBits=0000000000000000000000000000000001000000000000000000000000000000

AMDGPUMCExprKnownBits(depth=15) for: ((((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1))&1073741824
        KnownBits=000000000000000000000000000000000?000000000000000000000000000000

AMDGPUMCExprKnownBits(depth=15) for: 30
        KnownBits=0000000000000000000000000000000000000000000000000000000000011110

AMDGPUMCExprKnownBits(depth=16) for: (((((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1))&1073741824)>>30
        KnownBits=000000000000000000000000000000000000000000000000000000000000000?

```

For this example we could turn this:

```
(((((((0&(~128))|(1<<7))&(~1))|(defined_boolean<<0))&(~62))|(0<<1))&1073741824)>>30
```

Into this:

```
(((128|defined_boolean)&0xffffffffffffffc1)&1073741824)>>30
```
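
To make the idea concrete, here is a rough, self-contained sketch of the rewrite. It uses toy stand-in types rather than the real `MCExpr`/`KnownBits` classes, so every name in it (`Known`, `Node`, `simplify`, `str`, `example`) is hypothetical, `>>` is assumed to be a logical shift, and constants are printed in decimal rather than hex. It only illustrates "fold any fully known subexpression into a constant, then drop trivial identities"; it is not what the patch does.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for illustration only; NOT the LLVM MCExpr/KnownBits classes.
struct Known {
  uint64_t Zero = 0, One = 0; // bits known to be 0 / known to be 1
  bool isConstant() const { return (Zero | One) == ~0ULL; }
};

struct Node;
using Ptr = std::shared_ptr<const Node>;
struct Node {
  char Op; // 'c' constant, 's' symbol, '~', '&', '|', '<' (shl), '>' (lshr)
  uint64_t Val;
  std::string Name;
  Ptr L, R;
};

Ptr cst(uint64_t V) { return std::make_shared<Node>(Node{'c', V, "", nullptr, nullptr}); }
Ptr sym(std::string N) { return std::make_shared<Node>(Node{'s', 0, std::move(N), nullptr, nullptr}); }
Ptr un(char Op, Ptr A) { return std::make_shared<Node>(Node{Op, 0, "", std::move(A), nullptr}); }
Ptr bin(char Op, Ptr A, Ptr B) { return std::make_shared<Node>(Node{Op, 0, "", std::move(A), std::move(B)}); }

// Print with parentheses around every non-leaf operand.
std::string str(const Ptr &E, bool Top = true) {
  if (E->Op == 'c') return std::to_string(E->Val);
  if (E->Op == 's') return E->Name;
  std::string S;
  if (E->Op == '~') {
    S = "~" + str(E->L, false);
  } else {
    std::string OpS = E->Op == '<' ? "<<" : E->Op == '>' ? ">>" : std::string(1, E->Op);
    S = str(E->L, false) + OpS + str(E->R, false);
  }
  return Top ? S : "(" + S + ")";
}

// Post-order walk: compute known bits for each node and return a rewritten node.
std::pair<Ptr, Known> simplify(const Ptr &E) {
  Known K;
  if (E->Op == 'c') { K.One = E->Val; K.Zero = ~E->Val; return {E, K}; }
  if (E->Op == 's') return {E, K}; // nothing known about a symbol
  std::pair<Ptr, Known> A = simplify(E->L);
  Ptr Out;
  if (E->Op == '~') {
    K.Zero = A.second.One; K.One = A.second.Zero;
    Out = un('~', A.first);
  } else {
    std::pair<Ptr, Known> B = simplify(E->R);
    const Known &KL = A.second, &KR = B.second;
    if (E->Op == '&') { K.Zero = KL.Zero | KR.Zero; K.One = KL.One & KR.One; }
    else if (E->Op == '|') { K.Zero = KL.Zero & KR.Zero; K.One = KL.One | KR.One; }
    else if (KR.isConstant() && KR.One < 64) { // shift by a known amount
      uint64_t S = KR.One;
      if (E->Op == '<') { K.Zero = (KL.Zero << S) | ((1ULL << S) - 1); K.One = KL.One << S; }
      else { K.Zero = (KL.Zero >> S) | ~(~0ULL >> S); K.One = KL.One >> S; }
    }
    // Simple identities: x|0, x<<0, x>>0, 0|x, x&~0, ~0&x.
    const Ptr &L = A.first, &R = B.first;
    bool LC = L->Op == 'c', RC = R->Op == 'c';
    if (RC && R->Val == 0 && E->Op != '&') Out = L;
    else if (LC && L->Val == 0 && E->Op == '|') Out = R;
    else if (RC && R->Val == ~0ULL && E->Op == '&') Out = L;
    else if (LC && L->Val == ~0ULL && E->Op == '&') Out = R;
    else Out = bin(E->Op, L, R);
  }
  // A fully known subexpression collapses to a single constant.
  if (K.isConstant()) Out = cst(K.One);
  return {Out, K};
}

// The .amdhsa_exception_int_div_zero expression from the trace above.
Ptr example() {
  Ptr E = bin('|', bin('&', cst(0), un('~', cst(128))), bin('<', cst(1), cst(7)));
  E = bin('|', bin('&', E, un('~', cst(1))), bin('<', sym("defined_boolean"), cst(0)));
  E = bin('|', bin('&', E, un('~', cst(62))), bin('<', cst(0), cst(1)));
  return bin('>', bin('&', E, cst(1073741824)), cst(30));
}
```

With these toy definitions, `str(simplify(example()).first)` gives `(((128|defined_boolean)&18446744073709551553)&1073741824)>>30`, which is the same shape as the expression above with `~62` printed in decimal. Doing the fold in the same post-order walk that computes the known bits means no extra traversal is needed.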

I guess the question is where to draw the line if we go down this path. One could always add heuristics for other cases based on operator associativity, commutativity, etc., but I would think we could just implement maximally-known-bits-subexpression folding and some simple identities (`(+, 0)`, `(*, 1)`, `(>>, 0)`, ...) and get the majority of the benefit without doing any more heavy lifting than is already present in this patch.

https://github.com/llvm/llvm-project/pull/95951