[llvm] Question: What is the correct interpretation of LaneBitmask? (PR #109797)

Tue Sep 24 09:06:58 PDT 2024

sdesmalen-arm wrote:

> > I have some local patches that define registers/subreg-indices for the top bits, but then ran into an issue that the uint64_t to represent the LaneBitmask is no longer sufficient.
> 
> Is this for the regular AArch64 integer registers? My understanding is:
> 
> > AArch64 registers work like this: b10 aliases the low 8 bits of h10 which aliases the low 16 bits of s10 which aliases the low 32 bits of d10 which aliases the low 64 bits of q10 which aliases the low 128 bits of z10.
> 
> So I would expect you would need subregisters for:
> 
> * bits 7..0
> * bits 15..8
> * bits 31..16
> * bits 63..32
> * bits 127..64
> * bits 255..128
> 
> I.e. roughly log2(bitwidth) of them.

That's right. However, it is the register tuples that take up the space, because the bits are replicated for each vector in the tuple. It ends up looking something like this:
```
qqsub_0: bsub, bsub_hi, hsub, hsub_hi, ... qsub, qsub_hi
qqsub_1: bsub, bsub_hi, hsub, hsub_hi, ... qsub, qsub_hi
```
and similarly for the other tuples, where each subregister must be separately addressable and cannot reuse bits (otherwise qqsub_0:bsub and qqsub_1:bsub would alias, for example). Then there are also bits to represent the indices in GPR registers, and a set of bits needed to express the (sub)tiles in a matrix register. With all of these together in the same bitmask, we run out of the 64 bits.
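To make the arithmetic concrete, here is a rough sketch of the bit budget. It is not the actual TableGen-generated layout. The six lanes per vector come from the list quoted above, and the tuple size of 4 is an assumption. The GPR and tile lane counts are made-up placeholders, chosen only to show how the total can exceed 64. All function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Disjoint bit ranges per vector register, from the quoted list:
// 7..0, 15..8, 31..16, 63..32, 127..64, 255..128.
constexpr unsigned LanesPerVector = 6;

// Each tuple element needs its own copy of the lanes, so that e.g.
// qqsub_0:bsub and qqsub_1:bsub do not alias.
constexpr unsigned laneBitIndex(unsigned Elt, unsigned Lane) {
  return Elt * LanesPerVector + Lane;
}

constexpr unsigned lanesForTuple(unsigned NumVectors) {
  return NumVectors * LanesPerVector;
}

constexpr bool fitsIn64(unsigned TotalLanes) { return TotalLanes <= 64; }

// Single-bit mask for one lane of one tuple element, stored in a uint64_t.
inline uint64_t laneMask(unsigned Elt, unsigned Lane) {
  assert(Lane < LanesPerVector && "lane out of range");
  assert(laneBitIndex(Elt, Lane) < 64 && "ran out of lane bits");
  return uint64_t(1) << laneBitIndex(Elt, Lane);
}

// Assumption: the largest tuple holds 4 vectors.
constexpr unsigned MaxTupleSize = 4;
// Placeholder counts, NOT taken from the real register description.
constexpr unsigned HypotheticalGprLanes = 8;
constexpr unsigned HypotheticalTileLanes = 40;

constexpr unsigned TotalLanes = lanesForTuple(MaxTupleSize) +
                                HypotheticalGprLanes + HypotheticalTileLanes;

static_assert(lanesForTuple(MaxTupleSize) == 24, "4 vectors x 6 lanes");
static_assert(!fitsIn64(TotalLanes), "72 lanes do not fit in 64 bits");
```

With these numbers the vector tuples alone use 24 bits, and whatever the real GPR and tile counts are, they have to share the remaining 40.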

https://github.com/llvm/llvm-project/pull/109797