[llvm] [CodeGen] Use 128bits for LaneBitmask. (PR #111157)
Sander de Smalen via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 4 07:51:45 PDT 2024
sdesmalen-arm wrote:
> I still don't understand why AArch64 needs so many bits. Having sub registers that alias does not mean you need additional register units. You should only need one for each physically distinct bits, despite differences in access
I think this is mostly because defining register tuples (2x, 3x and 4x) replicates the regunits. When I define the top bits and do some post-processing of the table in AArch64GenRegisterInfo.inc, I get the following lane masks:
```
0x00000000000000000001 // bsub
0x00000000000000000002 // bsub_hi
0x00000000000000000004 // dsub_hi
0x00000000000000000008 // hsub_hi
0x00000000000000000010 // psub
0x00000000000000000020 // qsub_hi
0x00000000000000000040 // ssub_hi
0x00000000000000000080 // sub_32, sube64, x8sub_0
0x00000000000000000100 // sube32
0x00000000000000000200 // subo32
0x00000000000000000400 // zasubq0
0x00000000000000000800 // zasubq1
0x00000000000000001000 // zasubd1_then_zasubq0
0x00000000000000002000 // zasubd1_then_zasubq1
0x00000000000000004000 // zasubs1_then_zasubq0
0x00000000000000008000 // zasubs1_then_zasubq1
0x00000000000000010000 // zasubs1_then_zasubd1_then_zasubq0
0x00000000000000020000 // zasubs1_then_zasubd1_then_zasubq1
0x00000000000000040000 // zasubh1_then_zasubq0
0x00000000000000080000 // zasubh1_then_zasubq1
0x00000000000000100000 // zasubh1_then_zasubd1_then_zasubq0
0x00000000000000200000 // zasubh1_then_zasubd1_then_zasubq1
0x00000000000000400000 // zasubh1_then_zasubs1_then_zasubq0
0x00000000000000800000 // zasubh1_then_zasubs1_then_zasubq1
0x00000000000001000000 // zasubh1_then_zasubs1_then_zasubd1_then_zasubq0
0x00000000000002000000 // zasubh1_then_zasubs1_then_zasubd1_then_zasubq1
0x00000000000004000000 // dsub1_then_bsub
0x00000000000008000000 // dsub1_then_bsub_hi
0x00000000000010000000 // dsub1_then_hsub_hi
0x00000000000020000000 // dsub1_then_ssub_hi
0x00000000000040000000 // dsub3_then_bsub
0x00000000000080000000 // dsub3_then_bsub_hi
0x00000000000100000000 // dsub3_then_hsub_hi
0x00000000000200000000 // dsub3_then_ssub_hi
0x00000000000400000000 // dsub2_then_bsub
0x00000000000800000000 // dsub2_then_bsub_hi
0x00000000001000000000 // dsub2_then_hsub_hi
0x00000000002000000000 // dsub2_then_ssub_hi
0x00000000004000000000 // psub1, psub1_then_psub
0x00000000008000000000 // qsub1_then_bsub
0x00000000010000000000 // qsub1_then_bsub_hi
0x00000000020000000000 // qsub1_then_dsub_hi
0x00000000040000000000 // qsub1_then_hsub_hi
0x00000000080000000000 // qsub1_then_ssub_hi
0x00000000100000000000 // qsub3_then_bsub
0x00000000200000000000 // qsub3_then_bsub_hi
0x00000000400000000000 // qsub3_then_dsub_hi
0x00000000800000000000 // qsub3_then_hsub_hi
0x00000001000000000000 // qsub3_then_ssub_hi
0x00000002000000000000 // qsub2_then_bsub
0x00000004000000000000 // qsub2_then_bsub_hi
0x00000008000000000000 // qsub2_then_dsub_hi
0x00000010000000000000 // qsub2_then_hsub_hi
0x00000020000000000000 // qsub2_then_ssub_hi
0x00000040000000000000 // x8sub_7, x8sub_7_then_sub_32
0x00000080000000000000 // x8sub_6, x8sub_6_then_sub_32
0x00000100000000000000 // x8sub_5, x8sub_5_then_sub_32
0x00000200000000000000 // x8sub_4, x8sub_4_then_sub_32
0x00000400000000000000 // x8sub_3, x8sub_3_then_sub_32
0x00000800000000000000 // x8sub_2, x8sub_2_then_sub_32
0x00001000000000000000 // x8sub_1, x8sub_1_then_sub_32
0x00002000000000000000 // subo64, subo64_then_sub_32
0x00004000000000000000 // zsub1_then_bsub
0x00008000000000000000 // zsub1_then_bsub_hi
0x00010000000000000000 // zsub1_then_dsub_hi
0x00020000000000000000 // zsub1_then_hsub_hi
0x00040000000000000000 // zsub1_then_qsub_hi
0x00080000000000000000 // zsub1_then_ssub_hi
0x00100000000000000000 // zsub3_then_bsub
0x00200000000000000000 // zsub3_then_bsub_hi
0x00400000000000000000 // zsub3_then_dsub_hi
0x00800000000000000000 // zsub3_then_hsub_hi
0x01000000000000000000 // zsub3_then_qsub_hi
0x02000000000000000000 // zsub3_then_ssub_hi
0x04000000000000000000 // zsub2_then_bsub
0x08000000000000000000 // zsub2_then_bsub_hi
0x10000000000000000000 // zsub2_then_dsub_hi
0x20000000000000000000 // zsub2_then_hsub_hi
0x40000000000000000000 // zsub2_then_qsub_hi
0x80000000000000000000 // zsub2_then_ssub_hi
```
Where:
* zasub => Matrix registers (SME ZA)
* bsub/hsub/ssub/dsub/qsub => FP/vector registers
* zsub => SVE (scalable data vectors)
* psub => SVE (scalable predicate vectors)
* anything else => GPRs
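
For anyone who wants to reproduce the count: the listing above has 80 single-bit masks (bits 0 through 79), which is why a 64-bit `LaneBitmask` is not enough here. Below is a rough sketch of the kind of post-processing that shows this. It is not the script used for the table; the helper names (`parse`, `lane_mask_width`, `register_file`) are made up for illustration, only a handful of rows are copied in, and classifying a composed index such as `dsub1_then_bsub` by its leading prefix is an assumption about how to read the legend.

```python
import re

# A few rows copied from the listing above: hex lane mask, then index names.
SAMPLE = """\
0x00000000000000000001 // bsub
0x00000000000000000010 // psub
0x00000000000000000080 // sub_32, sube64, x8sub_0
0x00000000000000000400 // zasubq0
0x00000000000004000000 // dsub1_then_bsub
0x00004000000000000000 // zsub1_then_bsub
0x80000000000000000000 // zsub2_then_ssub_hi
"""


def parse(text):
    """Return (mask, [names]) pairs from 'hex // name, name' rows."""
    rows = []
    for line in text.splitlines():
        m = re.match(r"\s*(0x[0-9a-fA-F]+)\s*//\s*(.+)", line)
        if m:
            names = [n.strip() for n in m.group(2).split(",")]
            rows.append((int(m.group(1), 16), names))
    return rows


def lane_mask_width(rows):
    """Number of bits needed to hold the union of all masks."""
    union = 0
    for mask, _ in rows:
        union |= mask
    return union.bit_length()


def register_file(name):
    """Classify an index name by its leading prefix (assumed reading of the legend)."""
    if name.startswith("zasub"):
        return "matrix"
    if name.startswith("zsub"):
        return "sve-data"
    if name.startswith("psub"):
        return "sve-predicate"
    if re.match(r"[bhsdq]sub", name):
        return "fp/vector"
    return "gpr"  # "anything else" in the legend


if __name__ == "__main__":
    rows = parse(SAMPLE)
    print("bits needed:", lane_mask_width(rows))
    for mask, names in rows:
        # Python ints are arbitrary precision, so masks wider than 64 bits are fine.
        print(f"bit {mask.bit_length() - 1:2d}  {register_file(names[0]):14s} {', '.join(names)}")
```

Feeding the full table through `parse` instead of `SAMPLE` would also give a per-register-file tally of how many lane bits each group consumes.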
https://github.com/llvm/llvm-project/pull/111157