[llvm] [AArch64][GlobalISel] Select UMULL instruction (PR #65469)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 11 01:19:04 PDT 2023
https://github.com/davemgreen commented:
> GlobalISel's custom legalization action doesn't work in the way that most would expect. When a custom legalization rule runs, the legalizer expects the result to be now fully legalized. Therefore you must completely handle every possible situation, since the legalizer will not re-add the instruction onto the worklist.
>
> In this case, the legalization of `<2 x s64>` G_MUL for a specific pattern can fail and as a result illegal `<2 x s64>` G_MULs will escape. E.g the following:
>
> ```
> ---
> name: long_mul
> legalized: true
> liveins:
> - { reg: '$x0' }
> - { reg: '$x1' }
> body: |
> bb.1:
> liveins: $x0, $x1
>
> %0:_(s64) = COPY $x0
> %1:_(s64) = COPY $x1
> %bv1:_(<2 x s64>) = G_BUILD_VECTOR %0, %0
> %bv2:_(<2 x s64>) = G_BUILD_VECTOR %1, %1
> %mul:_(<2 x s64>) = G_MUL %bv1, %bv2
> $q0 = COPY %mul(<2 x s64>)
> RET_ReallyLR implicit $q0
>
> ...
> ```
>
> will not scalarize the G_MUL. I think you can fix this by directly calling the `LegalizerHelper::fewerElementsVector()` helper with a scalar type.
That matches how this patch currently works, and I think the testcase works with it. There is a call to fewerElementsVector, it just gets complex with the way larger vectors are expanded and the order nodes are visited during legalization.
We are trying to convert `mul(zext, zext)` into `UMULL`. The legal types for this are v8i8->v8i16, v4i16->v4i32 and v2i32->v2i64. The last one is the awkward one as there is no v2i64 mul and so it needs to be scalarized. The other types are simpler and could honestly be done with tblgen patterns (as nodes like uaddl are). The scalarization of v2i64 mul means we end up needing to come up with something special for them, and we use the same method for all umull to be consistent.
On it's own I think those types would be easy enough to support. It would just match them during legalization and produce a UMULL in a custom method. The problem comes from v4i32->v4i64 type operations, and all the others larger type sizes. They need to be split up into parts of size v2i64. DAG would split types before operations, so it becomes simpler. GlobalISel is trying to do them both at the same time, so even if it splits the larger vectors it can be left with merge/unmerge pairs between the mul and the extends. (It seems to re-vist the mul repeatedly until legal, not move up to the operands that could be simplified).
So this patch adds a check to see if the operands _will_ be simplified away and are extends. What I understand happens is that the operands are then visited and split/simplified, and the legalizer reiterates and legalizes the mul on the second iteration to a UMULL. It did have to lie about the mul being legal though.
I think there are several ways to try and get this to work, they each have some difficulties.
- Do it post-legalization by letting the v2i64 mul scalarize and getting it to recombine later. I don't think this would work very well as it's often too difficult or impossible to reliably recombine the scalarized muls.
- Do it during legalization like this patch does. It just needs to deal with the case where the larger types are being split as operation is being legalized, and still generate something that is optimal. From what I can tell this seems to work (minus my comment about multi-use extends).
- Do it pre-legalization (or during legalization) but generate UMULL for all type sizes, then have separate legalization for UMULL to split it to legal vectors.
- Pretend all v2i64 mul are legal and do the legalization of them in AArch64PostLegalizerLowering? The rest of GlobalISel then might believe that v2s64 mul are legal.
- I think there may be another way in the long run, where we check the known bits of the operands and generate UMULL more aggressively, even generating UMULL(trunc, trunc) if needed. That might make the custom method more "complete", not relying on reiteration, providing the extra truncs would get removed.
I'm not sure which method would be best. Probably either generating the UMULL with larger types and having separate legalization or this method so long as we are not too unhappy about the lying and reiteration from the custom method. AArch64PostLegalizerLowering probably works, providing the normal v2s64 mul get cleaned up well enough, we are just re-inventing progressive type lowering and lying about v2s64 muls being legal more generally.
https://github.com/llvm/llvm-project/pull/65469
More information about the llvm-commits
mailing list