[PATCH] D85364: [SVE][WIP] Implement lowering for fixed width select

Mon Sep 14 14:13:29 PDT 2020

cameron.mcinally added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:7257

-    assert(SrcVT.getSizeInBits() == 2 * VT.getSizeInBits());
+    assert(SrcVT.getSizeInBits() <= 4 * VT.getSizeInBits());

----------------
efriedma wrote:
> cameron.mcinally wrote:
> > cameron.mcinally wrote:
> > > efriedma wrote:
> > > > cameron.mcinally wrote:
> > > > > @t.p.northover, does this change look okay to you? I suspect that this assert is assuming we only have NEON 64b and 128b vectors.
> > > > > 
> > > > > Fixed width SVE now has larger vectors, and for these particular tests we want to extract a 1/4 width subvector from a single width vector.
> > > > Why is 4 special here?
> > > It's a splitting quirk. I'm not even sure how to succinctly describe it, but here goes...
> > > 
> > > Let's assume a fixed width edge-case of -aarch64-sve-vector-bits-min=256. The weird splitting kicks in with vectors of >=64 elements.
> > > 
> > > In particular, anytime we hit a v64i1 and the operands have elements of 32b, the problem comes up. A v64i1 isn't a legal type, so it's split to 2*v32i1. v32i1 isn't legal either, but we can promote it to v32i8 (a 256b type). This is where the first BUILD_VECTOR comes in. Call it v32i8=BUILD_VECTOR.
> > > 
> > > So now we have something like:
> > > 
> > > <v32i32> = vselect <v32i8> cond, <v32i32> op1, <v32i32> op2
> > > 
> > > Legalization continues, and v32i32 isn't legal, so we split again to v16i32, and again to v8i32 (now legal at 256b). The mask is also split during this, so we end up with a v8i8=BUILD_VECTOR for the mask operand.
> > > 
> > > <v8i32> = vselect <v8i8> cond, <v8i32> op1, <v8i32> op2
> > > 
> > > Everything is legal now. The ReconstructShuffle(...) function now lowers the v8i8=BUILD_VECTOR, and finds the v32i8=BUILD_VECTOR as the source. So that v32/v8 gives us the magic 4. It's wonky, but I'm fairly sure it's bound at 4.
> > > 
> > > Now that I'm writing this out, I'm seeing that ReconstructShuffle(...) probably needs to be guarded better in general. We shouldn't hit the problem with fixed lowering (yet?), but I suppose someone could write a pathological set of BUILD_VECTORs to trip it up.
> > > 
> > > It's unfortunate that we can't distinguish masks from integer vectors. If we could, we'd avoid this. But that would mean making the vXi1 types legal, which I'm assuming causes headaches with NEON masks.
> > This now bails out when a non-NEON fixed width BUILD_VECTOR is seen. It's not ideal, but could be updated later if desired.
> I think you need to guard the creation of the AArch64ISD::EXT more explicitly; it won't do anything right now, but it might once we start messing with isShuffleMaskLegal.
Did you have something particular in mind?

We could check `Src.MaxElt - Src.MinElt <= NumSrcElts/2` (note the `<` is needed in case undefs are in play). That's true for all 3 if-conditions though, so would be better as an assert.

We could also check that `Src.MinElt < NumSrcElts && Src.MaxElt >= NumSrcElts`. That should be covered by the 2 preceding if-conditions, but it would be good to fortify for future code changes.

Am I missing anything else?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85364/new/

https://reviews.llvm.org/D85364