[PATCH] D85364: [SVE][WIP] Implement lowering for fixed width select

Mon Sep 14 07:45:18 PDT 2020

cameron.mcinally marked 3 inline comments as done.
cameron.mcinally added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:7257

-    assert(SrcVT.getSizeInBits() == 2 * VT.getSizeInBits());
+    assert(SrcVT.getSizeInBits() <= 4 * VT.getSizeInBits());

----------------
cameron.mcinally wrote:
> efriedma wrote:
> > cameron.mcinally wrote:
> > > @t.p.northover, does this change look okay to you? I suspect that this assert is assuming we only have NEON 64b and 128b vectors.
> > > 
> > > Fixed width SVE now has larger vectors, and for these particular tests we want to extract a 1/4 width subvector from a single width vector.
> > Why is 4 special here?
> It's a splitting quirk. I'm not even sure how to succinctly describe it, but here goes...
> 
> Let's assume a fixed width edge-case of -aarch64-sve-vector-bits-min=256. The weird splitting kicks in with vectors of >=64 elements.
> 
> In particular, anytime we hit a v64i1 and the operands have elements of 32b, the problem comes up. A v64i1 isn't a legal type, so it's split to 2*v32i1. v32i1 isn't legal either, but we can promote it to v32i8 (a 256b type). This is where the first BUILD_VECTOR comes in. Call it v32i8=BUILD_VECTOR.
> 
> So now we have something like:
> 
> <v32i32> = vselect <v32i8> cond, <v32i32> op1, <v32i32> op2
> 
> Legalization continues, and v32i32 isn't legal, so we split again to v16i32, and again to v8i32 (now legal at 256b). The mask is also split during this, so we end up with a v8i8=BUILD_VECTOR for the mask operand.
> 
> <v8i32> = vselect <v8i8> cond, <v8i32> op1, <v8i32> op2
> 
> Everything is legal now. The ReconstructShuffle(...) function now lowers the v8i8=BUILD_VECTOR, and finds the v32i8=BUILD_VECTOR as the source. So that v32/v8 gives us the magic 4. It's wonky, but I'm fairly sure it's bound at 4.
> 
> Now that I'm writing this out, I'm seeing that ReconstructShuffle(...) probably needs to be guarded better in general. We shouldn't hit the problem with fixed lowering (yet?), but I suppose someone could write a pathological set of BUILD_VECTORs to trip it up.
> 
> It's unfortunate that we can't distinguish masks from integer vectors. If we could, we'd avoid this. But that would mean making the vXi1 types legal, which I'm assuming causes headaches with NEON masks.
This now bails out when a non-NEON fixed width BUILD_VECTOR is seen. It's not ideal, but could be updated later if desired.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85364/new/

https://reviews.llvm.org/D85364