[PATCH] D145301: Add more efficient vector bitcast for AArch64

Mon Apr 10 11:45:34 PDT 2023

lawben added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19520-19523
+    ComparisonResult = DAG.getSetCC(
+        DL, VecVT, ComparisonResult,
+        DAG.getSplatBuildVector(VecVT, DL, DAG.getConstant(0, DL, MVT::i64)),
+        ISD::CondCode::SETNE);
----------------
dmgreen wrote:
> lawben wrote:
> > dmgreen wrote:
> > > lawben wrote:
> > > > Sp00ph wrote:
> > > > > Does this always produce the same result as the current implementation? If in general a trunc store would get lowered to something like this:
> > > > > ```
> > > > > %trunc = trunc <N x i8> %vec to <N x i1>
> > > > > store <N x i1> %trunc, ptr %dst
> > > > > ```
> > > > > then I think the result would differ if e.g. the input was `<1 x i8> <i8 2>` (current implementation would store a 0, this PR would store a 1). This should be easily fixable though by doing `vec & splat(1) != splat(0)` instead of just `vec != splat(0)`.
> > > > Actually, I don't think we need it here. For us to get here, we either go through:
> > > > 
> > > > a) `replaceBoolVectorBitcast`, which guarantees that the target type is `<n x i1>`
> > > > b) `combineVectorCompareAndTruncateStore`, which explicitly does what you mentioned. 
> > > > 
> > > > In a), we know that the type is a vector of `i1`s, so there must have been some operation to generate/truncate that. If it is `truncate`d, the code for `AND 1` is added in the truncate.
> > > > 
> > > > I can still add this defensively, but I think we don't need it. What do you think? 
> > > What happens if the isChainOfComparesAndLogicalOps path is removed? I'm not sure all the tests would still look OK.
> > I'm not saying that the `isChainOf` check should be removed. I was arguing for for not needing the `AND 1` truncation. I've decided to add it defensively anyway to avoid producing wrong results. It has no impact on the current 19 new test, as it is either created here or somewhere else during lowering, so there is no additional cost here in all cases I've looked at.
> > 
> > This also allows me to remove the somewhat duplicated logic in `combineVectorCompareAndTruncateStore` (see follow-up patch), so this is probably the cleaner way anyways.
> Sorry - my point was that if we take this patch and remove the `if (isChainOfComparesAndLogicalOps(..))` part, some of the tests do not look correct any more, which suggests this path isn't always valid.
> 
> As far as I understand VecVT is always a vXi1 type, which makes the And with 1 a bit odd and we are assuming that a `setcc vXi1 N, vXi1 0` will always become a, say, v16i8 vector with all the lanes either 0 or 1. I think instead it would be better to produce a `sext vXi1 N to vXi?`, where the new type is a legal size. This is essentially what getBoolExtOrTrunc is doing already, but it might be better to be explicit as this transform really requires the lanes to be all-one of all-zero.
> 
> The best type (vXi?) needs to be guessed at, which might best come from the setcc if there is one (the same as the existing isChainOfComparesAndLogicalOps, but it only needs to be a guess now to get the best type).
> 
> This way we make sure that even in isolation, this transform is "correct" and is not making assumptions about what will happen in other places with vXi1 types.
I've implemented it like you suggested with a sign-extend. I'm "guessing" the vector type to be 64-bit. This may add an `xtn` instruction for 128-bit vectors, but in a few cases, the vector is reduced to 64-bit anyways, so we don't loose anything there. The alternative is to assume 128-bit vectors, but that resulted in a few expansions that we don't need.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145301/new/

https://reviews.llvm.org/D145301