[PATCH] D145301: Add more efficient vector bitcast for AArch64

Thu Apr 6 02:28:24 PDT 2023

lawben added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19507
+  EVT BaseVT;
+  if (isChainOfComparesAndLogicalOps(ComparisonResult, BaseVT)) {
+    // If we have a comparison, we can get the original types to work on instead
----------------
dmgreen wrote:
> lawben wrote:
> > dmgreen wrote:
> > > Could we use computeNumSignBits to check if all the bits will be the same? It might not work very well at this point in the lowering, but llvm has fairly decent support for NumSignBits.
> > > 
> > > Or always include the setcc and optimize it away if it is not required, which we might know more clearly after everything has been legalized.
> > I looked into `ComputeNumSignBits` a bit but I am turning in circles. We only have a `<n x i1>` type here, so we get 1 here, which is not helpful. We need the "original" vector type/size for this to work, which we only have if we traverse through the logical ops and get to the `SETCC`. If there are any other helper functions that can do this, please let me know. I did not find any, but there are so many, I might have missed it.
> > 
> > I've tried to always add the comparison, but it does not get removed again. So I'd need to add some logic that combines arbitrary `SETCC`s with boolean operators in-between. I'd like to avoid that, because it seems (way) out of scope for the change I'm proposing here. 
> OK I was worried that might not work. How sure are we that vxi1 types will always become all-one or zero laned vectors? It feels like we might be hoping they will become them, as opposed to proving it is true (and then adding an optimization where that isn't, at least for the bitcast).
With this "chain" check, we can guarantee that all bits are 1s or 0s as the chain _must_ start with `SETCC`. Adding `AND/OR/XOR`s with other vectors that _must_ start with `SETCC` does not change that property. If we don't find this chain (either is is not there or too long), we defensively add a check for `!= 0`, which again guarantees that all bits are 1s or 0s. So after both the `if` and `else` branch, we know that all bits are 1s or 0s. I've added another test that checks if this check is added for a "bad" chain.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19520-19523
+    ComparisonResult = DAG.getSetCC(
+        DL, VecVT, ComparisonResult,
+        DAG.getSplatBuildVector(VecVT, DL, DAG.getConstant(0, DL, MVT::i64)),
+        ISD::CondCode::SETNE);
----------------
dmgreen wrote:
> lawben wrote:
> > Sp00ph wrote:
> > > Does this always produce the same result as the current implementation? If in general a trunc store would get lowered to something like this:
> > > ```
> > > %trunc = trunc <N x i8> %vec to <N x i1>
> > > store <N x i1> %trunc, ptr %dst
> > > ```
> > > then I think the result would differ if e.g. the input was `<1 x i8> <i8 2>` (current implementation would store a 0, this PR would store a 1). This should be easily fixable though by doing `vec & splat(1) != splat(0)` instead of just `vec != splat(0)`.
> > Actually, I don't think we need it here. For us to get here, we either go through:
> > 
> > a) `replaceBoolVectorBitcast`, which guarantees that the target type is `<n x i1>`
> > b) `combineVectorCompareAndTruncateStore`, which explicitly does what you mentioned. 
> > 
> > In a), we know that the type is a vector of `i1`s, so there must have been some operation to generate/truncate that. If it is `truncate`d, the code for `AND 1` is added in the truncate.
> > 
> > I can still add this defensively, but I think we don't need it. What do you think? 
> What happens if the isChainOfComparesAndLogicalOps path is removed? I'm not sure all the tests would still look OK.
I'm not saying that the `isChainOf` check should be removed. I was arguing for for not needing the `AND 1` truncation. I've decided to add it defensively anyway to avoid producing wrong results. It has no impact on the current 19 new test, as it is either created here or somewhere else during lowering, so there is no additional cost here in all cases I've looked at.

This also allows me to remove the somewhat duplicated logic in `combineVectorCompareAndTruncateStore` (see follow-up patch), so this is probably the cleaner way anyways.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145301/new/

https://reviews.llvm.org/D145301