[PATCH] D145301: Add more efficient vector bitcast for AArch64

Wed Apr 5 01:33:41 PDT 2023

dmgreen added a comment.

In D145301#4241206 <https://reviews.llvm.org/D145301#4241206>, @lawben wrote:

> @dmgreen I can split this into two patches. I'll remove the truncate store part and only focus on the bitcast for now.
>
> Re: tests. I'm not sure what the "LLVM-way" to proceed is. For example in
>
> - `dag-combine-setcc.ll` there are explicit tests to cover the "old" expanded case. I guess I can just delete those, as I have new tests somewhere else? Or should I also change them there.
> - in `vec_umulo.ll`, there are also cases with this old pattern, but they are interleaved with other logic that I am not familiar with. I can try to change them to the best of my knowledge. Would it make sense to add the original author of that test to review this patch?

I think you can just update the test checks in them (with update_llc_test_checks.py). So long as the code looks better (smaller, but still correct), all is good.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19507
+  EVT BaseVT;
+  if (isChainOfComparesAndLogicalOps(ComparisonResult, BaseVT)) {
+    // If we have a comparison, we can get the original types to work on instead
----------------
lawben wrote:
> dmgreen wrote:
> > Could we use computeNumSignBits to check if all the bits will be the same? It might not work very well at this point in the lowering, but llvm has fairly decent support for NumSignBits.
> > 
> > Or always include the setcc and optimize it away if it is not required, which we might know more clearly after everything has been legalized.
> I looked into `ComputeNumSignBits` a bit but I am turning in circles. We only have a `<n x i1>` type here, so we get 1 here, which is not helpful. We need the "original" vector type/size for this to work, which we only have if we traverse through the logical ops and get to the `SETCC`. If there are any other helper functions that can do this, please let me know. I did not find any, but there are so many, I might have missed it.
> 
> I've tried to always add the comparison, but it does not get removed again. So I'd need to add some logic that combines arbitrary `SETCC`s with boolean operators in-between. I'd like to avoid that, because it seems (way) out of scope for the change I'm proposing here. 
OK I was worried that might not work. How sure are we that vxi1 types will always become all-one or zero laned vectors? It feels like we might be hoping they will become them, as opposed to proving it is true (and then adding an optimization where that isn't, at least for the bitcast).

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19520-19523
+    ComparisonResult = DAG.getSetCC(
+        DL, VecVT, ComparisonResult,
+        DAG.getSplatBuildVector(VecVT, DL, DAG.getConstant(0, DL, MVT::i64)),
+        ISD::CondCode::SETNE);
----------------
lawben wrote:
> Sp00ph wrote:
> > Does this always produce the same result as the current implementation? If in general a trunc store would get lowered to something like this:
> > ```
> > %trunc = trunc <N x i8> %vec to <N x i1>
> > store <N x i1> %trunc, ptr %dst
> > ```
> > then I think the result would differ if e.g. the input was `<1 x i8> <i8 2>` (current implementation would store a 0, this PR would store a 1). This should be easily fixable though by doing `vec & splat(1) != splat(0)` instead of just `vec != splat(0)`.
> Actually, I don't think we need it here. For us to get here, we either go through:
> 
> a) `replaceBoolVectorBitcast`, which guarantees that the target type is `<n x i1>`
> b) `combineVectorCompareAndTruncateStore`, which explicitly does what you mentioned. 
> 
> In a), we know that the type is a vector of `i1`s, so there must have been some operation to generate/truncate that. If it is `truncate`d, the code for `AND 1` is added in the truncate.
> 
> I can still add this defensively, but I think we don't need it. What do you think? 
What happens if the isChainOfComparesAndLogicalOps path is removed? I'm not sure all the tests would still look OK.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145301/new/

https://reviews.llvm.org/D145301