[PATCH] D145301: Add more efficient vector bitcast for AArch64

Mon Apr 3 11:24:05 PDT 2023

lawben added a comment.

@dmgreen I can split this into two patches. I'll remove the truncate store part and only focus on the bitcast for now.

Re: tests. I'm not sure what the "LLVM-way" to proceed is. For example in

- `dag-combine-setcc.ll` there are explicit tests to cover the "old" expanded case. I guess I can just delete those, as I have new tests somewhere else? Or should I also change them there.
- in `vec_umulo.ll`, there are also cases with this old pattern, but they are interleaved with other logic that I am not familiar with. I can try to change them to the best of my knowledge. Would it make sense to add the original author of that test to review this patch?

@Sp00ph there seems to be a way to get this meta-information about all bits being -1 or 0. See my inline comment about `computeNumSignBits`. Unfortunately, it does not seem to work well here.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19507
+  EVT BaseVT;
+  if (isChainOfComparesAndLogicalOps(ComparisonResult, BaseVT)) {
+    // If we have a comparison, we can get the original types to work on instead
----------------
dmgreen wrote:
> Could we use computeNumSignBits to check if all the bits will be the same? It might not work very well at this point in the lowering, but llvm has fairly decent support for NumSignBits.
> 
> Or always include the setcc and optimize it away if it is not required, which we might know more clearly after everything has been legalized.
I looked into `ComputeNumSignBits` a bit but I am turning in circles. We only have a `<n x i1>` type here, so we get 1 here, which is not helpful. We need the "original" vector type/size for this to work, which we only have if we traverse through the logical ops and get to the `SETCC`. If there are any other helper functions that can do this, please let me know. I did not find any, but there are so many, I might have missed it.

I've tried to always add the comparison, but it does not get removed again. So I'd need to add some logic that combines arbitrary `SETCC`s with boolean operators in-between. I'd like to avoid that, because it seems (way) out of scope for the change I'm proposing here. 

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19520-19523
+    ComparisonResult = DAG.getSetCC(
+        DL, VecVT, ComparisonResult,
+        DAG.getSplatBuildVector(VecVT, DL, DAG.getConstant(0, DL, MVT::i64)),
+        ISD::CondCode::SETNE);
----------------
Sp00ph wrote:
> Does this always produce the same result as the current implementation? If in general a trunc store would get lowered to something like this:
> ```
> %trunc = trunc <N x i8> %vec to <N x i1>
> store <N x i1> %trunc, ptr %dst
> ```
> then I think the result would differ if e.g. the input was `<1 x i8> <i8 2>` (current implementation would store a 0, this PR would store a 1). This should be easily fixable though by doing `vec & splat(1) != splat(0)` instead of just `vec != splat(0)`.
Actually, I don't think we need it here. For us to get here, we either go through:

a) `replaceBoolVectorBitcast`, which guarantees that the target type is `<n x i1>`
b) `combineVectorCompareAndTruncateStore`, which explicitly does what you mentioned. 

In a), we know that the type is a vector of `i1`s, so there must have been some operation to generate/truncate that. If it is `truncate`d, the code for `AND 1` is added in the truncate.

I can still add this defensively, but I think we don't need it. What do you think? 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145301/new/

https://reviews.llvm.org/D145301