[PATCH] D145301: Add more efficient vector bitcast for AArch64

Sat Mar 25 08:56:17 PDT 2023

lawben added a comment.

In D145301#4221490 <https://reviews.llvm.org/D145301#4221490>, @Sp00ph wrote:

> In D145301#4221481 <https://reviews.llvm.org/D145301#4221481>, @lawben wrote:
>
>> If we have a comparison, we know that all bits are 1 or all bits are 0, so the least significant one is equal to all others.
>
> Aren't the elements in a `<N x i1>` guaranteed to be 0 or -1 (so all zeros or all ones) anyways? And even if there was always an extra instruction emitted so that for compare + bitcast the flow would look like this: `<initial compare> -> <compare returned bitmask> -> <use and-trick on the result of that>`, I would assume that LLVM would just trivially optimize out the second compare if it knows that the result of the first compare already contains all zeros/all ones.

That's a fair point and might actually be a cleaner solution, provided that two consecutive comparisons are actually "merged". I've been looking at this primarily from the Clang/C++ side to optimize `__builtin_convertvector()`, which always adds the comparison. I did not know that `<n x i1>` guarantees that each element is all zeros or all ones when the physical type is larger than `i1`.
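For readers following along, here is a minimal scalar sketch of the and-trick discussed above, assuming each lane of the compare result is already all zeros or all ones. The names `Lanes`, `compare_eq`, `to_bitmask`, and `example` are hypothetical and only for illustration; this is a portable emulation, not the code from the patch. On AArch64 the same idea would presumably map to a vector AND followed by a horizontal add.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 8-lane vector of 8-bit elements (stand-in for <8 x i8>).
using Lanes = std::array<std::uint8_t, 8>;

// Emulates a lane-wise compare: each result lane is 0xFF (equal) or 0x00.
Lanes compare_eq(const Lanes& a, const Lanes& b) {
    Lanes mask{};
    for (std::size_t i = 0; i < mask.size(); ++i) {
        mask[i] = (a[i] == b[i]) ? 0xFF : 0x00;
    }
    return mask;
}

// The and-trick: AND lane i with (1 << i), then sum all lanes.
// Because every lane is all zeros or all ones, lane i contributes
// exactly bit i, so the sum equals the packed bitmask.
std::uint8_t to_bitmask(const Lanes& mask) {
    std::uint8_t result = 0;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        result = static_cast<std::uint8_t>(result + (mask[i] & (1u << i)));
    }
    return result;
}

// Usage check: lanes 0 and 2 match, so bits 0 and 2 are set.
void example() {
    const Lanes a{1, 2, 3, 4, 5, 6, 7, 8};
    const Lanes b{1, 0, 3, 0, 0, 0, 0, 0};
    assert(to_bitmask(compare_eq(a, b)) == 0x05);
}
```

No second compare is needed here: `to_bitmask` relies only on the all-zeros/all-ones property of each lane, which is the point raised in the quoted comment.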

I'll look into this next week. I'm not 100% sure yet where this optimization would need to be located (maybe in `LowerBITCAST` or some bitcast combine). I played around with a few options when writing this code, and depending on where in the optimization pipeline I was, the vector type was different, because the `<n x i1>` vector is not a legal type and gets promoted. It may be a bit tricky to find the correct time to detect the bitcast from `<n x i1>`. If you have suggestions or ideas about where this could be done, feel free to share. Otherwise, I'll just dig around a bit.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145301/new/

https://reviews.llvm.org/D145301