[PATCH] D76727: [VectorCombine] transform bitcasted shuffle to narrower elements

Wed Mar 25 05:54:50 PDT 2020

spatel added a comment.

In D76727#1940722 <https://reviews.llvm.org/D76727#1940722>, @lebedev.ri wrote:

> From reading through the `getCastInstrCost()` implementations, I don't think any backend
>  currently models it, but there's this comment in AArch64ISelLowering.cpp:
>
>   namespace llvm {
>  
>   namespace AArch64ISD {
>  
>   enum NodeType : unsigned {
>   <...>
>     /// Natural vector cast. ISD::BITCAST is not natural in the big-endian
>     /// world w.r.t vectors; which causes additional REV instructions to be
>     /// generated to compensate for the byte-swapping. But sometimes we do
>     /// need to re-interpret the data in SIMD vector registers in big-endian
>     /// mode without emitting such REV instructions.
>     NVCAST,
>
>
> which is consistent with https://reviews.llvm.org/D40633#inline-355090 by @efriedma:
>
> > On some targets, vector bitcasts aren't free (IIRC big-endian ARM is like this).

I agree that bitcasts may not be free, but I don't see how that affects the cost calculation for this transform.

I'm open to ideas on how to improve this, but I'm not sure how to proceed without some concrete examples:

1. Is this transform too narrow to cost-model effectively in isolation? I.e., do we need to pattern-match something bigger than just cast+shuf?
2. Implement a generic DAGCombine version of x86's canWidenShuffleElements() to allow targets to reverse this?
3. Limit this transform to targets where the bitcast is free (and potentially improve the base cost model to account for big-endian)?
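
To make the discussion concrete, here is a minimal standalone sketch (C++17, no LLVM headers) of the mask arithmetic involved: narrowing a shuffle mask by a scale factor, which is what moving the bitcast ahead of the shuffle requires, and the reverse widening check that option 2 would need. The names `narrowShuffleMask` and `widenShuffleMask` are hypothetical and are not the API used in the patch or in the x86 backend. The undef handling is deliberately strict, and cost modeling and endianness are ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not LLVM API): rewrite a shuffle mask over wide
// elements as an equivalent mask over elements that are `scale` times
// narrower. A negative index stands for an undefined lane and is
// replicated as -1 in every narrow lane it covers.
std::vector<int> narrowShuffleMask(int scale, const std::vector<int> &mask) {
  assert(scale > 0 && "scale must be positive");
  std::vector<int> narrow;
  narrow.reserve(mask.size() * static_cast<std::size_t>(scale));
  for (int m : mask)
    for (int i = 0; i < scale; ++i)
      narrow.push_back(m < 0 ? -1 : m * scale + i);
  return narrow;
}

// Hypothetical inverse: succeeds only if every group of `scale` narrow
// indices is either entirely undefined or an aligned, consecutive run.
// Assumption: partially undefined groups are rejected for simplicity.
std::optional<std::vector<int>>
widenShuffleMask(int scale, const std::vector<int> &mask) {
  assert(scale > 0 && "scale must be positive");
  const std::size_t step = static_cast<std::size_t>(scale);
  if (mask.size() % step != 0)
    return std::nullopt;

  std::vector<int> wide;
  wide.reserve(mask.size() / step);
  for (std::size_t g = 0; g < mask.size(); g += step) {
    const int first = mask[g];
    if (first < 0) {
      // An undefined group must be undefined in every lane.
      for (std::size_t i = 1; i < step; ++i)
        if (mask[g + i] >= 0)
          return std::nullopt;
      wide.push_back(-1);
      continue;
    }
    // The group must start on a wide-element boundary...
    if (first % scale != 0)
      return std::nullopt;
    // ...and continue with consecutive narrow indices.
    for (std::size_t i = 1; i < step; ++i)
      if (mask[g + i] != first + static_cast<int>(i))
        return std::nullopt;
    wide.push_back(first / scale);
  }
  return wide;
}
```

For example, under these assumptions a 2-element swap mask {1, 0} narrowed by a factor of 2 becomes {2, 3, 0, 1}, and widening that result by 2 recovers {1, 0}. A mask such as {1, 2, 0, 1} cannot be widened because its first group does not start on a wide-element boundary.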

https://reviews.llvm.org/D76727