[PATCH] [AArch64 - BE] BUILD_VECTOR lane order is reversed in big-endian mode

Fri Aug 29 14:01:19 PDT 2014

Hi Tim,

> I'm afraid I still think that this isn't fixing the real issue, which is the dodgy casting that goes on *after* resolveBuildVector. Putting the change there seems like it would be trying to make two bugs cancel out into a feature (though I can't find any currently incorrect code that results).

I had an itching about this and was trying to come up with a counter-example for a couple of days to prove that this fix was bit too narrow. But I finally gave in (after trying to break resolveBUILD_VECTOR with various ModImmTypes) and thought this is something we have to live with given how we handle vectors in big-endian mode.

> 
>   - It makes the resolveBuildVector's output incompatible with its fundamental building-block, isConstantSplat.
>   - It means that CnstBits no longer represent the in-register value directly: "CnstBits & 1" no longer checks if the value would be even, for example.
>   - We now end up passing a big-endian value into endian-agnostic isAdvSIMDModImmTypeN functions.
> 
> We bodge that last point up with some REVs afterwards, but that only works in the direct MOVI case, I think.

AFAICS the REV instructions are redundant (in the MOVI case). My understanding was that we have to perform the lane reversal within resolveBuildVector() because at the callee end (when a vector is passed to a function) we get something like:

```
define i16 @f(<4 x i16> %arg) nounwind {
  ; CHECK:          rev64   v0.4h, v0.4h
  ; CHECK-NEXT      umov    w0, v0.h[0]
  ; CHECK-NEXT      ret
  %v = extractelement <4 x i16> %arg, i32 0
  ret i16 %v
}
```
So, to cater for that rev64, we keep the vector lane-reversed. Perhaps a proper fix will have to get rid of that rev64.

Admittedly, my understanding of the problem was a bit narrow at the time so my focus was on somehow resurrecting the existing patch :)

> For example:
> 
>     @vec = global <4 x i16> <i16 1234, i16 5678, i16 9101, i16 1121>
> 
>     define i16 @foo() {
>       %in = load <4 x i16>* @vec
>       %res_vec = and <4 x i16> %in, <i16 65400, i16 65535, i16 65400, i16 65535>
>       %elt = extractelement <4 x i16> %res_vec, i16 0
>       ret i16 %elt
>     }
> 
> This currently produces:
> 
>         ld1     { v0.4h }, [x8]
>         bic     v0.2s, #0x87
>         rev32   v0.4h, v0.4h
>         umov    w0, v0.h[0]
> 
> where that REV32 is highly suspect. But after your patch it produces:
> 
>         ld1     { v0.4h }, [x8]
>         bic     v0.2s, #0x87, lsl #16
>         rev32   v0.4h, v0.4h
>         umov    w0, v0.h[0]
> 
> where both the BIC and the REV32 are wrong, and don't cancel each other out.

Thanks for this example. This should be enough for me to look for a different fix.

Cheers!

Asiri.

http://reviews.llvm.org/D5097