[PATCH] [AArch64 - BE] BUILD_VECTOR lane order is reversed in big-endian mode
Asiri Rathnayake
asiri.rathnayake at arm.com
Fri Aug 29 14:01:19 PDT 2014
Hi Tim,
> I'm afraid I still think that this isn't fixing the real issue, which is the dodgy casting that goes on *after* resolveBuildVector. Putting the change there seems like it would be trying to make two bugs cancel out into a feature (though I can't find any currently incorrect code that results).
I had an itching about this and was trying to come up with a counter-example for a couple of days to prove that this fix was bit too narrow. But I finally gave in (after trying to break resolveBUILD_VECTOR with various ModImmTypes) and thought this is something we have to live with given how we handle vectors in big-endian mode.
>
> - It makes the resolveBuildVector's output incompatible with its fundamental building-block, isConstantSplat.
> - It means that CnstBits no longer represent the in-register value directly: "CnstBits & 1" no longer checks if the value would be even, for example.
> - We now end up passing a big-endian value into endian-agnostic isAdvSIMDModImmTypeN functions.
>
> We bodge that last point up with some REVs afterwards, but that only works in the direct MOVI case, I think.
AFAICS the REV instructions are redundant (in the MOVI case). My understanding was that we have to perform the lane reversal within resolveBuildVector() because at the callee end (when a vector is passed to a function) we get something like:
```
define i16 @f(<4 x i16> %arg) nounwind {
; CHECK: rev64 v0.4h, v0.4h
; CHECK-NEXT umov w0, v0.h[0]
; CHECK-NEXT ret
%v = extractelement <4 x i16> %arg, i32 0
ret i16 %v
}
```
So, to cater for that rev64, we keep the vector lane-reversed. Perhaps a proper fix will have to get rid of that rev64.
Admittedly, my understanding of the problem was a bit narrow at the time so my focus was on somehow resurrecting the existing patch :)
> For example:
>
> @vec = global <4 x i16> <i16 1234, i16 5678, i16 9101, i16 1121>
>
> define i16 @foo() {
> %in = load <4 x i16>* @vec
> %res_vec = and <4 x i16> %in, <i16 65400, i16 65535, i16 65400, i16 65535>
> %elt = extractelement <4 x i16> %res_vec, i16 0
> ret i16 %elt
> }
>
> This currently produces:
>
> ld1 { v0.4h }, [x8]
> bic v0.2s, #0x87
> rev32 v0.4h, v0.4h
> umov w0, v0.h[0]
>
> where that REV32 is highly suspect. But after your patch it produces:
>
> ld1 { v0.4h }, [x8]
> bic v0.2s, #0x87, lsl #16
> rev32 v0.4h, v0.4h
> umov w0, v0.h[0]
>
> where both the BIC and the REV32 are wrong, and don't cancel each other out.
Thanks for this example. This should be enough for me to look for a different fix.
Cheers!
Asiri.
http://reviews.llvm.org/D5097
More information about the llvm-commits
mailing list