[llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?

Fri Jan 15 04:03:56 PST 2021

Sorry - Yes. The ARM/MVE tests are correct as-is, in that they produce the correct output under big endian as far as I can tell. (That is, the aligned test, which is not scalarized, produces the same output as the unaligned case.) When MVE is enabled, the backend assumes that low lanes end up in the low bits of the predicate mask. So the two assumptions cancel each other out and we happen to end up with the correct code.

Apparently this is different to the rest of LLVM, which assumes the opposite for non-byte-sized vectors? That is surprising; we even have some instructions under MVE for storing predicates which, under big endian, assume the low lane is in the low bits. I would not be surprised if this was causing problems somewhere under big endian though, since it does not get nearly as much use as little endian.
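
To make the "cancel each other out" point concrete, here is a small Python model. It is only a sketch under assumptions: the function names are made up for illustration, and it does not reproduce the actual code in ScalarizeMaskedMemIntrin.cpp or in the ARM backend. It assumes the backend packs lane i into bit i of the predicate mask regardless of endianness, and that the scalarized expansion tests bit i for lane i.

```python
NUM_LANES = 8

def pack_low_lane_low_bit(lanes):
    # Assumed backend behaviour: lane i lands in bit i, on any endianness.
    return sum(1 << i for i, lane in enumerate(lanes) if lane)

def expand_uncompensated(mask):
    # Assumed expansion behaviour today: test bit i for lane i.
    return [bool(mask & (1 << i)) for i in range(NUM_LANES)]

def expand_compensated_big_endian(mask):
    # Hypothetical big-endian fix in the expansion only: lane i <-> bit 7 - i.
    return [bool(mask & (1 << (NUM_LANES - 1 - i))) for i in range(NUM_LANES)]

lanes = [True, False, False, False, False, False, True, True]
mask = pack_low_lane_low_bit(lanes)

# Both sides use the same convention, so the lanes round-trip.
print(expand_uncompensated(mask) == lanes)           # True
# Changing only the expansion breaks the agreement.
print(expand_compensated_big_endian(mask) == lanes)  # False
```

In this model, fixing only one side reverses the lanes, which would be consistent with the MVE tests changing once the expansion is made endian-aware.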

> @markus: Could you help out locating the functions that we think are wrong in those tests? Maybe even upload your fixes in ScalarizeMaskedMemIntrin.cpp to Phabricator to show the differences both to the LLVM code and the new codegen for those test cases?

Yeah, if you can upload a Phabricator review for the changes in the expansion of masked intrinsics, I can take a look into the MVE codegen and see if I can get it to store in the opposite order sensibly. I have not looked at what that would take yet, but I'm hoping it's not too difficult.

Thanks,
Dave

From: Björn Pettersson A
Sent: 15 January 2021 11:39
To: llvm-dev <llvm-dev at lists.llvm.org>; David Green; Markus Lavin
Subject: RE: bitcast <8 x i1> to i8 - dependence on endianness? 

> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Markus
> Lavin via llvm-dev
> Sent: den 11 januari 2021 11:21
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?
> 
> While debugging an OOT issue with masked memory intrinsics I came across
> lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp where bitcasts of the
> following form are introduced
> 
> %scalar_mask = bitcast <8 x i1> %interleaved.mask to i8
> 
> That is, when emulating masked stores on a machine that lacks hardware
> support, the <8 x i1> mask vector is bitcast to an i8 scalar type. Now
> the problem is that this appears to yield different results for
> big-endian and little-endian targets.
> 
> AFAIK, in general LLVM IR vectors are laid out in memory with the first
> element at the lowest address (i.e. independent of endianness), but for
> the i1 type (and possibly all sub-byte sized types) there seems to be a
> dependence on target endianness.
> 
> For example
> 
> define i8 @foo() {
> entry:
>   %v = insertelement <8 x i1> zeroinitializer, i1 true, i8 0
>   %bc = bitcast <8 x i1> %v to i8
>   ret i8 %bc
> }
> 
> $ llc -O3 bitcast.ll --mtriple arm -o -     # lsb is set in scalar
> $ llc -O3 bitcast.ll --mtriple armeb -o -     # msb is set in scalar
> 
> with similar results for mips (big-endian) and amd64 (little-endian)
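
The behaviour reported above can be modelled in a few lines of Python. This is a sketch of the observed result only (lsb set for arm, msb set for armeb), not of how llc computes it; the function name is hypothetical.

```python
def bitcast_v8i1_to_i8(lanes, big_endian):
    """Model of `bitcast <8 x i1> to i8` as observed in this thread."""
    assert len(lanes) == 8
    value = 0
    for index, lane in enumerate(lanes):
        # Little-endian: lane 0 becomes the least significant bit.
        # Big-endian: lane 0 becomes the most significant bit.
        bit = (7 - index) if big_endian else index
        if lane:
            value |= 1 << bit
    return value

# Same vector as in @foo: only element 0 is true.
v = [True] + [False] * 7
print(hex(bitcast_v8i1_to_i8(v, big_endian=False)))  # 0x1
print(hex(bitcast_v8i1_to_i8(v, big_endian=True)))   # 0x80
```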
> 
> Now for ScalarizeMaskedMemIntrin.cpp this must surely be a problem since
> the mask gets reversed for big-endian targets. I tried addressing this by
> compensating for endianness when, later in the pass, checking the
> individual bits of the scalar. This compensation seemed to work well for
> our big-endian target, but rather surprisingly (to me) ARM-specific
> lit tests then started failing:
> 
> Failed Tests (3):
>   LLVM :: CodeGen/Thumb2/mve-masked-ldst.ll
>   LLVM :: CodeGen/Thumb2/mve-masked-load.ll
>   LLVM :: CodeGen/Thumb2/mve-masked-store.ll
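
The kind of compensation described above can be sketched as follows. This is an illustrative Python model, not the actual patch: the helper names are hypothetical, and it assumes that on big-endian targets lane 0 of the mask ends up in the most significant bit of the scalar.

```python
def mask_bit_for_lane(lane, num_lanes, big_endian):
    # Bit of the scalar mask that holds the given vector lane.
    return (num_lanes - 1 - lane) if big_endian else lane

def active_lanes(scalar_mask, num_lanes, big_endian):
    # Lanes whose mask bit is set, using the endian-aware bit position.
    return [
        lane
        for lane in range(num_lanes)
        if scalar_mask & (1 << mask_bit_for_lane(lane, num_lanes, big_endian))
    ]

# Mask with only lane 0 enabled, as seen on each kind of target.
print(active_lanes(0x01, 8, big_endian=False))  # [0]
print(active_lanes(0x80, 8, big_endian=True))   # [0]
# Without compensation the big-endian mask is read as lane 7.
print(active_lanes(0x80, 8, big_endian=False))  # [7]
```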

It would be nice if someone from ARM could acknowledge that the codegen actually is faulty for big-endian now (all I know is that David Green has made lots of changes to those test cases in the past according to git log, but anyone with MVE knowledge could perhaps look at it).

@markus: Could you help out locating the functions that we think are wrong in those tests? Maybe even upload your fixes in ScalarizeMaskedMemIntrin.cpp to Phabricator to show the differences both to the LLVM code and the new codegen for those test cases?

> 
> This leaves me with several questions:
> 
> 1. Is a bitcast <8 x i1> %v to i8 well defined and if so is the result
> supposed to be dependent on target endianness?
> 2. Is ScalarizeMaskedMemIntrin.cpp broken for big-endian targets?
> 3. If ScalarizeMaskedMemIntrin.cpp is broken for big-endian targets then
> aren't the three lit-tests also broken since they break when I try to fix
> the alleged brokenness of ScalarizeMaskedMemIntrin.cpp?
> 
> Best regards,
> -Markus