[llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?

Mon Jan 11 02:21:15 PST 2021

While debugging an out-of-tree issue with masked memory intrinsics I came across lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp, where bitcasts of the following form are introduced:

%scalar_mask = bitcast <8 x i1> %interleaved.mask to i8

That is, when emulating masked stores on a machine that lacks hardware support, the <8 x i1> mask vector is bitcast to an i8 scalar. The problem is that this appears to yield different results for big-endian and little-endian targets.

AFAIK LLVM IR vectors are in general laid out in memory with the first element at the lowest address (i.e. independent of endianness), but for the i1 type (and possibly all sub-byte sized types) there seems to be a dependence on target endianness.

For example

define i8 @foo() {
entry:
  %v = insertelement <8 x i1> zeroinitializer, i1 true, i8 0
  %bc = bitcast <8 x i1> %v to i8
  ret i8 %bc
}

$ llc -O3 bitcast.ll --mtriple arm -o -     # lsb is set in scalar
$ llc -O3 bitcast.ll --mtriple armeb -o -     # msb is set in scalar

with similar results for mips (big-endian) and amd64 (little-endian)
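
The observed behaviour can be modelled with a small sketch. This is only an illustration of what the two llc runs above report, not a statement of what the IR semantics require. The function name bitcast_i1_vector_to_int and the big_endian flag are made up for the example.

```python
def bitcast_i1_vector_to_int(lanes, big_endian):
    """Pack a list of booleans (lane 0 first) into an integer.

    Hypothetical model of the observed llc output: on little-endian
    targets lane i lands in bit i, on big-endian targets lane i lands
    in bit (n - 1 - i).
    """
    n = len(lanes)
    value = 0
    for i, bit in enumerate(lanes):
        pos = (n - 1 - i) if big_endian else i
        if bit:
            value |= 1 << pos
    return value


# Same mask as %v in @foo: only lane 0 is true.
mask = [True] + [False] * 7
print(hex(bitcast_i1_vector_to_int(mask, big_endian=False)))  # 0x1  (lsb set)
print(hex(bitcast_i1_vector_to_int(mask, big_endian=True)))   # 0x80 (msb set)
```

Under this model the same <8 x i1> value gives 0x01 on arm and 0x80 on armeb, which matches the lsb/msb difference noted above.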

For ScalarizeMaskedMemIntrin.cpp this must surely be a problem, since the mask gets reversed for big-endian targets. I tried addressing this by compensating for endianness when, later in the pass, the individual bits of the scalar are checked. This compensation seemed to work well for our big-endian target but, rather surprisingly (to me), ARM-specific lit tests then started failing:
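
For illustration, the kind of compensation described above could look roughly like the following sketch. It is not the actual change made to the pass (which is not shown here); the helper names lane_bit_index and lane_is_set are hypothetical, and it assumes the bit placement observed in the llc runs earlier in this mail.

```python
def lane_bit_index(lane, num_lanes, big_endian):
    """Bit position in the scalar mask that holds the given vector lane.

    Assumption: lane 0 is the lsb on little-endian targets and the msb
    on big-endian targets.
    """
    return (num_lanes - 1 - lane) if big_endian else lane


def lane_is_set(scalar_mask, lane, num_lanes, big_endian):
    """Test one lane of the mask after it has been bitcast to a scalar."""
    bit = lane_bit_index(lane, num_lanes, big_endian)
    return ((scalar_mask >> bit) & 1) == 1


# Lane 0 set: the scalar is 0x01 on little-endian and 0x80 on big-endian.
print(lane_is_set(0x01, 0, 8, big_endian=False))  # True
print(lane_is_set(0x80, 0, 8, big_endian=True))   # True
print(lane_is_set(0x80, 0, 8, big_endian=False))  # False (uncompensated check)
```

The last line shows the failure mode in question: testing bit 0 for lane 0 without compensation misses the lane when the scalar was produced on a big-endian target.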

Failed Tests (3):
  LLVM :: CodeGen/Thumb2/mve-masked-ldst.ll
  LLVM :: CodeGen/Thumb2/mve-masked-load.ll
  LLVM :: CodeGen/Thumb2/mve-masked-store.ll

This leaves me with several questions:

1. Is a bitcast <8 x i1> %v to i8 well defined and if so is the result supposed to be dependent on target endianness?
2. Is ScalarizeMaskedMemIntrin.cpp broken for big-endian targets?
3. If ScalarizeMaskedMemIntrin.cpp is broken for big-endian targets, then aren't the three lit tests also broken, since they break when I try to fix the alleged brokenness of ScalarizeMaskedMemIntrin.cpp?

Best regards,
-Markus