[llvm-dev] Endianness for multi-word types

Tue Dec 1 01:00:55 PST 2015

----- Original Message -----
> From: "Tim Shen via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Yunzhong Gao" <yunzhong_gao at playstation.sony.com>, llvm-dev at lists.llvm.org
> Sent: Monday, November 30, 2015 10:14:05 PM
> Subject: Re: [llvm-dev] Endianness for multi-word types
> 
> On Mon, Nov 30, 2015 at 7:24 PM Gao, Yunzhong <
> yunzhong_gao at playstation.sony.com > wrote:
> 
> 
> According to
> http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html ,
> "The high-order double-precision value (the one that comes first in
> storage) must have the larger magnitude."
> 
> So the order of the two doubles in your fp128 is not affected by the
> endianness; although each individual double is.
> 
> Well I'm still trying to understand the code base, so I may get this
> completely wrong, but here is my understanding:
> 
> Looking at the comment of TargetLowering::hasBigEndianPartOrdering,
> it seems that, all "internal types" (e.g. ppc_fp128, i128) are
> assumed to be little-endian, and for non-ppc_fp128 external values
> (raw register tuples or memory), the endianness follows the data
> layout endianness; for ppc_fp128 external values (a pair of float
> registers or memory chunk), it's always "big-endian" due to the
> policy.
> 

Right. ppc_fp128 is special in this regard. Its layout is fixed in an Endian-independent way. This can be confusing, however, because it means that casts from i128 <-> ppc_fp128 have an Endianness-dependent result. This requires a number of special cases in various places (unfortunately).

This is why we needed to disable ppc_fp128 constant folding through bitcasts, for example. From IR/ConstantFold.cpp:

  // Handle ConstantFP input: FP -> Integral.
  if (ConstantFP *FP = dyn_cast<ConstantFP>(V)) {
    // PPC_FP128 is really the sum of two consecutive doubles, where the first
    // double is always stored first in memory, regardless of the target
    // endianness. The memory layout of i128, however, depends on the target
    // endianness, and so we can't fold this without target endianness
    // information. This should instead be handled by
    // Analysis/ConstantFolding.cpp
    if (FP->getType()->isPPC_FP128Ty())
      return nullptr;

> The problem is when we bitcast from ppc_fp128 to i128, do we exchange
> the two registers? Today we do, because ppc_fp128 requires a
> mandatory big-endian representation, but at the same time it's a
> i128, which is interpreted as a little-endian 128-bit integer.
> That's why the combiner I previously mentioned gets confused.
> If we don't exchange the two registers, combiners are happy because
> they get a little-endian view of the two doubles (e.g. the most
> significant bit is indeed the sign bit); but if someone stores this
> casted i128 to memory, the two words are in little-endian, which
> breaks the ppc_fp128 layout policy. It also breaks the bitcast
> semantic: "The conversion is done as if the value had been stored to
> memory and read back as type ty2."

If that's what's happening, I agree, that seems broken. The ppc_fp128 <-> i128 bitcast should legitimately have different results on big-Endian and little-Endian systems. Swapping the registers when we do a bitcast in some attempt to remove this effect is not self consistent. However, we seem to get this right currently:

$ cat /tmp/ld.c 
long foo(long double x) {
  return *(long *) &x;
  // returns f1 on both powerpc64 and powerpc64le
}

long bar(long double x) {
  __int128 y = *(__int128 *) &x;
  return (long) y;
  // returns f2 on powerpc64 and f1 on powerpc64le
}

Given that the higher-order double is defined to be the one that comes first in storage, in the first case, foo(), regardless of Endianness, should return the higher-order double as an integer. In the second case, bar(), we're returning the lower-order (integer) part of the i128 overlaying the ppc_fp128 value. On a little-Endian system, this should be the first double stored in memory, as an integer. That's the higher-order double, and so the code should be the same as for foo(). For a big-Endian system, the lower-order part of the i128 integer is stored in the second double (in layout), and so it should return the lower-order double, as an integer. This is the case that should differ from foo(), but only on big-Endian systems.

Luckily, this is what happens (and what GCC does as well).

> I did some investigation, it turns out in
> lib/CodeGen/SelectionDAG/DAGCombiner.cpp, a combination is wrongly
> assuming the endianness for the i128 bitcasted from a ppc_fp128 (two
> doubles bit-concatenated together):
> // fold (bitconvert (fabs x)) -> (and (bitconvert x), (not signbit))
> 
> This reveals that endianness conversion concerning ppc_fp128 or i128,
> and big-endian or little-endian target is too confusing to me.
> 
> I can certainly fix this case by using a "smarter signbit" (e.g.
> 0x00000000000000001000000000000000), but this seems more confusing.
> I wonder if anyone has the best background knowledge here?

And, unfortunately, this is exactly what you should do. ppc_fp128 is a special case here (again).

 -Hal

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory