[llvm-dev] Endianness for multi-word types

Tue Dec 1 11:52:13 PST 2015

> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Tuesday, December 01, 2015 1:01 AM
> To: Tim Shen
> Cc: Gao, Yunzhong; llvm-dev at lists.llvm.org; Kit Barton; Nemanja Ivanovic
> Subject: Re: [llvm-dev] Endianness for multi-word types
> 
> ----- Original Message -----
> > From: "Tim Shen via llvm-dev" <llvm-dev at lists.llvm.org>
> > To: "Yunzhong Gao" <yunzhong_gao at playstation.sony.com>,
> > llvm-dev at lists.llvm.org
> > Sent: Monday, November 30, 2015 10:14:05 PM
> > Subject: Re: [llvm-dev] Endianness for multi-word types
> >
> > On Mon, Nov 30, 2015 at 7:24 PM Gao, Yunzhong <
> > yunzhong_gao at playstation.sony.com > wrote:
> >
> >
> > According to
> > http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html , "The
> > high-order double-precision value (the one that comes first in
> > storage) must have the larger magnitude."
> >
> > So the order of the two doubles in your fp128 is not affected by the
> > endianness; although each individual double is.
> >
> > Well I'm still trying to understand the code base, so I may get this
> > completely wrong, but here is my understanding:
> >
> > Looking at the comment of TargetLowering::hasBigEndianPartOrdering,
> > it seems that, all "internal types" (e.g. ppc_fp128, i128) are assumed
> > to be little-endian, and for non-ppc_fp128 external values (raw
> > register tuples or memory), the endianness follows the data layout
> > endianness; for ppc_fp128 external values (a pair of float registers
> > or memory chunk), it's always "big-endian" due to the policy.
> >
> 
> Right. ppc_fp128 is special in this regard. Its layout is fixed in an Endian-
> independent way. This can be confusing, however, because it means that
> casts from i128 <-> ppc_fp128 have an Endianness-dependent result. This
> requires a number of special cases in various places (unfortunately).
> 
> This is why we needed to disable ppc_fp128 constant folding through bitcasts,
> for example. From IR/ConstantFold.cpp:
> 
>   // Handle ConstantFP input: FP -> Integral.
>   if (ConstantFP *FP = dyn_cast<ConstantFP>(V)) {
>     // PPC_FP128 is really the sum of two consecutive doubles, where the first
>     // double is always stored first in memory, regardless of the target
>     // endianness. The memory layout of i128, however, depends on the
> target
>     // endianness, and so we can't fold this without target endianness
>     // information. This should instead be handled by
>     // Analysis/ConstantFolding.cpp
>     if (FP->getType()->isPPC_FP128Ty())
>       return nullptr;
> 
> > The problem is when we bitcast from ppc_fp128 to i128, do we exchange
> > the two registers? Today we do, because ppc_fp128 requires a mandatory
> > big-endian representation, but at the same time it's a i128, which is
> > interpreted as a little-endian 128-bit integer.
> > That's why the combiner I previously mentioned gets confused.
> > If we don't exchange the two registers, combiners are happy because
> > they get a little-endian view of the two doubles (e.g. the most
> > significant bit is indeed the sign bit); but if someone stores this
> > casted i128 to memory, the two words are in little-endian, which
> > breaks the ppc_fp128 layout policy. It also breaks the bitcast
> > semantic: "The conversion is done as if the value had been stored to
> > memory and read back as type ty2."
> 
> If that's what's happening, I agree, that seems broken. The ppc_fp128 <->
> i128 bitcast should legitimately have different results on big-Endian and little-
> Endian systems. Swapping the registers when we do a bitcast in some
> attempt to remove this effect is not self consistent. However, we seem to
> get this right currently:
> 
> $ cat /tmp/ld.c
> long foo(long double x) {
>   return *(long *) &x;
>   // returns f1 on both powerpc64 and powerpc64le }
> 
> long bar(long double x) {
>   __int128 y = *(__int128 *) &x;
>   return (long) y;
>   // returns f2 on powerpc64 and f1 on powerpc64le }
> 
> Given that the higher-order double is defined to be the one that comes first
> in storage, in the first case, foo(), regardless of Endianness, should return the
> higher-order double as an integer. In the second case, bar(), we're returning
> the lower-order (integer) part of the i128 overlaying the ppc_fp128 value. On
> a little-Endian system, this should be the first double stored in memory, as an
> integer. That's the higher-order double, and so the code should be the same
> as for foo(). For a big-Endian system, the lower-order part of the i128 integer
> is stored in the second double (in layout), and so it should return the lower-
> order double, as an integer. This is the case that should differ from foo(), but
> only on big-Endian systems.
> 
> Luckily, this is what happens (and what GCC does as well).

What Hal said is correct, when bitcasting from ppc_fp128 to i128, you need to exchange
the two registers, otherwise you will break the test case in Hal's email.

> > I did some investigation, it turns out in
> > lib/CodeGen/SelectionDAG/DAGCombiner.cpp, a combination is wrongly
> > assuming the endianness for the i128 bitcasted from a ppc_fp128 (two
> > doubles bit-concatenated together):
> > // fold (bitconvert (fabs x)) -> (and (bitconvert x), (not signbit))
> >
> > This reveals that endianness conversion concerning ppc_fp128 or i128,
> > and big-endian or little-endian target is too confusing to me.
> >
> > I can certainly fix this case by using a "smarter signbit" (e.g.
> > 0x00000000000000001000000000000000), but this seems more confusing.
> > I wonder if anyone has the best background knowledge here?
> 
> And, unfortunately, this is exactly what you should do. ppc_fp128 is a special
> case here (again).

Actually, I think the smart signbit here is not correct. For the fneg case, you need to
toggle the sign bits of both components of a ppc_fp128: negate(d1 + d2) should
produce (-d1 - d2) instead of (-d1 + d2). It probably indicates that this pass is wrong
for both big-endian and little-endian ppc64 targets, but existing test cases were not strong
enough to catch such cases. For the fabs case, you can test positivity using only one
signbit, but to negate the number, you still need to toggle both sign bits.

I have a feeling that you will run into more cases in the DAG combiner where you want
to add special handling codes for ppc_fp128, and I also feel that this type is peculiar
enough that other backend owners upon trying to add more DAG combiner passes are
likely to miss adding the special handling codes as they probably should. In that case it
would be a better idea to handle the ppc_fp128 DAG combiner cases somewhere under
llvm::PPCTargetLowering::PerformDAGCombine() instead of in the target-independent
DAGCombiner.cpp,
- Gao

> 
>  -Hal
> 
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory