[PATCHSET][ARM64] Implement big endian NEON

Thu May 1 07:43:38 PDT 2014

Hi James,

Thanks for working on this.

> 1.       Creation of a targetlowering hook to control if bitcasts can be
> elided, and wiring of this up for ARM64.

I don't think this patch is needed at all. It only affects
load/bitcast & bitcast/store pairs and the elision is correct there by
definition of bitcast (i.e. bitcast == store/bitcast*/load in the
language reference).

> 2.       Fixing of big endian argument passing (non-NEON specific).

I'm unconvinced by the byval changes here. I don't think Clang uses
byval on ARM64 so they should have no effect on ABI compatibility, but
consider this function:

    define i32 @foo([8 x i64], { i32 }* byval %foo) {
      %eltptr = getelementptr { i32 }* %foo, i32 0, i32 0
      %val = load i32* %eltptr
      ret i32 %val
    }

I think that with a natural reading of the AAPCS in LLVM terms, the
value returned should be loaded from [sp] (via clause C.13).

> 3.       [trivial] disconnection of bitcast combine in ISelLowering

I'll be particularly interested to see the test for this one. It feels
like there ought to be some simple commutation relationship between
bitcast & extract, but I can't see it.

> 4.       [mechanical] Predication of LDR/STR (on vectors) to be
> little-endian-only.

This looks fine modulo tests.

> 5.       The Big One ™. Implementation of bitcast for ARM64BE. A good
> explanation is available in the patch commit log.

As you mentioned, the EXT patterns are iffy (the immediate should be a
byte index, not a bit index).

> 6.       Implementation of call/ret lowering and formal arguments lowering.

+// If VT is a 64-bit vector, return f64. If VT is a 128-bit vector,
return f128.
+// If VT is neither of these, return VT unchanged.

Doxygen comments here couldn't hurt.

+        ArgValue = DAG.getExtLoad(ISD::EXTLOAD, DL, LoadVT, Chain, FIN,
+                                  MachinePointerInfo::getFixedStack(FI),
+                                  VA.getLocVT(),
+                                  false, false, false, 0);

Do we ever get an EXTLOAD of a vector type like this during
call/return? I hope not, and have lots more worries about the code if
we do.

I think this patch as a whole might be simplified if we drag
ARM64CallingConvention.td into the fray. Some rule like

    CCIfBigEndian<CCIfType<[...], CCBitConvertToType<f128>>>

near the top (you'd have to define CCIfBigEndian using plain CCIf, I
think; you could get away without it semantically, but the LE DAGs
would be nastier).

After that you could make the BITCASTs unconditional (trivial ones get
elided immediately: SelectionDAG.cpp:2750).

> The only thing missing at the moment except the testcases is fast-isel
> support. That’ll come in a later commit.

You might want to check variadic functions too. va_arg has a
reasonable chance of being correct without any changes (it's
implemented in Clang for AAPCS), but saveVarArgRegisters looks wrong
to me: "DAG.getCopyFromReg(..., MVT::v2i64)".

Cheers.

Tim.