[PATCH] AArch64: big endian constant vector pools

Sun Apr 13 02:55:22 PDT 2014

>   I fully agree with your assessment and prefer solution 1 as well. Would you please elaborate on how you imagine implementing it?

It wasn't necessarily a recommendation at that stage, just one of the
possible options. At least Christian should probably be involved in
the discussion too. And, looking at the final list of BITCAST uses
there may not be much difference in the eventual complexity either
way.

But, as for implementing it:

1. Make sure ld1/st1 are always used for big-endian vector loads &
stores rather than ldr/str (stack spills & fills are fine with ldr/str
because they're invisible to user code).
2. Disable existing pure "bitconvert" patterns on big-endian (around
AArch64InstrNEON.td, line 6556) and write another set, where every
non-trivial bitcast maps to a pair of REV instructions.
3. Modify LowerFormalArguments, LowerReturn, LowerCall and
LowerCallResult so that every type enters and exits the boundary as
v16i8/v8i8 and gets immediately BITCAST to the desired type.
4. Go through every *other* use of bitconvert (in .td) and BITCAST (in
.cpp), deciding whether it's OK or implicitly assuming little-endian
behaviour. At a glance (needs checking):
  + AArch64ISelLowering.cpp:2811 appears completely redundant.
  + LowerVectorSELECT_CC looks plausibly OK, but I'd want to make sure
with testing.
  + getVShiftImm looks dodgy. Actually it looks highly suspect even as
a little-endian optimisation.
  + Uses just after creating NEON_MOVIMM & NEON_MVNIMM nodes look
suspect: they seem to want an actual NOP conversion. I'd probably
switch NEON_MOVIMM etc to take an actual type argument if possible
instead of inferring it from the type of the node (i.e. instead of the
current "(v4i16 (bitcast (v2i32 (MOVIMM ...))))" you'd get "(v4i16
(MOVIMM ..., v2i32))"). Fixing this would take care of most of the
suspect .td entries too, and generally make the patterns neater.
  + Code just after AArch64ISelLowering.cpp:4723 looks very likely LE-specific.
  + Lots of .td uses act on vectors containing all 0 or all 1; these
are clearly fine.
  + The ones involving movi/mvn are covered by the MOVIMM point above.
  + Not quite sure about the Neon_combine_2D ones, but the bitcasts to
v1i64 look iffy. At the moment I can't even quite see where they're
*created* though.
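
To make points 1 and 2 a bit more concrete, here is a toy byte-level
model of a 64-bit vector register on big-endian. It is only a sketch in
Python, not LLVM code: the helper names (rev, ld1, st1, ldr, lanes,
bitcast) are made up for illustration, and it assumes a register can be
modelled as a list of bytes with index 0 as the least significant byte.
Under that assumption it shows that ldr sees the lanes in the opposite
order to ld1, and that a bitcast defined as "store as the source type,
reload as the destination type" becomes a pair of byte reversals rather
than a no-op.

```python
# Toy model only: a register is a list of bytes, index 0 = least significant.
# All names here are hypothetical helpers, not LLVM or AArch64 APIs.

def rev(chunk, data):
    """Reverse the bytes inside each chunk-sized group (models REV16/32/64 on bytes)."""
    out = []
    for i in range(0, len(data), chunk):
        out.extend(reversed(data[i:i + chunk]))
    return out

def ld1(mem, elt_size):
    """Big-endian ld1: lane order follows memory, bytes are swapped within each element."""
    return rev(elt_size, mem)

def st1(reg, elt_size):
    """Big-endian st1: the inverse of ld1 (the same per-element byte swap)."""
    return rev(elt_size, reg)

def ldr(mem):
    """Big-endian ldr: the whole register is loaded as one big integer."""
    return rev(len(mem), mem)

def lanes(reg, elt_size):
    """Read the lane values out of a register, lane 0 first."""
    return [int.from_bytes(bytes(reg[i:i + elt_size]), "little")
            for i in range(0, len(reg), elt_size)]

def bitcast(reg, src_size, dst_size):
    """IR-style bitcast: store with the source element size, reload with the destination size."""
    return ld1(st1(reg, src_size), dst_size)

mem = list(range(8))            # memory bytes 00 01 02 03 04 05 06 07

v4i16 = ld1(mem, 2)
print(lanes(v4i16, 2))          # [1, 515, 1029, 1543]: lane 0 is 0x0001
print(lanes(ldr(mem), 2))       # [1543, 1029, 515, 1]: same lanes, reversed order

v2i32 = bitcast(v4i16, 2, 4)
print(v2i32 == rev(4, rev(2, v4i16)))   # True: two byte reversals
print(v2i32 == v4i16)                   # False: not a no-op
```

In the same model, a store/reload through ldr/str (rev(8, rev(8, reg)))
gives back the original bytes, which is why the existing no-op
"bitconvert" patterns are consistent with ldr/str but not with ld1/st1.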

Obviously this would all be a gradual process, rather than one big patch.

Cheers.

Tim.