[PATCH] AARCH64_BE load/store rules fix for ARM ABI

Tue Mar 4 03:38:11 PST 2014

Hi Albrecht,

On 4 March 2014 10:19, Albrecht Kadlec <akadlec at a-bix.com> wrote:
> For the first part: I still want that here - at the start of the LDR/STR
> rules, because it is the reason these rules have been added for BE and LE.
> It's non-obvious, why NEON register loads are handled along with normal
> loads.

Well, fair enough.

> Could we move back to Phabricator for the comments on the "final few" diffs
> ?

Yep, sounds good to me.

> Sorry to add to your pain: Then we'd logically have to remove the IsBE
> predicate: "Don't add what you can't test". We have no other way of ensuring
> that the code paths stay the same for LE & BE - and that adding a rule for
> LE doesn't kill BE.

I'm actually much more worried about people introducing unwarranted
similarities between BE and LE than the reverse, which carpet-bombing
the tests with an identical aarch64_be run line is not going to guard
against. In fact it would be more likely cement any unjustified
symmetry.

Anyone explicitly introducing some dependence on IsBE or IsLE will be
expected to provide tests for that functionality in both directions.

> Unfortunately there's no concept of lower priority weekly tests, that I know
> of. (asan would like that too - Hi Kostya!)

You could try separate files with "REQUIRES: long_tests". I think one
or two people might actually run them, but not many. Another
reasonably common approach would be to *change* some of the existing
aarch64 tests to aarch64_be in areas that are almost certainly common
(this is how linux & iOS tend to share the 32-bit ARM tests, for
example).

> So I'm fine with removing the identical tests for BE, if you tell me that BE
> is not such an important target that it has to be guarded against
> unintentional changes.

I think it's not different enough to warrant doubling the test load
for. Similar arguments could be made for PIC vs static, large/small
code models, NEON/no-NEON or any number of other options that affect
how the backend behaves (we're up to "* 16" already).

But the reality is that they have focussed tests on the areas they're
expected to affect. Testing whether we can add two integers
successfully in PIC mode or make use of "fmla" when the code is
destined for the JIT just isn't productive.

The blanket approach has proven itself ineffective already, before
even being committed, with:

> neon-copy.ll is one of those that we need to investigate. Why would you say
> that it's wrong - it's just hard-coded to LE ?

Yes, take the first one for example:

define <16 x i8> @ins16bw(<16 x i8> %tmp1, i8 %tmp2) {
;CHECK: ins {{v[0-9]+}}.b[15], {{w[0-9]+}}
  %tmp3 = insertelement <16 x i8> %tmp1, i8 %tmp2, i32 15
  ret <16 x i8> %tmp3
}

On big-endian I believe that should produce "ins v0.b[0], w0" because
that "i32 15" refers to the element of the vector with the highest
address (if it was stored). The rest are the same, as well as most
(but probably not all) other uses of indexes in the tests.

> 1) remove IsLE guards for lane-loads - also for any parts of LD1 ???

The "normal" ld1s are probably fine as you have them. I wouldn't worry
about being selective there for now, since mostly ldr does the same
job.

> 2) remove AARCH64_BE test RUNs that are equivalent to LE, because BE isn't
> important enough to be tested all the time.

I'd be reasonably happy for you to change a few from LE to BE (and
just add BE in some important edge cases) *after* careful thought. But
it's a separate issue to this patch, so perhaps it would be best to
discuss it in a separate thread.

This patch specifically tweaks some load/store functionality, so
that's what it should be testing.

> 3) anything else from another thread I forgot ?

Not that I can think of.

Cheers.

Tim.