[PATCH] AARCH64_BE load/store rules fix for ARM ABI
Tim Northover
t.p.northover at gmail.com
Fri Mar 14 06:35:57 PDT 2014
Hi Jiangning,
Sorry for the barrage of messages today, but...
> And it has the exact same problem: without tracking the history of all
> values (whether they came from ld1 or ldr) we can't know whether that
> final UMOV should refer to lane 0 or lane 3. Unless we make absolutely
> certain that the in-register layout is the same regardless of where a
> vector comes from.
Another evil example has just occurred to me. What code should this
generate to ensure the lowest-addressed element of the pointed-to
vector is returned (assuming no IR-level optimisations are performed,
so the CFG must remain intact)?
define i16 @foo(i1 %tst, <4 x i16>* %arr_ptr, <4 x i16>* %vec_ptr) {
  br i1 %tst, label %load_vec, label %load_arr

load_vec:
  %vec = load <4 x i16>* %vec_ptr, align 8
  br label %end

load_arr:
  %arr = load <4 x i16>* %arr_ptr, align 2
  br label %end

end:
  %val = phi <4 x i16> [%vec, %load_vec], [%arr, %load_arr]
  %elt = extractelement <4 x i16> %val, i32 0
  ret i16 %elt
}
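
If the backend simply picked ldr for one predecessor and ld1 for the
other (say, based on the differing alignments) and applied no fixup,
then no single lane would be correct at the extract. A sketch of that
naive big-endian codegen (mine, not output from any real compiler):

foo:
        cbz w0, .Lload_arr
.Lload_vec:
        ldr d0, [x2]            // BE ldr: lowest-addressed element lands in h[3]
        b .Lend
.Lload_arr:
        ld1 { v0.4h }, [x1]     // ld1: lowest-addressed element lands in h[0]
.Lend:
        umov w0, v0.h[0]        // right after the ld1, wrong after the ldr
        ret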
Cheers.
Tim.
P.S. I argue (modulo bugs):
foo:
        cbz w0, .Lload_arr
.Lload_vec:
        ldr d0, [x2]
        b .Lend
.Lload_arr:
        ld1 { v0.4h }, [x1]
        rev64 v0.4h, v0.4h
.Lend:
        umov w0, v0.h[3]
        ret
With two viable alternatives:
1. Drop the rev64; only ever use ld1. umov can use lane [0].
2. Attach the rev64 to ldr instead of ld1 (sketched below). umov can
use lane [0].
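
For concreteness, alternative 2 would come out roughly like this (my
sketch, same caveats as above); both predecessors are normalised to
the ld1 layout, where the lowest-addressed element sits in lane 0:

foo:
        cbz w0, .Lload_arr
.Lload_vec:
        ldr d0, [x2]
        rev64 v0.4h, v0.4h      // convert the ldr layout to the ld1 layout
        b .Lend
.Lload_arr:
        ld1 { v0.4h }, [x1]
.Lend:
        umov w0, v0.h[0]
        ret

The trade-off between the P.S. version and this one is just which
path pays for the extra rev64.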