[PATCH] AARCH64_BE load/store rules fix for ARM ABI
Tim Northover
t.p.northover at gmail.com
Fri Mar 14 06:35:57 PDT 2014
Hi Jiangning,
Sorry for the barrage of messages today, but...
> And it has the exact same problem: without tracking the history of all
> values (whether they came from ld1 or ldr) we can't know whether that
> final UMOV should refer to lane 0 or lane 3. Unless we make absolutely
> certain that the in-register layout is the same regardless of where a
> vector comes from.
Another evil example has just occurred to me. What code should this
generate to ensure the lowest-addressed element of the pointed-to
vector is returned (assuming no IR-level optimisations are performed,
so the CFG must remain intact)?
define i16 @foo(i1 %tst, <4 x i16>* %arr_ptr, <4 x i16>* %vec_ptr) {
  br i1 %tst, label %load_vec, label %load_arr

load_vec:
  %vec = load <4 x i16>* %vec_ptr, align 8
  br label %end

load_arr:
  %arr = load <4 x i16>* %arr_ptr, align 2
  br label %end

end:
  %val = phi <4 x i16> [%vec, %load_vec], [%arr, %load_arr]
  %elt = extractelement <4 x i16> %val, i32 0
  ret i16 %elt
}
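
If the backend simply picked ldr for one predecessor and ld1 for the
other (say, based on the differing alignments) and applied no fixup,
then no single lane would be correct at the extract. A sketch of that
naive big-endian codegen (mine, not output from any real compiler):

foo:
        cbz w0, .Lload_arr
.Lload_vec:
        ldr d0, [x2]            // BE ldr: lowest-addressed element lands in h[3]
        b .Lend
.Lload_arr:
        ld1 { v0.4h }, [x1]     // ld1: lowest-addressed element lands in h[0]
.Lend:
        umov w0, v0.h[0]        // right after the ld1, wrong after the ldr
        ret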
Cheers.
Tim.
P.S. I argue (modulo bugs):
foo:
        cbz w0, .Lload_arr
.Lload_vec:
        ldr d0, [x2]
        b .Lend
.Lload_arr:
        ld1 { v0.4h }, [x1]
        rev64 v0.4h, v0.4h
.Lend:
        umov w0, v0.h[3]
        ret
With two viable alternatives:
1. Drop the rev64; only ever use ld1. umov can use lane [0].
2. Attach the rev64 to ldr instead of ld1 (sketched below). umov can
use lane [0].
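
For concreteness, alternative 2 would come out roughly like this (my
sketch, same caveats as above); both predecessors are normalised to
the ld1 layout, where the lowest-addressed element sits in lane 0:

foo:
        cbz w0, .Lload_arr
.Lload_vec:
        ldr d0, [x2]
        rev64 v0.4h, v0.4h      // convert the ldr layout to the ld1 layout
        b .Lend
.Lload_arr:
        ld1 { v0.4h }, [x1]
.Lend:
        umov w0, v0.h[0]
        ret

The trade-off between the P.S. version and this one is just which
path pays for the extra rev64.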