<div dir="ltr"><div>Consider the following small example:<br><br>struct wrapper {<br>    long value;<br>};<br>long read_wrapper(wrapper w) { return w.value; }<br>long read_primitive(long x) { return x; }<br><br>When compiling for x86 at -O1, both functions reduce nicely to a single IR instruction. It looks like -sroa performs this transformation, but even at -O0 the frontend has already deduced that the argument is really just an i64.<br><br>Before -sroa:<br>define dso_local i64 @_Z12read_wrapper7wrapper(i64) #0 {<br>  %2 = alloca %struct.wrapper, align 8<br>  %3 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0<br>  store i64 %0, i64* %3, align 8<br>  %4 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0<br>  %5 = load i64, i64* %4, align 8<br>  ret i64 %5<br>}</div><div><br>After -sroa:<br>define dso_local i64 @_Z12read_wrapper7wrapper(i64 returned) local_unnamed_addr #0 {<br>  ret i64 %0<br>}<br></div><div><br></div><div>But when I add -target le64, the read_wrapper function instead accepts a %struct.wrapper* byval, that is, a pointer to a copy of the struct on the caller's stack. No optimization level makes this function look as simple as read_primitive.<br><br>define dso_local i64 @_Z12read_wrapper7wrapper(%struct.wrapper* byval nocapture readonly align 8) local_unnamed_addr #0 {<br>  %2 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %0, i64 0, i32 0<br>  %3 = load i64, i64* %2, align 8, !tbaa !2<br>  ret i64 %3<br>}<br><br>We're writing our own LLVM backend for a new architecture. We started with the generic little-endian 64-bit target (le64) and made customizations from there. What needs to be done to re-enable this optimization for our target?<br></div><div><br></div></div>
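<div>For anyone who wants to reproduce this, here is a self-contained sketch of the example as a single file. The two static_asserts and the check_equivalence helper are additions for illustration, not part of the original snippet; they only spell out why passing wrapper as a bare i64 is plausible (it is exactly one long wide and trivially copyable) and that the two functions agree.</div>

```cpp
#include <cassert>
#include <type_traits>

struct wrapper {
    long value;
};

// Assumed properties (added for illustration): wrapper is layout-compatible
// with a single long, so an ABI is free to pass it in one integer register.
static_assert(sizeof(wrapper) == sizeof(long),
              "wrapper should be exactly one long wide");
static_assert(std::is_trivially_copyable<wrapper>::value,
              "wrapper should be trivially copyable");

long read_wrapper(wrapper w) { return w.value; }
long read_primitive(long x) { return x; }

// Hypothetical helper, not in the original example: the two functions must
// return the same value regardless of how the argument is passed.
void check_equivalence() {
    assert(read_wrapper(wrapper{42}) == read_primitive(42));
}
```

<div>The IR above should be reproducible with something like clang++ -O1 -S -emit-llvm on this file, adding -target le64 for the second case. The exact invocation is an assumption and may need adjusting for your build.</div>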