[llvm-dev] RFC: SROA for method argument

Friedman, Eli via llvm-dev llvm-dev at lists.llvm.org
Tue May 9 10:53:09 PDT 2017


On 5/9/2017 6:05 AM, Hiroshi 7 Inoue via llvm-dev wrote:
>
> Hi,
>
> I am working on improving SROA to generate better code when a method
> takes a struct as an argument. I would appreciate any suggestions or
> comments on how best to proceed with this optimization.
>
> * Problem *
> I observed that LLVM often generates redundant instructions around
> libstdc++'s istreambuf_iterator. The problem comes from scalar
> replacement of aggregates (SROA) for methods that take an aggregate as
> an argument. Here is a simplified example in C.
>
> struct record {
>     long long a;
>     int b;
>     int c;
> };
>
> int func(struct record r) {
>     for (int i = 0; i < r.c; i++)
>         r.b++;
>     return r.b;
> }
>
> When updating r.b (or r.c), SROA generates redundant instructions on
> some platforms (such as x86_64 and ppc64); there, r.b and r.c are
> packed into one 64-bit GPR when the struct is passed as a method
> argument. The problem arises when the same memory location is accessed
> by load/store instructions of different types.
> For this example, Clang generates the following IR to initialize the
> struct on ppc64 and x86_64. On both platforms, the 64-bit value is
> first stored into memory allocated by alloca. Later, the same memory
> location is accessed as 32-bit integer values (r.b and r.c).
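To make the packing concrete, here is a minimal C sketch that copies the struct's bytes into two 64-bit words, loosely mirroring how Clang passes the struct as two i64 values on x86_64. It assumes a little-endian host such as x86_64, where r.b lands in the low 32 bits of the second word and r.c in the high 32 bits; on big-endian ppc64 the two halves are swapped. The helpers coerce1_of, b_of and c_of are hypothetical names used only to illustrate the layout; they are not part of Clang's ABI lowering.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct record {
    long long a;
    int b;
    int c;
};

/* Hypothetical helper: returns the second 64-bit word of the struct,
   i.e. the bytes that correspond to %r.coerce1 on x86_64. memcpy copies
   the in-memory bytes exactly, so the result is host-endian. */
static uint64_t coerce1_of(const struct record *r) {
    uint64_t words[2];
    _Static_assert(sizeof(struct record) == sizeof(uint64_t[2]),
                   "record must be exactly two 64-bit words");
    memcpy(words, r, sizeof words);
    return words[1];
}

/* Little-endian assumption: b is the low half, c the high half. */
static int32_t b_of(uint64_t coerce1) { return (int32_t)(uint32_t)coerce1; }
static int32_t c_of(uint64_t coerce1) { return (int32_t)(uint32_t)(coerce1 >> 32); }
```

For a record with b = 5 and c = 7, coerce1_of returns 0x0000000700000005 on a little-endian host, so both fields share one 64-bit register.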
>
> for ppc64
> %struct.record = type { i64, i32, i32 }
>
> define signext i32 @ppc64le_func([2 x i64] %r.coerce) #0 {
> entry:
>   %r = alloca %struct.record, align 8
>   %0 = bitcast %struct.record* %r to [2 x i64]*
>   store [2 x i64] %r.coerce, [2 x i64]* %0, align 8
>   ....
>
> for x86_64
> define i32 @x86_64_func(i64 %r.coerce0, i64 %r.coerce1) #0 {
> entry:
>   %r = alloca %struct.record, align 8
>   %0 = bitcast %struct.record* %r to { i64, i64 }*
>   %1 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i32 0, i32 0
>   store i64 %r.coerce0, i64* %1, align 8
>   %2 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i32 0, i32 1
>   store i64 %r.coerce1, i64* %2, align 8
>   ....
>
> For such a code sequence, the current SROA generates instructions that
> update only the upper (or lower) half of the 64-bit value when storing
> r.b (or r.c). SROA can split an i64 value into two i32 values under
> some conditions (e.g., when the struct contains only int b and int c),
> but it cannot split more complex cases.
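A rough C model of the half-word updates described above: each store to r.b or r.c becomes a read-modify-write of the shared 64-bit value, masking off one half and OR-ing in the other. This sketch assumes the little-endian layout (r.b in bits 0-31, r.c in bits 32-63); on big-endian ppc64 r.b is the upper half, as in the description. The helpers store_b, store_c, load_b and load_c are hypothetical names for illustration, not anything SROA defines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling the i64 that holds r.b and r.c,
   assuming little-endian layout (b in bits 0-31, c in bits 32-63). */
static uint64_t store_b(uint64_t packed, int32_t b) {
    /* Keep the upper half (c) and overwrite only the lower half (b). */
    return (packed & 0xFFFFFFFF00000000ULL) | (uint32_t)b;
}

static uint64_t store_c(uint64_t packed, int32_t c) {
    /* Keep the lower half (b) and overwrite only the upper half (c). */
    return (packed & 0x00000000FFFFFFFFULL) | ((uint64_t)(uint32_t)c << 32);
}

/* Loads are a truncation (b) or a shift plus truncation (c). */
static int32_t load_b(uint64_t packed) { return (int32_t)(uint32_t)packed; }
static int32_t load_c(uint64_t packed) { return (int32_t)(uint32_t)(packed >> 32); }
```

Each update costs a mask, an OR, and for r.c a shift, which is the kind of extra work that disappears once r.b and r.c live in separate 32-bit values.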
>
When an alloca is accessed with mixed types, SROA simply treats the
whole alloca as one big integer and generates PHI nodes as needed.
In many cases, instcombine would then slice up the generated PHI nodes
to use more appropriate types, but that doesn't work out here. (See
InstCombiner::SliceUpIllegalIntegerPHI.) Probably the right solution is
to make instcombine more aggressive here; it's hard to come up with a
generally useful transform in SROA without reasoning about control flow.
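The two shapes described above can be sketched at the C level. func_wide keeps r.b and r.c in one 64-bit loop-carried value, analogous to the single wide integer PHI that SROA produces, so every iteration masks and re-inserts the lower half. func_sliced splits that value into two independent 32-bit loop-carried values, which is roughly what slicing the PHI into narrower types would achieve. Both function names are hypothetical, the little-endian layout is assumed, and this is an analogy in C rather than the IR that instcombine actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Shape 1: r.b and r.c stay packed in one 64-bit loop-carried value,
   analogous to a single wide integer PHI. Each iteration extracts b,
   increments it, and re-inserts it into the lower half. */
static int32_t func_wide(uint64_t packed) {
    for (int32_t i = 0; i < (int32_t)(uint32_t)(packed >> 32); i++) {
        int32_t b = (int32_t)(uint32_t)packed;
        packed = (packed & 0xFFFFFFFF00000000ULL) | (uint32_t)(b + 1);
    }
    return (int32_t)(uint32_t)packed;
}

/* Shape 2: the value is sliced into two independent 32-bit
   loop-carried values, so the loop body is a plain increment. */
static int32_t func_sliced(uint64_t packed) {
    int32_t b = (int32_t)(uint32_t)packed;
    int32_t c = (int32_t)(uint32_t)(packed >> 32);
    for (int32_t i = 0; i < c; i++)
        b++;
    return b;
}
```

Both return 13 for a packed value with b = 10 and c = 3; the difference is only in how much masking and shifting sits inside the loop.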

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


