[llvm-dev] RFC: SROA for method argument

Tue May 9 06:05:48 PDT 2017

Hi,

I am working to improve SROA to generate better code when a method has a
struct in its arguments. I would appreciate it if I could have any
suggestions or comments on how I can best proceed with this optimization.

* Problem *
I observed that LLVM often generates redundant instructions around the C++
standard library's istreambuf_iterator. The problem comes from scalar
replacement of aggregates (SROA) for methods that take an aggregate as an
argument. Here is a simplified example in C.

struct record {
    long long a;
    int b;
    int c;
};

int func(struct record r) {
    for (int i = 0; i < r.c; i++)
        r.b++;
    return r.b;
}

When updating r.b (or r.c), SROA generates redundant instructions on some
platforms (such as x86_64 and ppc64), where r.b and r.c are packed into one
64-bit GPR when the struct is passed as a method argument. The problem
arises when the same memory location is accessed by load/store instructions
of different types.
For this example, Clang generates the following IR to initialize the struct
on ppc64 and x86_64. On both platforms, the 64-bit values are first stored
into memory allocated by alloca. Later, the same memory location is
accessed as 32-bit integer values (r.b and r.c).

for ppc64
    %struct.record = type { i64, i32, i32 }

    define signext i32 @ppc64le_func([2 x i64] %r.coerce) #0 {
    entry:
      %r = alloca %struct.record, align 8
      %0 = bitcast %struct.record* %r to [2 x i64]*
      store [2 x i64] %r.coerce, [2 x i64]* %0, align 8
      ....

for x86_64
    define i32 @x86_64_func(i64 %r.coerce0, i64 %r.coerce1) #0 {
    entry:
      %r = alloca %struct.record, align 8
      %0 = bitcast %struct.record* %r to { i64, i64 }*
      %1 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i32 0, i32 0
      store i64 %r.coerce0, i64* %1, align 8
      %2 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i32 0, i32 1
      store i64 %r.coerce1, i64* %2, align 8
      ....
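
As a rough C model of the two entry sequences above (a sketch, not compiler
output): the coerced argument is written into the struct's memory as 64-bit
words, and the fields are then read back as 32-bit values. The names
struct coerced, coerce_record, and func_coerced are hypothetical and exist
only for this illustration, and a 16-byte struct layout is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct record {
    long long a;
    int b;
    int c;
};

/* Hypothetical model of the coerced argument: two 64-bit words,
   like [2 x i64] on ppc64 or the pair of i64 values on x86_64. */
struct coerced {
    uint64_t w[2];
};

/* Pack a record into the coerced form, as a caller would. */
struct coerced coerce_record(struct record r) {
    struct coerced c;
    assert(sizeof c == sizeof r);   /* assumes a 16-byte layout, no padding */
    memcpy(&c, &r, sizeof c);
    return c;
}

/* Model of the callee: 64-bit stores into the alloca, then 32-bit accesses. */
int func_coerced(struct coerced in) {
    struct record r;                /* %r = alloca %struct.record */
    memcpy(&r, &in, sizeof r);      /* 64-bit stores of the coerced words */
    for (int i = 0; i < r.c; i++)   /* 32-bit load of r.c */
        r.b++;                      /* 32-bit load and store of r.b */
    return r.b;
}
```

The mismatch between the 64-bit stores and the 32-bit loads in func_coerced
is the pattern that SROA has to see through.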

For such a code sequence, the current SROA generates instructions that
update only the upper (or lower) half of the 64-bit value when storing r.b
(or r.c). SROA can split an i64 value into two i32 values under some
conditions (e.g., when the struct contains only int b and int c in this
example), but it cannot split more complex cases.
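
To make the cost of the half-word update concrete, here is a minimal C
sketch of the loop written against the packed 64-bit value. It assumes a
little-endian target, so r.b is the low half and r.c is the high half; the
helper names are hypothetical, and the actual instructions that SROA emits
will differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replace the low 32 bits (r.b on a little-endian
   target) and keep the high 32 bits (r.c) intact. */
uint64_t set_low_half(uint64_t packed, uint32_t b) {
    return (packed & 0xFFFFFFFF00000000ULL) | (uint64_t)b;
}

/* Hypothetical helpers to read each half back. */
uint32_t low_half(uint64_t packed)  { return (uint32_t)packed; }
uint32_t high_half(uint64_t packed) { return (uint32_t)(packed >> 32); }

/* func() against the packed value: every r.b++ becomes
   extract, add, mask, and merge instead of a plain 32-bit add. */
int func_packed(uint64_t packed_bc) {
    int n = (int)high_half(packed_bc);          /* r.c */
    for (int i = 0; i < n; i++)
        packed_bc = set_low_half(packed_bc, low_half(packed_bc) + 1u);
    return (int)low_half(packed_bc);            /* r.b */
}

/* Usage check: b = 5, c = 3 gives 5 + 3 = 8. */
void example_packed(void) {
    assert(func_packed(((uint64_t)3 << 32) | 5u) == 8);
}
```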

* Approach *
In the SROA pass, AggLoadStoreRewriter splits a load or store instruction
for an aggregate into multiple load or store instructions for simple
values. In the ppc64 case above, store [2 x i64] is split into two i64
stores.
I am extending AggLoadStoreRewriter to split a store of an aggregate that
comes from a method argument based on the format of the aggregate (here,
{ i64, i32, i32 } instead of [2 x i64]). I have submitted a
work-in-progress patch to Phabricator ( https://reviews.llvm.org/D32998 ).
This optimization depends on the ABI, so I have enabled it only for ppc64
with the ELFv2 ABI so far.
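
The intended effect of splitting by the struct's format can be sketched in
C as follows. This is only a model of the idea, not code from the patch:
the names record_fields, split_by_layout, and func_split are hypothetical,
and a little-endian layout is assumed, so b is the low half of the second
word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar form of { i64, i32, i32 } after a field-wise split:
   one value per field, no packed 64-bit word left. */
struct record_fields {
    int64_t a;
    int32_t b;
    int32_t c;
};

/* Split the two coerced 64-bit words according to the struct's layout.
   Assumption: little-endian, so b is the low half of the second word. */
struct record_fields split_by_layout(uint64_t w0, uint64_t w1) {
    struct record_fields f;
    f.a = (int64_t)w0;
    f.b = (int32_t)(uint32_t)w1;
    f.c = (int32_t)(uint32_t)(w1 >> 32);
    return f;
}

int func_split(uint64_t w0, uint64_t w1) {
    struct record_fields f = split_by_layout(w0, w1);
    for (int32_t i = 0; i < f.c; i++)
        f.b++;                  /* plain 32-bit add, no mask or merge */
    return f.b;
}

/* Usage check: a = 1, b = 5, c = 3 gives 5 + 3 = 8. */
void example_split(void) {
    assert(func_split(1, ((uint64_t)3 << 32) | 5u) == 8);
}
```

Once each field is its own scalar, the extraction happens once at function
entry and the loop body works on 32-bit values only.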

I would truly appreciate any advice and comments.
Best regards,
Hiroshi

-----
Hiroshi Inoue <inouehrs at jp.ibm.com>
IBM Research - Tokyo