[llvm-dev] RFC: SROA for method argument
Hiroshi 7 Inoue via llvm-dev
llvm-dev at lists.llvm.org
Tue May 9 06:05:48 PDT 2017
Hi,
I am working on improving SROA so that it generates better code when a
function takes a struct as an argument. I would appreciate any
suggestions or comments on how best to proceed with this optimization.
* Problem *
I observed that LLVM often generates redundant instructions around
libstdc++'s istreambuf_iterator. The problem comes from how scalar
replacement of aggregates (SROA) handles functions that take an aggregate
as an argument. Here is a simplified example in C.
struct record {
    long long a;
    int b;
    int c;
};
int func(struct record r) {
    for (int i = 0; i < r.c; i++)
        r.b++;
    return r.b;
}
When r.b (or r.c) is updated, SROA generates redundant instructions on
platforms such as x86_64 and ppc64, where r.b and r.c are packed into one
64-bit GPR when the struct is passed as an argument. The problem arises
when the same memory location is accessed by load/store instructions of
different types.
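To make the packing concrete, here is a minimal C sketch (an illustration only, not Clang's actual lowering) that copies the struct into two 64-bit words, the way the coerced argument carries it. The helper names coerce_record and uncoerce_record are hypothetical, and the sketch assumes a target where struct record occupies exactly 16 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct record {
    long long a;
    int b;
    int c;
};

/* Assumption: 8-byte long long, 4-byte int, no padding. */
_Static_assert(sizeof(struct record) == 2 * sizeof(uint64_t),
               "sketch assumes a 16-byte struct record");

/* Hypothetical helper: view the struct as the two 64-bit words
 * in which it is passed. */
static void coerce_record(const struct record *r, uint64_t words[2]) {
    assert(r != NULL && words != NULL);
    memcpy(words, r, sizeof *r);
}

/* Hypothetical helper: rebuild the struct from the two words. */
static struct record uncoerce_record(const uint64_t words[2]) {
    struct record r;
    assert(words != NULL);
    memcpy(&r, words, sizeof r);
    return r;
}
```

After coerce_record, words[0] holds a, while words[1] holds both b and c. Which half of words[1] holds b depends on the target's endianness.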
For this example, Clang generates the following IR to initialize the struct
on ppc64 and x86_64. On both platforms, the 64-bit values are first stored
into memory allocated by alloca. Later, the same memory location is
accessed as 32-bit integer values (r.b and r.c).
for ppc64
%struct.record = type { i64, i32, i32 }
define signext i32 @ppc64le_func([2 x i64] %r.coerce) #0 {
entry:
  %r = alloca %struct.record, align 8
  %0 = bitcast %struct.record* %r to [2 x i64]*
  store [2 x i64] %r.coerce, [2 x i64]* %0, align 8
  ....
for x86_64
define i32 @x86_64_func(i64 %r.coerce0, i64 %r.coerce1) #0 {
entry:
  %r = alloca %struct.record, align 8
  %0 = bitcast %struct.record* %r to { i64, i64 }*
  %1 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i32 0, i32 0
  store i64 %r.coerce0, i64* %1, align 8
  %2 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i32 0, i32 1
  store i64 %r.coerce1, i64* %2, align 8
  ....
For such code sequences, the current SROA generates instructions that
update only the upper (or lower) half of the 64-bit value when storing r.b
(or r.c). SROA can split an i64 value into two i32 values under some
conditions (e.g. when the struct contains only int b and int c in this
example), but it cannot handle more complex cases.
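In C terms, the redundant work looks roughly like the sketch below. This is an illustration under stated assumptions, not the IR that SROA actually emits: the function names are hypothetical, and b is assumed to sit in the low 32 bits and c in the high 32 bits of the shared word (the little-endian placement). func_packed keeps both fields in one 64-bit value and must extract and reinsert 32 bits on every update, while func_split works on two independent scalars.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 64-bit word that carries b and c
 * (assumed placement: b in the low half, c in the high half). */
static uint64_t pack_bc(int32_t b, int32_t c) {
    return ((uint64_t)(uint32_t)c << 32) | (uint32_t)b;
}

/* b and c stay packed: each r.b++ extracts, updates, and reinserts
 * the low 32 bits of the 64-bit value. */
static int32_t func_packed(uint64_t bc) {
    for (int32_t i = 0; i < (int32_t)(uint32_t)(bc >> 32); i++) {
        uint32_t b = (uint32_t)bc + 1u;               /* extract, r.b++ */
        bc = (bc & UINT64_C(0xFFFFFFFF00000000)) | b; /* reinsert */
    }
    return (int32_t)(uint32_t)bc;
}

/* b and c are split once up front; the loop touches plain scalars. */
static int32_t func_split(uint64_t bc) {
    int32_t b = (int32_t)(uint32_t)bc;
    int32_t c = (int32_t)(uint32_t)(bc >> 32);
    for (int32_t i = 0; i < c; i++)
        b++;
    return b;
}

/* Both versions compute the same result for the same packed input. */
static void check_equivalence(int32_t b, int32_t c) {
    uint64_t bc = pack_bc(b, c);
    assert(func_packed(bc) == func_split(bc));
}
```

The mask, shift, and or operations in func_packed are the kind of extra instructions that disappear once the two 32-bit fields are treated as separate values.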
* Approach *
In the SROA pass, AggLoadStoreRewriter splits a load or store instruction
for an aggregate into multiple load or store instructions for simple
values. In the ppc64 case above, the store of [2 x i64] is split into two
stores of i64.
I am extending AggLoadStoreRewriter to split a store of an aggregate that
comes from a function argument based on the format of the original
aggregate (here, { i64, i32, i32 } instead of [2 x i64]). I have submitted
a work-in-progress patch to Phabricator ( https://reviews.llvm.org/D32998 ).
This optimization depends on the ABI, so I have enabled it only for ppc64
with the ELFv2 ABI so far.
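As a rough C analogy of the two splitting strategies (a sketch only, not the code in the patch), the first function below stores the coerced argument as two 64-bit words, and the second stores one value per field of { i64, i32, i32 }. All function names are hypothetical. The sketch assumes a 16-byte struct record and probes endianness at run time to decide which half of the second word holds b.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct record {
    long long a;
    int b;
    int c;
};

_Static_assert(sizeof(struct record) == 2 * sizeof(uint64_t),
               "sketch assumes a 16-byte struct record");

/* Runtime endianness probe: is the first byte of the value 1 equal to 1? */
static int is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Analogue of the existing split: one aggregate store becomes
 * two 64-bit stores. */
static void store_as_words(struct record *dst, const uint64_t w[2]) {
    assert(dst != NULL && w != NULL);
    memcpy((unsigned char *)dst, &w[0], sizeof w[0]);
    memcpy((unsigned char *)dst + sizeof w[0], &w[1], sizeof w[1]);
}

/* Analogue of the proposed split: one store per field of the
 * original struct type. */
static void store_as_fields(struct record *dst, const uint64_t w[2]) {
    assert(dst != NULL && w != NULL);
    const uint32_t lo = (uint32_t)w[1];
    const uint32_t hi = (uint32_t)(w[1] >> 32);
    dst->a = (long long)w[0];
    dst->b = (int)(is_little_endian() ? lo : hi);
    dst->c = (int)(is_little_endian() ? hi : lo);
}
```

Both functions leave the same bytes in memory. The difference is that store_as_fields writes b and c with the same 32-bit type used by the later accesses, so no 64-bit value has to be partially updated.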
I would truly appreciate any advice and comments.
Best regards,
Hiroshi
-----
Hiroshi Inoue <inouehrs at jp.ibm.com>
IBM Research - Tokyo