[PATCH] D68414: [SROA] Enhance AggLoadStoreRewriter to rewrite integer load/store if it covers multi fields in original aggregate
Guozhi Wei via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 3 12:07:29 PDT 2019
Carrot created this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
The motivating example is:
  #include <tuple>

  enum ResultType {
    a, b, c, d,
    Error,
  };
  struct Result {
    Result(ResultType type = Error, unsigned hash = 0)
        : type(type), hash(hash) {}
    ResultType type;
    unsigned hash;
  };
  template<typename Function>
  inline Result foo(Function function) {
    bool done;
    Result result;
    // The tuple returned by the lambda is unpacked through a temporary.
    std::tie(done, result) = function();
    if (done) return result;
    return Result(Error);
  }
  int main(int argc, char** argv) {
    auto function = [] { return std::make_tuple(false, Result()); };
    Result result = foo(function);
    return int(result.type);
  }
When compiled with libc++, LLVM generates:
movb $0, -16(%rsp)
movq $4, -12(%rsp)
movq -16(%rsp), %rcx
movq %rcx, -16(%rsp)
movl $4, %eax
testb %cl, %cl
je .LBB0_2
All of the memory accesses are redundant.
The problem is that the underlying tuple structure looks like
{i8, {i32, i32}}
Its total size is 96 bits, small enough to be returned in registers, but as a function return value its type is changed to
{i64, i32}
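The mismatch between the two views can be checked with a small compile-time sketch. The struct and constant names below (TupleLayout, Inner, ByteRange, kStoreI64, kStoreI32, covers) are hypothetical stand-ins for the IR types, not anything taken from libc++ or the patch, and the offsets assume a typical x86-64 ABI where a 32-bit integer is 4-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the IR types; names are illustrative only.
struct Inner { std::int32_t type; std::int32_t hash; };   // {i32, i32}
struct TupleLayout { std::int8_t done; Inner result; };   // {i8, {i32, i32}}

// In-memory view (assumed ABI): flag at byte 0, the i32 fields at bytes 4 and 8.
static_assert(sizeof(TupleLayout) == 12, "96 bits in total");
static_assert(offsetof(TupleLayout, done) == 0, "flag at byte 0");
static_assert(offsetof(TupleLayout, result) == 4, "inner struct at byte 4");

// Byte ranges written when the same 12 bytes are stored as {i64, i32}.
struct ByteRange { std::size_t begin, end; };  // half-open [begin, end)
constexpr ByteRange kStoreI64{0, 8};   // flag, padding, and result.type
constexpr ByteRange kStoreI32{8, 12};  // result.hash

constexpr bool covers(ByteRange r, std::size_t off) {
  return r.begin <= off && off < r.end;
}

// The single i64 store spans the flag and the first field of the inner struct.
static_assert(covers(kStoreI64, offsetof(TupleLayout, done)), "");
static_assert(covers(kStoreI64, offsetof(TupleLayout, result) + offsetof(Inner, type)), "");
static_assert(covers(kStoreI32, offsetof(TupleLayout, result) + offsetof(Inner, hash)), "");
```

The point of the sketch is only that the i64 half of the coerced return value straddles a field of the outer struct and a field of the inner struct.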
So the temporary alloca object that receives the result of the lambda function is written and read as different types. When alloca slices are built from the memory accesses, the slices overlap with each other:
Slices of alloca: %6 = alloca %"struct.std::__u::__tuple_impl", align 8
[0,8) slice #0
used by: store i64 %20, i64* %22
[0,1) slice #1
used by: %31 = load i8, i8* %30, align 8
[0,12) slice #2 (splittable)
used by: call void @llvm.lifetime.end.p0i8(i64 12, i8* %40)
[0,12) slice #3 (splittable)
used by: call void @llvm.lifetime.start.p0i8(i64 12, i8* %12)
[4,12) slice #4
used by: %37 = load i64, i64* %36, align 4
[8,12) slice #5
used by: store i32 %21, i32* %23, align 8
All of the slices are then grouped together into a single partition, so no SROA occurs.
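Why the slices collapse can be illustrated with a simplified interval model. This is a sketch under the assumption that unsplittable slices which overlap must end up in the same partition; it is not the partitioning code in SROA.cpp, and the names Slice, Partition, mergeUnsplittable and slicesFromDump are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Slice {
  std::uint64_t begin, end;  // half-open byte range [begin, end)
  bool splittable;
};

struct Partition { std::uint64_t begin, end; };

// Simplified model (assumption): overlapping unsplittable slices share a
// partition; splittable slices never force a merge and are ignored here.
std::vector<Partition> mergeUnsplittable(std::vector<Slice> slices) {
  slices.erase(std::remove_if(slices.begin(), slices.end(),
                              [](const Slice& s) { return s.splittable; }),
               slices.end());
  std::sort(slices.begin(), slices.end(), [](const Slice& a, const Slice& b) {
    return a.begin < b.begin || (a.begin == b.begin && a.end > b.end);
  });
  std::vector<Partition> parts;
  for (const Slice& s : slices) {
    if (!parts.empty() && s.begin < parts.back().end)
      parts.back().end = std::max(parts.back().end, s.end);  // overlap: extend
    else
      parts.push_back({s.begin, s.end});                     // disjoint: new one
  }
  return parts;
}

// The six slices from the dump above, in the order listed.
std::vector<Slice> slicesFromDump() {
  return {{0, 8, false},  {0, 1, false},  {0, 12, true},
          {0, 12, true},  {4, 12, false}, {8, 12, false}};
}

// Self-check: the i64 store [0,8) and the i64 load [4,12) chain everything
// into one partition covering the whole alloca.
inline void checkDumpCollapses() {
  std::vector<Partition> parts = mergeUnsplittable(slicesFromDump());
  assert(parts.size() == 1 && parts[0].begin == 0 && parts[0].end == 12);
}
```

In this model, replacing the [0,8) store with two 4-byte accesses [0,4) and [4,8) yields two partitions, [0,4) and [4,12), which is the kind of separation the splitting is meant to enable.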
This patch solves the problem by splitting integer loads/stores that cover multiple fields of the alloca aggregate when those fields have different parent structures. In the following example:
{i32, {i32, i32}}
%ptrval = ptrtoint %struct.ptr* %ptr to i64
%ptrval2 = add i64 %ptrval, 4
%ptr1 = inttoptr i64 %ptrval to i64*
%ptr2 = inttoptr i64 %ptrval2 to i64*
%val1 = load i64, i64* %ptr1, align 4
%val2 = load i64, i64* %ptr2, align 4
The first 64-bit load will be rewritten as two 32-bit loads because it actually accesses two fields in the original aggregate, and the two fields don't belong to the same inner structure.
The second load won't be rewritten because all fields accessed by the load belong to the same inner structure; this is a common case in LLVM IR.
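The rule described above can be sketched as a predicate over a flattened field layout. This is a minimal model of the stated criterion, not the code in AggLoadStoreRewriter; LeafField, exampleLayout and shouldSplit are hypothetical names, and the parent ids (0 for the outer struct, 1 for the inner one) are an assumed encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct LeafField {
  std::uint64_t offset, size;  // in bytes
  int parent;                  // id of the innermost enclosing struct
};

// Flattened layout of {i32, {i32, i32}}; parent 0 = outer, 1 = inner.
std::vector<LeafField> exampleLayout() {
  return {{0, 4, 0}, {4, 4, 1}, {8, 4, 1}};
}

// Hypothetical predicate: split an integer access only if it covers more than
// one leaf field and those fields do not all share the same parent struct.
bool shouldSplit(const std::vector<LeafField>& layout,
                 std::uint64_t begin, std::uint64_t size) {
  const std::uint64_t end = begin + size;
  int firstParent = -1;
  unsigned covered = 0;
  bool mixedParents = false;
  for (const LeafField& f : layout) {
    const bool overlaps = f.offset < end && begin < f.offset + f.size;
    if (!overlaps) continue;
    if (covered++ == 0)
      firstParent = f.parent;
    else if (f.parent != firstParent)
      mixedParents = true;
  }
  return covered > 1 && mixedParents;
}

// Self-check against the two loads in the example above.
inline void checkExampleLoads() {
  const std::vector<LeafField> layout = exampleLayout();
  assert(shouldSplit(layout, 0, 8));   // %val1: outer field + inner field
  assert(!shouldSplit(layout, 4, 8));  // %val2: both fields in the inner struct
}
```

Tracking only the innermost parent keeps the sketch short; a real implementation would also have to deal with padding, nested depth, and alignment, which are left out here.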
Repository:
rL LLVM
https://reviews.llvm.org/D68414
Files:
lib/Transforms/Scalar/SROA.cpp
test/Transforms/SROA/split-integer.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D68414.223063.patch
Type: text/x-patch
Size: 18796 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20191003/2eb8029a/attachment.bin>