[PATCH] D34071: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 20 10:14:18 PDT 2017
spatel added a comment.
Thanks, Eric!
For those following the progress for x86, I enabled the smallest sizes with:
https://reviews.llvm.org/rL305802
But now I see 2 more missed IR optimizations, and I'm again wondering: is there a reason for this expansion to occur in CGP rather than in its own pass, which could run before the final simplifycfg/instcombine in a normal opt pipeline?
Looking at the general 4-byte memcmp() expansion as an example:
define i32 @cmp4(i8* %x, i8* %y) {
loadbb:
%0 = bitcast i8* %x to i32*
%1 = bitcast i8* %y to i32*
%2 = load i32, i32* %0
%3 = load i32, i32* %1
%4 = call i32 @llvm.bswap.i32(i32 %2)
%5 = call i32 @llvm.bswap.i32(i32 %3)
%6 = zext i32 %4 to i64 <--- the extends are unnecessary
%7 = zext i32 %5 to i64
%8 = sub i64 %6, %7 <--- causing a too-wide sub
%9 = icmp ne i64 %8, 0 <--- and a too-wide cmp
br i1 %9, label %res_block, label %endblock
res_block:
%10 = icmp ult i64 %6, %7
%11 = select i1 %10, i32 -1, i32 1
br label %endblock
endblock: <--- this could have been simplified to a select
%phi.res = phi i32 [ 0, %loadbb ], [ %11, %res_block ]
ret i32 %phi.res
}
Sure enough, if we run -simplifycfg and -instcombine, we get:
%0 = bitcast i8* %x to i32*
%1 = bitcast i8* %y to i32*
%2 = load i32, i32* %0, align 4
%3 = load i32, i32* %1, align 4
%4 = call i32 @llvm.bswap.i32(i32 %2)
%5 = call i32 @llvm.bswap.i32(i32 %3)
%6 = icmp ne i32 %4, %5
%7 = icmp ult i32 %4, %5
%8 = select i1 %7, i32 -1, i32 1
%phi.res = select i1 %6, i32 %8, i32 0
ret i32 %phi.res
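For readers less used to reading IR, here is a rough C sketch of what the simplified sequence above computes. It is only an illustration, not code from the patch: the names load_be32, cmp4_sketch and check_against_memcmp are made up for this example, and the big-endian load is written out with shifts to stand in for the load + bswap pair that a little-endian target would use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes as a big-endian integer so that unsigned
   integer order matches the byte-wise order memcmp uses. On a little-endian
   target this corresponds to the load followed by llvm.bswap.i32. */
static uint32_t load_be32(const void *p) {
    unsigned char b[4];
    memcpy(b, p, sizeof b);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Hypothetical mirror of the simplified IR: two 32-bit compares feeding
   two selects, with no extension to 64 bits and no branch. */
static int cmp4_sketch(const void *x, const void *y) {
    uint32_t a = load_be32(x);
    uint32_t b = load_be32(y);
    int ne = (a != b);          /* icmp ne i32 %4, %5 */
    int lt = (a < b);           /* icmp ult i32 %4, %5 */
    int inner = lt ? -1 : 1;    /* select i1 %7, i32 -1, i32 1 */
    return ne ? inner : 0;      /* select i1 %6, i32 %8, i32 0 */
}

/* Usage: the sign of the result should agree with memcmp on 4 bytes. */
static void check_against_memcmp(const void *x, const void *y) {
    int want = memcmp(x, y, 4);
    int got = cmp4_sketch(x, y);
    assert((want < 0) == (got < 0) && (want > 0) == (got > 0));
}
```

Since both operands are already i32 after the bswap, comparing them directly gives the same answer as the zext/sub/icmp sequence on i64, which is why the wider operations are redundant.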
Repository:
rL LLVM
https://reviews.llvm.org/D34071