[PATCH] D34071: [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion

Tue Jun 20 10:14:18 PDT 2017

spatel added a comment.

Thanks, Eric!

For those following the progress for x86, I enabled the smallest sizes with:
https://reviews.llvm.org/rL305802

But now I see two more missed IR optimizations, and I'm again wondering: is there a reason for this expansion to occur in CGP (CodeGenPrepare) rather than in its own pass, which could run before the final simplifycfg/instcombine in a normal opt pipeline?

Looking at the general 4-byte memcmp() expansion as an example:

  define i32 @cmp4(i8* %x, i8* %y) {
  loadbb:
    %0 = bitcast i8* %x to i32*
    %1 = bitcast i8* %y to i32*
    %2 = load i32, i32* %0
    %3 = load i32, i32* %1
    %4 = call i32 @llvm.bswap.i32(i32 %2)
    %5 = call i32 @llvm.bswap.i32(i32 %3)
    %6 = zext i32 %4 to i64   <--- the extends are unnecessary
    %7 = zext i32 %5 to i64
    %8 = sub i64 %6, %7       <--- causing a too-wide sub
    %9 = icmp ne i64 %8, 0    <--- and a too-wide cmp
    br i1 %9, label %res_block, label %endblock

  res_block:                     
    %10 = icmp ult i64 %6, %7
    %11 = select i1 %10, i32 -1, i32 1
    br label %endblock

  endblock:  <--- this could have been simplified to a select     
    %phi.res = phi i32 [ 0, %loadbb ], [ %11, %res_block ]
    ret i32 %phi.res
  }

Sure enough, if we run -simplifycfg and -instcombine, we get:

  %0 = bitcast i8* %x to i32*
  %1 = bitcast i8* %y to i32*
  %2 = load i32, i32* %0, align 4
  %3 = load i32, i32* %1, align 4
  %4 = call i32 @llvm.bswap.i32(i32 %2)
  %5 = call i32 @llvm.bswap.i32(i32 %3)
  %6 = icmp ne i32 %4, %5
  %7 = icmp ult i32 %4, %5
  %8 = select i1 %7, i32 -1, i32 1
  %phi.res = select i1 %6, i32 %8, i32 0
  ret i32 %phi.res
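
As a sanity check on the claim that the narrower form is equivalent, here is a rough Python model of both versions. It is only a sketch: the helper names (load_bswap32, cmp4_expanded, cmp4_simplified, memcmp_sign) are made up for illustration, and it assumes a little-endian target, which is why the bswap is there in the first place.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def load_bswap32(buf: bytes) -> int:
    """Model a little-endian i32 load followed by llvm.bswap.i32 (assumed target)."""
    loaded = int.from_bytes(buf[:4], "little")
    # Swapping the bytes of a little-endian load gives the big-endian reading.
    return int.from_bytes(loaded.to_bytes(4, "little"), "big")


def cmp4_expanded(x: bytes, y: bytes) -> int:
    """Model of the expansion as emitted: zext to i64, sub, compare, branch."""
    a64 = load_bswap32(x) & MASK64   # zext i32 -> i64
    b64 = load_bswap32(y) & MASK64
    diff = (a64 - b64) & MASK64      # wrapping 64-bit sub
    if diff != 0:                    # br to res_block
        return -1 if a64 < b64 else 1
    return 0                         # phi takes 0 from loadbb


def cmp4_simplified(x: bytes, y: bytes) -> int:
    """Model of the IR after -simplifycfg and -instcombine: 32-bit compares only."""
    a = load_bswap32(x)
    b = load_bswap32(y)
    ne = a != b                      # icmp ne i32
    ult = a < b                      # icmp ult i32
    sel = -1 if ult else 1           # inner select
    return sel if ne else 0          # outer select replaces the phi


def memcmp_sign(x: bytes, y: bytes) -> int:
    """Reference: sign of a 4-byte memcmp (bytes compare as unsigned, in order)."""
    p, q = x[:4], y[:4]
    return (p > q) - (p < q)
```

Both models return only -1, 0, or 1, which is enough because memcmp() callers may rely only on the sign of the result.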

Repository:
  rL LLVM

https://reviews.llvm.org/D34071