[PATCH] D34005: [CGP / PowerPC] avoid multi-block overhead for simple memcmp expansion

Wed Jun 7 11:08:25 PDT 2017

spatel created this revision.
Herald added a subscriber: mcrosier.

The test diff for PowerPC is minimal, but for x86 the difference is substantial: branches are assumed cheap there, and SelectionDAG (SDAG) can't optimize across basic blocks. Instead of this:

  _cmp_eq8:
  	movq	(%rdi), %rax
  	cmpq	(%rsi), %rax
  	je	LBB23_1
  ## BB#2:                                ## %res_block
  	movl	$1, %ecx
  	jmp	LBB23_3
  LBB23_1:
  	xorl	%ecx, %ecx
  LBB23_3:                                ## %endblock
  	xorl	%eax, %eax
  	testl	%ecx, %ecx
  	sete	%al
  	retq

We get this:

  _cmp_eq8:
  	movq	(%rdi), %rcx
  	xorl	%eax, %eax
  	cmpq	(%rsi), %rcx
  	sete	%al
  	retq

That matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall(). If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable CGP memcmp() expansion for x86. That is, we'll bypass the power-of-2 special cases currently optimized in SDAG, because we can lower the IR produced here optimally.
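For readers who want the source-level picture, here is a rough sketch of the pattern being discussed. It is not taken from the patch or its tests: the function body for cmp_eq8 is an assumption inferred from the assembly above (an 8-byte memcmp whose result is only compared against zero), and cmp_eq8_expanded is a hypothetical name for a hand-written equivalent of the single-block expansion.

```c
#include <stdint.h>
#include <string.h>

/* Assumed shape of the test case: only the zero-equality of the
 * memcmp() result is used, so the ordering of the bytes is irrelevant. */
int cmp_eq8(const void *x, const void *y) {
    return memcmp(x, y, 8) == 0;
}

/* Hypothetical hand-written equivalent of the single-block expansion:
 * load 8 bytes from each side and compare them once, with no branch. */
int cmp_eq8_expanded(const void *x, const void *y) {
    uint64_t a, b;
    memcpy(&a, x, sizeof a); /* alignment-safe 8-byte load */
    memcpy(&b, y, sizeof b);
    return a == b;
}
```

Both functions should return 1 for equal buffers and 0 otherwise; the second form corresponds to the load/cmpq/sete sequence shown above.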

https://reviews.llvm.org/D34005

Files:
  lib/CodeGen/CodeGenPrepare.cpp
  test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll
