[PATCH] D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize

Fri Sep 14 04:45:24 PDT 2018

SjoerdMeijer added a comment.

Thanks both, those are fair points.

For a bit more context, this is the problem that I am trying to solve:

  #include <string.h>

  void foo (char *A, char *B) {
    memcpy(A, B, 8);
  }

Compiled with `-Oz -mno-unaligned-access`, this results in this disaster:

  ldrb	r3, [r1]
  ldrb	r4, [r1, #1]
  ldrb	r5, [r1, #2]
  ldrb	r6, [r1, #3]
  ldrb	r1, [r1, #5]
  ldrb	lr, [r2, #4]!
  ldrb.w	r12, [r2, #2]
  ldrb	r2, [r2, #3]
  strb	r1, [r0, #5]
  strb	r6, [r0, #3]
  strb	r5, [r0, #2]
  strb	r4, [r0, #1]
  strb	r3, [r0]
  strb	lr, [r0, #4]!
  strb	r2, [r0, #3]
  strb.w	r12, [r0, #2]
  ldr	r11, [sp], #4

But even setting aside the no-unaligned-access case, we see that with unaligned access support the code bloat is already there:

  ldr	r2, [r1]
  ldr	r1, [r1, #4]
  str	r1, [r0, #4]
  str	r2, [r0]
  bx	lr

So, for the decision making here in InstCombine, which is mostly target independent at the moment, I would like to ignore the whole aligned/unaligned business. And what I want to generate is of course just this:

  movs	r2, #8
  b	__aeabi_memcpy

Now, surprisingly, this is also what we generate for X86 and AArch64 with `-Oz`, whereas we would perhaps expect a single load and store on these 64-bit architectures. I don't know why that is not happening, or whether there is a reason for it, so I need to look into that.

Either way, this patch generates the same code and is consistent with that. And I think the hard-coding of size > 4 is mostly in line with some of the checks already there.
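To make the intended decision concrete, here is a rough sketch in plain C rather than the actual InstCombine code. The helper name `should_expand_memcpy` is hypothetical, and the first check (only power-of-two sizes up to 8 are expanded) is my assumption about the existing behaviour, not a quote from the source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not LLVM's actual code: decide whether a
   constant-size memcpy should be expanded into a load/store pair
   instead of being left as a library call. */
static bool should_expand_memcpy(uint64_t size, bool opt_for_min_size)
{
    /* Assumed existing rule: only expand sizes 1, 2, 4 and 8. */
    if (size == 0 || size > 8 || (size & (size - 1)) != 0)
        return false;

    /* Proposed rule: when optimising for minimum size, keep the call
       for anything larger than 4 bytes. */
    if (opt_for_min_size && size > 4)
        return false;

    return true;
}
```

With this, an 8-byte copy is still expanded at normal optimisation levels but stays a call under minsize, while copies of 4 bytes or fewer are expanded in both cases.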

https://reviews.llvm.org/D52081