[PATCH] D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize

Fri Sep 14 04:51:32 PDT 2018

lebedev.ri added a comment.

In https://reviews.llvm.org/D52081#1234686, @SjoerdMeijer wrote:

> Thanks both, those are fair points.
>
> For a bit more context, this is the problem that I am trying to solve:
>
>   void foo (char *A, char *B) {
>     memcpy(A, B, 8);
>   }
>   
>
> compiled with `-Oz -mno-unaligned-access` this results in this disaster:
>
>   ldrb	r3, [r1]
>   ldrb	r4, [r1, #1]
>   ldrb	r5, [r1, #2]
>   ldrb	r6, [r1, #3]
>   ldrb	r1, [r1, #5]
>   ldrb	lr, [r2, #4]!
>   ldrb.w	r12, [r2, #2]
>   ldrb	r2, [r2, #3]
>   strb	r1, [r0, #5]
>   strb	r6, [r0, #3]
>   strb	r5, [r0, #2]
>   strb	r4, [r0, #1]
>   strb	r3, [r0]
>   strb	lr, [r0, #4]!
>   strb	r2, [r0, #3]
>   strb.w	r12, [r0, #2]
>   ldr	r11, [sp], #4
>   
>
> but forgetting about this no-unaligned case, we see that with alignment support the code bloat is already there:
>
>   ldr	r2, [r1]
>   ldr	r1, [r1, #4]
>   str	r1, [r0, #4]
>   str	r2, [r0]
>   bx	lr
>   
>
> So, for the decision making here in InstCombine, which is mostly target independent at the moment, I would like to ignore the whole aligned/unaligned business. And what I want to generate is of course just this:
>
>   movs	r2, #8
>   b	__aeabi_memcpy

Are you sure you shouldn't be fixing this in the backend?

> Now, surprisingly, this is also what we generate for X86 and AArch64 with `-Oz`, whereas we would perhaps expect a load and a store on these 64-bit architectures? I don't know why that is not happening, if there is a reason for, and I need to look into that.
> 
> Either way, this patch generates the same code, and is consistent with that. And I think the hard coding of size > 4 is mostly inline with the some of these checks already there.

https://reviews.llvm.org/D52081