[PATCH] D52081: [InstCombine] do not expand 8 byte memcpy if optimising for minsize
Sjoerd Meijer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 14 04:45:24 PDT 2018
SjoerdMeijer added a comment.
Thanks both, those are fair points.
For a bit more context, this is the problem that I am trying to solve:
void foo (char *A, char *B) {
memcpy(A, B, 8);
}
compiled with `-Oz -mno-unaligned-access` this results in this disaster:
ldrb r3, [r1]
ldrb r4, [r1, #1]
ldrb r5, [r1, #2]
ldrb r6, [r1, #3]
ldrb r1, [r1, #5]
ldrb lr, [r2, #4]!
ldrb.w r12, [r2, #2]
ldrb r2, [r2, #3]
strb r1, [r0, #5]
strb r6, [r0, #3]
strb r5, [r0, #2]
strb r4, [r0, #1]
strb r3, [r0]
strb lr, [r0, #4]!
strb r2, [r0, #3]
strb.w r12, [r0, #2]
ldr r11, [sp], #4
but forgetting about this no-unaligned case, we see that with alignment support the code bloat is already there:
ldr r2, [r1]
ldr r1, [r1, #4]
str r1, [r0, #4]
str r2, [r0]
bx lr
So, for the decision making here in InstCombine, which is mostly target independent at the moment, I would like to ignore the whole aligned/unaligned business. And what I want to generate is of course just this:
movs r2, #8
b __aeabi_memcpy
Now, surprisingly, this is also what we generate for X86 and AArch64 with `-Oz`, whereas we would perhaps expect a load and a store on these 64-bit architectures? I don't know why that is not happening, if there is a reason for, and I need to look into that.
Either way, this patch generates the same code, and is consistent with that. And I think the hard coding of size > 4 is mostly inline with the some of these checks already there.
https://reviews.llvm.org/D52081
More information about the llvm-commits
mailing list