[cfe-dev] struct copy
Chris Lattner
clattner at apple.com
Sun Sep 28 09:14:11 PDT 2008
On Sep 28, 2008, at 4:59 AM, Argiris Kirtzidis wrote:
> Chris Lattner wrote:
>>
>> In theory, it should be safe and fast for the compiler to always
>> produce a memcpy and then let the backend lower it however it wants.
>
> What is wrong with having the compiler always produce a load/store ?
For small structs, almost nothing! It has the same semantics as an
element-by-element copy. For large structs, you really don't want to
do this.
The problem is that you need the same heuristic: you need to know that
the struct is "small" and that the LLVM type has no holes in which some
other element of the C type (e.g. through a union) contains data.
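
To make the union case concrete, here is a minimal sketch in C. The
names (struct tagged, union u, copy_fields, copy_bytes, same_raw) are
made up for illustration and are not from clang. It assumes a typical
ABI where there is padding between `tag` and `value`; the `raw` member
keeps data in those same bytes, so a copy that only walks the fields of
the `t` view is not guaranteed to carry them over, while a whole-object
copy is.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type: on typical ABIs there is a hole after `tag`. */
struct tagged { char tag; int value; };

/* The second member stores data in every byte, including the hole. */
union u {
    struct tagged t;
    unsigned char raw[sizeof(struct tagged)];
};

/* Element-by-element copy through the `t` view: only tag and value
   are assigned, so bytes in the hole are not reliably copied. */
static void copy_fields(union u *dst, const union u *src) {
    assert(dst != NULL && src != NULL);
    dst->t.tag = src->t.tag;
    dst->t.value = src->t.value;
}

/* Whole-object copy: every byte is copied, hole included. */
static void copy_bytes(union u *dst, const union u *src) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Returns 1 if the two objects match byte for byte. */
static int same_raw(const union u *a, const union u *b) {
    return memcmp(a->raw, b->raw, sizeof a->raw) == 0;
}
```

After copy_bytes, same_raw is guaranteed to report a match. After
copy_fields it usually does not on ABIs with a hole, and the C standard
leaves the contents of those bytes unspecified either way.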
Mattias wrote:
> Chris Lattner <clattner at apple.com> writes:
>
>> This is true on the micro level, but is false on the macro level.
>> For example, if the caller of a function does a one byte store into
>> a struct field, and the callee does a memcpy (ending up with a
>> 32-bit read), you get a store forwarding speculation failure on most
>> out of order processors.
>
> Thank you, I didn't think of that. I wonder to what extent that effect
> depends on the existence of holes in the struct layout. Should structs
> always be copied member-by-member, even if they consist of many small
> bytes?
>
> struct S {
> char c[7];
> double d;
> };
>
> Would seven byte copies really be faster than one 64-bit word copy?
> If there were eight bytes in the array, there would be no hole but
> the problem remains - at least if the entire array was not written to
> before the struct copy.
There is no easy answer; it depends a lot on other environmental
effects, like whether there is an access to c[3] or c[i] shortly after
the store to the struct.
In the absence of such an access that is close enough to matter,
you're right that it would be better to copy the 7 bytes with an 8
byte load and store instead of 7 one-byte load/stores. The idea of the
current heuristic is that the code generator should theoretically be
able to merge together neighboring load/stores into wider ones. The
two caveats here are 1) the codegen doesn't do this yet, and 2) even
when it does, it won't know when it is safe for it to load/store
*more* data than is requested. In this case, it wouldn't know that it
is safe to load/store 8 bytes, because only 7 are being accessed.
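
As a rough sketch of the two strategies at the source level (the helper
names copy_c_bytewise and copy_c_widened are hypothetical, and this is
not what clang emits): the widened version is only valid because we
know, from the layout of S, that the eighth byte is padding. That is
exactly the fact the code generator would not have if it only saw seven
one-byte accesses. The sketch assumes an ABI where `d` starts at offset
8 and checks that assumption with an assert.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S {
    char c[7];
    double d;
};

/* Narrow copy: seven one-byte load/stores, touching exactly c[0..6]. */
static void copy_c_bytewise(struct S *dst, const struct S *src) {
    for (size_t i = 0; i < sizeof src->c; ++i)
        dst->c[i] = src->c[i];
}

/* Widened copy: one 8-byte move covering c[0..6] plus the byte that
   follows. Assumption: that extra byte is padding, i.e. `d` starts at
   offset 8 or later, so `d` is never overwritten. */
static void copy_c_widened(struct S *dst, const struct S *src) {
    assert(offsetof(struct S, d) >= 8);
    memcpy(dst, src, 8);
}
```

Both helpers leave dst->d alone and produce the same c[0..6]. Which one
is faster still depends on the surrounding accesses, as described above.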
Most optimization work in LLVM is demand driven. If you find a real
world testcase that this impacts, we can devise some solutions.
-Chris