[llvm-dev] [cfe-dev] How to tell if a class contains tail padding?
Han Zhu via llvm-dev
llvm-dev at lists.llvm.org
Fri Jul 23 11:54:03 PDT 2021
Eli, Richard,
Thank you both for the advice. Handling this case does indeed seem
non-trivial. I'll re-evaluate whether it's worth implementing.
Thanks,
Han
On Wed, Jul 21, 2021 at 2:27 PM Richard Smith <richard at metafoo.co.uk> wrote:
> On Wed, 21 Jul 2021 at 13:43, Han Zhu via cfe-dev <cfe-dev at lists.llvm.org>
> wrote:
>
>> After clang codegen, is the class layout final? Does the optimizer modify
>> the class layout?
>>
>
> LLVM IR does not contain enough information to determine whether the
> optimization is valid. The IR that we generate for:
>
> struct S {
> ~S() {}
> int a;
> int b;
> char c;
> // 3 bytes padding
> };
>
> unsigned copy_noalias(S* __restrict__ a, S* b, int n) {
> for (int i = 0; i < n; i++) {
> a[i] = b[i];
> }
> return sizeof(a[0]);
> }
>
> ... would also be correct IR to generate for:
>
> struct T : S {
> char x, y, z; // stored in S's tail padding
> };
>
> unsigned copy_noalias2(T* __restrict__ a, T* b, int n) {
> for (int i = 0; i < n; i++) {
> (S&)a[i] = (S&)b[i];
> }
> return sizeof((S&)a[0]);
> }
>
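To make the hazard concrete, here is a minimal sketch built on the two types
above. The helper names (offset_of_x, tail_padding_is_reused,
base_copy_preserves_derived_bytes) are mine and purely illustrative. Whether
the tail padding is actually reused depends on the ABI: I would expect it
under the Itanium C++ ABI, but it is not guaranteed elsewhere. What has to
hold everywhere is that assigning the S subobject leaves x, y and z alone.

```cpp
#include <cassert>
#include <cstddef>

struct S {
  ~S() {}  // user-provided destructor, as in the example above
  int a;
  int b;
  char c;
  // tail padding follows when alignof(int) > 1
};

struct T : S {
  char x, y, z;  // the Itanium C++ ABI may place these in S's tail padding
};

// Byte offset of T::x from the start of a T object, measured at run time
// (offsetof is only conditionally supported for a type like T).
std::size_t offset_of_x() {
  T t;
  return static_cast<std::size_t>(&t.x - reinterpret_cast<char*>(&t));
}

// True when T::x lives inside the sizeof(S) footprint of the base subobject.
bool tail_padding_is_reused() { return offset_of_x() < sizeof(S); }

// Assign only the S subobject of dst, then check that the derived members
// of dst are untouched.
bool base_copy_preserves_derived_bytes() {
  T dst, src;
  dst.a = 1; dst.b = 2; dst.c = 'd'; dst.x = 'X'; dst.y = 'Y'; dst.z = 'Z';
  src.a = 7; src.b = 8; src.c = 's'; src.x = 'p'; src.y = 'q'; src.z = 'r';
  static_cast<S&>(dst) = static_cast<S&>(src);
  assert(dst.a == 7 && dst.b == 8 && dst.c == 's');  // base fields were copied
  return dst.x == 'X' && dst.y == 'Y' && dst.z == 'Z';
}
```

When tail_padding_is_reused() returns true, widening the per-element copy to
sizeof(S) bytes would clobber x, y and z, which is exactly the case the
optimization has to rule out.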
> You'll need to generate some additional information from the frontend if
> you want to be able to do this. You could, in at least some cases, analyze
> the types and expressions involved and locally prove that you know the last
> few bytes are guaranteed to be padding, then generate !tbaa.struct metadata
> and attach it to the @llvm.memcpy call that the frontend emits.
>
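As a rough illustration of the "locally prove that the last few bytes are
padding" step: as far as I understand it, !tbaa.struct describes the fields
of the copied object as (offset, size, tag) groups, and bytes not covered by
any field are padding that need not be preserved. The sketch below models
only the offset/size part in plain C++. FieldRange and trailing_padding are
hypothetical names, not LLVM or Clang APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one field: where it starts and how many
// bytes it occupies.
struct FieldRange {
  std::size_t offset;
  std::size_t size;
};

// Number of bytes between the end of the last field and total_size.
// Every field is assumed to lie inside [0, total_size).
std::size_t trailing_padding(const std::vector<FieldRange>& fields,
                             std::size_t total_size) {
  std::size_t end = 0;
  for (const FieldRange& f : fields) {
    assert(f.offset + f.size <= total_size);
    if (f.offset + f.size > end) end = f.offset + f.size;
  }
  return total_size - end;
}
```

For S with 4-byte int, the fields {0,4}, {4,4}, {8,1} in a 12-byte object
leave the last 3 bytes uncovered. If the object is really a T, the extra
fields at offsets 9, 10 and 11 leave nothing uncovered, so the frontend could
not attach the weaker description.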
>> On Wed, Jul 21, 2021 at 12:31 PM Eli Friedman <efriedma at quicinc.com>
>> wrote:
>>
>>> I suspect the transform you’re trying to do is more complicated than
>>> you’re making it out to be.
>>>
>>>
>>>
>>> In general, if you have a class that isn’t “POD for the purpose of
>>> layout” (https://itanium-cxx-abi.github.io/cxx-abi/abi.html), derived
>>> classes can store data in the tail padding. So the “padding” might contain
>>> data the program cares about. If you want to overwrite that space, you
>>> need to prove there isn’t a derived class storing data there.
>>>
>>>
>>>
>>> Possible proof approaches:
>>>
>>>
>>>
>>> 1. If the class is marked “final”, there aren’t any derived classes.
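>>> 2. Array indexing with the wrong pointer type might be illegal, so a
>>> pointer that is indexed as an array of S may have to point at complete S
>>> objects rather than at base subobjects of a derived class.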
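The first approach can be shown at the source level with the standard
std::is_final trait. This is only a sketch: OpenS, SealedS and
cannot_have_derived_classes are made-up names, and the frontend would query
its own AST rather than a library trait.

```cpp
#include <type_traits>

// Same members as S in the thread; not final, so a derived class could
// store data in the tail padding.
struct OpenS {
  ~OpenS() {}
  int a;
  int b;
  char c;
};

// Marked final: no class can derive from it.
struct SealedS final {
  ~SealedS() {}
  int a;
  int b;
  char c;
};

// True when no derived class of C can exist.
template <class C>
constexpr bool cannot_have_derived_classes() {
  return std::is_final<C>::value;
}
```

Here cannot_have_derived_classes<SealedS>() is true and
cannot_have_derived_classes<OpenS>() is false.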
>>>
>>>
>>>
>>> -Eli
>>>
>>>
>>>
>>> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Han Zhu
>>> via cfe-dev
>>> Sent: Wednesday, July 21, 2021 11:46 AM
>>> To: cfe-dev at lists.llvm.org; llvm-dev at lists.llvm.org
>>> Subject: [EXT] [cfe-dev] How to tell if a class contains tail padding?
>>>
>>>
>>>
>>> Hi,
>>>
>>> I'm working on an optimization to improve the LoopIdiomRecognize pass.
>>> For a trivial loop like this:
>>>
>>> ```
>>> struct S {
>>> int a;
>>> int b;
>>> char c;
>>> // 3 bytes padding
>>> };
>>>
>>> unsigned copy_noalias(S* __restrict__ a, S* b, int n) {
>>> for (int i = 0; i < n; i++) {
>>> a[i] = b[i];
>>> }
>>> return sizeof(a[0]);
>>> }
>>> ```
>>>
>>> Clang generates the loop below (some parts of the IR omitted):
>>> ```
>>> %struct.S = type { i32, i32, i8 }
>>>
>>> for.body: ; preds = %for.cond
>>> %2 = load %struct.S*, %struct.S** %b.addr, align 8
>>> %3 = load i32, i32* %i, align 4
>>> %idxprom = sext i32 %3 to i64
>>> %arrayidx = getelementptr inbounds %struct.S, %struct.S* %2, i64 %idxprom
>>> %4 = load %struct.S*, %struct.S** %a.addr, align 8
>>> %5 = load i32, i32* %i, align 4
>>> %idxprom1 = sext i32 %5 to i64
>>> %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %4, i64 %idxprom1
>>> %6 = bitcast %struct.S* %arrayidx2 to i8*
>>> %7 = bitcast %struct.S* %arrayidx to i8*
>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %6, i8* align 4 %7, i64 12, i1 false)
>>> br label %for.inc
>>> ```
>>>
>>> It can be transformed into a single memcpy:
>>>
>>> ```
>>> for.body.preheader: ; preds = %entry
>>> %b10 = bitcast %struct.S* %b to i8*
>>> %a9 = bitcast %struct.S* %a to i8*
>>> %0 = zext i32 %n to i64
>>> %1 = mul nuw nsw i64 %0, 12
>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a9, i8* align 4 %b10, i64 %1, i1 false)
>>> br label %for.cond.cleanup
>>> ```
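For the plain struct case, the equivalence between the loop and the single
memcpy can be checked with a small self-contained sketch. The function names
are illustrative only, and the assert on distinct pointers is just a stand-in
for the no-alias guarantee that __restrict__ provides in the original.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct S {
  int a;
  int b;
  char c;
};
static_assert(std::is_trivially_copyable<S>::value,
              "memcpy is only valid for trivially copyable types");

// Element-by-element copy, as written in the source loop.
void copy_elementwise(S* dst, const S* src, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Single bulk copy covering n whole elements, padding included.
void copy_bulk(S* dst, const S* src, std::size_t n) {
  assert(dst != src);  // stand-in for the no-alias requirement
  std::memcpy(dst, src, n * sizeof(S));
}

// Compare field by field, so padding bytes never influence the result.
bool same_fields(const S* x, const S* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (x[i].a != y[i].a || x[i].b != y[i].b || x[i].c != y[i].c) return false;
  }
  return true;
}

// Run both copies on the same input and check that each reproduces it.
bool bulk_matches_elementwise() {
  S src[4] = {{1, 2, 'a'}, {3, 4, 'b'}, {5, 6, 'c'}, {7, 8, 'd'}};
  S d1[4] = {};
  S d2[4] = {};
  copy_elementwise(d1, src, 4);
  copy_bulk(d2, src, 4);
  return same_fields(d1, src, 4) && same_fields(d2, src, 4);
}
```

The bulk copy is only safe here because every byte of every element belongs
to the array, so overwriting the padding cannot destroy anything.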
>>>
>>> The problem is, if the copied elements are a class, this doesn't work.
>>> For a class with the same members:
>>> ```
>>> %class.C = type <{ i32, i32, i8, [3 x i8] }>
>>> ```
>>>
>>> Clang does some optimization to generate a memcpy of nine bytes, omitting
>>> the tail padding:
>>>
>>> ```
>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %6, i8* align 4 %7, i64 9, i1 false)
>>> ```
>>>
>>> Then in LLVM, we find the memcpy is not touching every byte of the array,
>>> so we abort the transformation.
>>>
>>> If we could tell the untouched three bytes are padding, we should still be
>>> able to do the optimization, but LLVM doesn't seem to have this
>>> information. I tried using `DataLayout::getTypeStoreSize()`, and it
>>> returned 12 bytes. I also tried `StructLayout`, and it treats the tail
>>> padding as a regular class member.
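The check described above can be modelled in a few lines. This is a
simplified sketch of the reasoning, not the pass's actual code: can_merge and
its parameters are hypothetical, and known_tail_padding stands for whatever
number of trailing bytes the frontend has managed to prove are padding.

```cpp
#include <cassert>
#include <cstdint>

// copy_size: bytes copied per iteration.
// stride: distance in bytes between consecutive array elements.
// known_tail_padding: trailing bytes per element proven to be padding.
// Returns true when the per-element copies, plus the proven padding,
// cover every byte of each element.
bool can_merge(std::uint64_t copy_size, std::uint64_t stride,
               std::uint64_t known_tail_padding = 0) {
  assert(copy_size <= stride);
  return copy_size + known_tail_padding == stride;
}
```

With the numbers from this thread, can_merge(12, 12) holds for the struct,
can_merge(9, 12) fails for the class, and can_merge(9, 12, 3) would hold if
the three untouched bytes were known to be padding.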
>>>
>>> Is there an API in LLVM to tell if a class has tail padding? If not, would
>>> it be useful to add this feature?
>>>
>>> Thanks,
>>> Han
>>>