[LLVMdev] RFC: Metadata attachments to function definitions

Wed Apr 15 09:55:09 PDT 2015

> On 2015-Apr-15, at 08:11, Ivan Baev <ibaev at codeaurora.org> wrote:
> 
>> Date: Tue, 14 Apr 2015 21:33:03 -0700
>> From: "Duncan P. N. Exon Smith" <dexonsmith at apple.com>
>> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>> Subject: [LLVMdev] RFC: Metadata attachments to function definitions
>> Message-ID: <BF4002F0-06DC-4A25-AF84-7D21AD48121A at apple.com>
>> Content-Type: text/plain; charset=us-ascii
>> 
>> 
>> `Function` definitions should support `MDNode` attachments, with a
>> similar syntax to instructions:
>> 
>>    define void @foo() nounwind !attach !0 {
>>      unreachable
>>    }
>>    !0 = !{}
>> 
>> Attachments wouldn't be allowed on declarations, just definitions.
>> 
>> There are two open problems this can help with:
>> 
>> 1. For PGO, we need somewhere to attach the function entry count.
>>    Attaching to the function definition is a simple solution.
>> 
>>        define void @foo() !prof !0 {
>>          unreachable
>>        }
>>        !0 = !{i32 987}
> 
> 
> That will be great. Would it be better if the type is i64? We really want
> to avoid an overflow if possible. What is the type for the raw profile
> data for region(0)?

Right, of course; `i64` makes more sense for entry counts.  BTW, I'm not
volunteering to implement this one myself, just to provide the metadata
infrastructure.

>> 2. In debug info, we repeatedly build up a map from `Function` to the
>>    canonical `MDSubrogram` for it.  Keeping this mapping accurate takes
>>    subtle logic in `lib/Linker` (see PR21910/PR22792) and it's
>>    expensive to compute and maintain.  Attaching it directly to the
>>    `Function` designs away the problem.
>> 
>>        define void @foo() !dbg !0 {
>>          unreachable
>>        }
>>        !0 = !MDSubprogram(name: "foo", function: void ()* @foo)
>> 
>> Thoughts?
>> 
>> Moving onto implementation, I'd provide the same generic API that
>> `Instruction` has, and wouldn't bother with the "fast path" API for
>> `!dbg`.  Moreover, the generic path wouldn't be slow.  Since there are
>> fewer functions than instructions, we can afford to store the
>> attachments directly on the `Function` instead of off in the
>> `LLVMContext`.
>> 
>> It's not clear to me just how precious memory is in `Function`; IIRC
>> it's sitting at 168B right now for x86-64.  IMO, a `SmallVector<..., 1>`
>> -- cost of 64B -- seems fine.  I'll start with this if I don't hear any
>> objections; we can optimize later if necessary.
>> 
>> Otherwise, I could hack together a custom vector-like object with the
>> "small string" optimization.  Cost would be 16B per `Function`, with the
>> same malloc/heap costs as `SmallVector<..., 1>`.  But I haven't seen a
>> profile that suggests this would be worth the complexity...
>> 
>> Any opinions?
>> 
> 
>