[LLVMdev] [RFC] Attributes on Values

Tue Sep 9 01:36:44 PDT 2014

Hi everyone,

Nick and Philip suggested something yesterday that I'd also thought about: supporting attributes on values (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140908/234323.html). The primary motivation for this is to provide a way of attaching pointer information, such as noalias, nonnull and dereferenceable(n), to pointers generated by loads. Doing this for pointers generated by inttoptr also seems potentially useful.

A problem that we currently have with C++ lambda captures from Clang (and will have with other similar constructs, like outlined OpenMP loop bodies) is that we end up packing things that would otherwise be function parameters into structures to pass to the generated function. Unfortunately, this means that, in the generated anonymous function, these values are produced by loads (they are not true function parameters), and so we lose our ability to assert things about them (like noalias, nonnull, etc.).
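To make the problem concrete, here is a rough, hand-written C++ sketch of what happens to a lambda such as [a, b](int i) { a[i] += b[i]; }, where a and b are int pointers. The names Closure, body, direct and run_example are made up for illustration; this is not what Clang literally emits, only the general shape of it.

```cpp
#include <cassert>

// Hand-written stand-in for the closure object a compiler might generate
// for: [a, b](int i) { a[i] += b[i]; }  (a and b are int pointers).
// All names here are hypothetical.
struct Closure {
  int *a;  // captured pointer, stored as a struct field
  int *b;  // captured pointer, stored as a struct field
};

// The outlined body receives only the closure pointer as a parameter.
// a and b are recovered by loading struct fields, so parameter attributes
// such as noalias or nonnull have nowhere to be attached for them.
void body(const Closure *c, int i) {
  int *a = c->a;  // pointer produced by a load
  int *b = c->b;  // pointer produced by a load
  a[i] += b[i];
}

// Equivalent direct form: a and b are true function parameters, which is
// where attributes like noalias/nonnull can be expressed today.
void direct(int *a, int *b, int i) { a[i] += b[i]; }

int run_example() {
  int x[2] = {1, 2};
  int y[2] = {10, 20};
  Closure c{x, y};
  body(&c, 1);      // x[1] becomes 2 + 20
  direct(x, y, 0);  // x[0] becomes 1 + 10
  return x[0] + x[1];
}
```

Both functions compute the same thing; the difference is only in where the pointers come from, and that is exactly the information the optimizer loses.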

How this might work:

 1. Instead of only CallInst/InvokeInst having an AttributeSet member variable, all instructions would have one. Accessor functions like getAttributes would be moved from CallInst/InvokeInst to Instruction. Only 'AttributeSet::ReturnIndex' would be meaningful on most instructions.

 2. For the text IR format: as with call/invoke currently, we'd optionally parse attributes between the instruction name ('load', etc.) and the first type parameter. So load would become:
  <result> = load [ret attrs] [volatile] <ty>* <pointer>[, align <alignment>] (...)
allowing you to write:
  %val = load dereferenceable(32) i32** %ptr

 3. The bitcode format extension would mirror the setup for metadata. A subblock type bitc::ATTRIBUTES_ATTACHMENT_ID would be added with a record array of [Instruction #, Attribute Set #]. This will allow attaching attribute sets to any instruction without increasing the bitcode record sizes for the instructions themselves.
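
As a rough illustration of items 1 and 3 above, the following toy C++ model gives every instruction an attribute set and then builds a side table of [Instruction #, Attribute Set #] records. ToyInstruction, ToyAttachmentBlock and buildAttachments are hypothetical names invented for this sketch; they are not LLVM classes, and the real bitcode writer would look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins only; these are NOT LLVM's classes.
using ToyAttributeSet = std::set<std::string>;

struct ToyInstruction {
  std::string opcode;
  ToyAttributeSet retAttrs;  // item 1: every instruction carries a set
};

struct ToyAttachmentBlock {
  std::vector<ToyAttributeSet> sets;  // uniqued attribute sets
  // One record per attributed instruction: [Instruction #, Attribute Set #]
  std::vector<std::pair<std::size_t, std::size_t>> records;
};

// Item 3: emit a side table, so instructions without attributes add no
// record and the per-instruction encoding is left untouched.
ToyAttachmentBlock buildAttachments(const std::vector<ToyInstruction> &insts) {
  ToyAttachmentBlock block;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    const ToyAttributeSet &attrs = insts[i].retAttrs;
    if (attrs.empty())
      continue;
    // Linear search for an identical set; fine for a sketch.
    std::size_t id = 0;
    while (id < block.sets.size() && block.sets[id] != attrs)
      ++id;
    if (id == block.sets.size())
      block.sets.push_back(attrs);
    block.records.emplace_back(i, id);
  }
  return block;
}

ToyAttachmentBlock exampleAttachments() {
  std::vector<ToyInstruction> insts = {
      {"load", {"nonnull", "dereferenceable(32)"}},
      {"add", {}},
      {"load", {"nonnull", "dereferenceable(32)"}},
      {"inttoptr", {"nonnull"}},
  };
  return buildAttachments(insts);
}
```

In this example the two loads share one attribute set, so the table holds three records that refer to only two distinct sets, and the 'add' contributes nothing.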

Obviously there is some potential functionality overlap here with metadata, and an alternative approach would be to add a metadata type that mirrors each attribute we'd like to model on values. Potential downsides to using metadata are:
 - It could be a lot of metadata (consider adding nonnull/dereferenceable(n) to every load of a C++ reference type)
 - We already have code that handles these as return attributes on call instructions; extending that to look at all instructions seems straightforward. Adding alternative code that also looks at metadata would require additional development for each attribute.

One thing to keep in mind is that, as with metadata, the attributes can have control-flow dependencies, so we'd generally need to clear them when hoisting loads, etc. We already do this with metadata, at least for loads, so the places where we'd need to do this should hopefully be relatively easy to find.
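
A minimal sketch of that clearing step, again using a toy model rather than real LLVM code: ToyLoad and hoistAboveGuard are hypothetical names, and the 'guarded' flag simply stands in for "this load still sits under the condition that justified its attributes". Dropping everything is the conservative choice assumed here; a real pass could keep any fact it can prove still holds at the new location.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of a load instruction; NOT an LLVM class.
struct ToyLoad {
  std::set<std::string> retAttrs;  // proposed value attributes
  std::set<std::string> metadata;  // existing control-dependent metadata
  bool guarded;                    // still under the justifying condition?
};

// Hypothetical helper: move a load above the branch that guarded it.
// Facts that were only known to hold under the guard are dropped, for the
// new attributes in the same way as for metadata.
void hoistAboveGuard(ToyLoad &load) {
  load.guarded = false;
  load.retAttrs.clear();
  load.metadata.clear();
}

ToyLoad exampleHoist() {
  ToyLoad load{{"nonnull", "dereferenceable(32)"}, {"range"}, true};
  hoistAboveGuard(load);
  return load;
}
```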

What do you think?

Thanks again,
Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory