[llvm-commits] [cfe-commits] [PATCH] __builtin_assume_aligned for Clang and LLVM

Richard Smith richard at metafoo.co.uk
Sat Dec 1 01:53:22 PST 2012


On Sat, Dec 1, 2012 at 12:51 AM, Chandler Carruth <chandlerc at google.com> wrote:

> On Fri, Nov 30, 2012 at 4:14 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>> Hi everyone,
>>
>> Many compilers provide a way, through either pragmas or intrinsics, for
>> the user to assert stronger alignment requirements on a pointee than is
>> otherwise implied by the pointer's type. gcc now provides an intrinsic for
>> this purpose, __builtin_assume_aligned, and the attached patches (one for
>> Clang and one for LLVM) implement that intrinsic using a corresponding LLVM
>> intrinsic, and provide an infrastructure to take advantage of this new
>> information.
>>
>> ** BEGIN justification -- skip this if you don't care ;) **
>>
>
> I'd like to start off by saying, I think this is absolutely an important
> use case. =]
>
> <snip the attribute / type discussion ... i agree here>
>
> With the intrinsic, this is easier:
>> void foo(double *a, double *b, int N) {
>>   a = __builtin_assume_aligned(a, 16);
>>   b = __builtin_assume_aligned(b, 16);
>>   for (int i = 0; i < N; ++i)
>>     a[i] = b[i] + 1;
>> }
>> this code can be vectorized with aligned loads and stores, and even if it
>> is not vectorized, will remain correct.
>>
>> The second problem with the purely type-based approach is that it
>> requires manual loop unrolling and inlining. Because the intrinsics are
>> evaluated after inlining (and after loop unrolling), the optimizer can use
>> the alignment assumptions specified in the caller when generating code for
>> an inlined callee. This is a very important capability.
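>>
>> For example (a rough sketch; add_one is just a hypothetical helper, not
>> something from the patch): the callee makes no alignment claim of its own,
>> but once the inliner flattens it into foo, its loads and stores are fed
>> directly by the __builtin_assume_aligned results and can be promoted:
>>
>> static void add_one(double *a, const double *b, int N) {
>>   for (int i = 0; i < N; ++i)
>>     a[i] = b[i] + 1;
>> }
>>
>> void foo(double *a, double *b, int N) {
>>   a = static_cast<double *>(__builtin_assume_aligned(a, 16));
>>   b = static_cast<double *>(__builtin_assume_aligned(b, 16));
>>   add_one(a, b, N);  // after inlining, the aligned accesses become visible
>> }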
>
>
> I think this is going to be a very important point for me to understand
> fully. I see the problem with inlining, but not with unrolling. Can you
> describe what the issue is there?
>
> <snip>
>
> Mirroring the gcc (and now Clang) intrinsic, the corresponding LLVM
>> intrinsic is:
>> <t1>* @llvm.assume.aligned.p<s><t1>.<t2>(<t1>* addr, i32 alignment, <int t2> offset)
>> which asserts that the address returned is offset bytes above an address
>> with the specified alignment. The attached patch makes some simple changes
>> to several analysis passes (like BasicAA and SE) to allow them to 'look
>> through' the intrinsic. It also adds a transformation pass that propagates
>> the alignment assumptions to loads and stores directly dependent on the
>> intrinsic's return value. Once this is done, the intrinsics are removed so
>> that they don't interfere with the remaining optimizations.
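>>
>> To make the offset operand concrete: gcc's builtin accepts an optional
>> third argument with (if I'm reading the gcc documentation correctly) the
>> same meaning, so something like the following asserts that p points 8
>> bytes past a 16-byte boundary; that 8 is what the intrinsic's offset
>> operand would carry:
>>
>> void bar(double *p, int N) {
>>   // (char *)p - 8 is asserted to be 16-byte aligned
>>   p = static_cast<double *>(__builtin_assume_aligned(p, 16, 8));
>>   for (int i = 0; i < N; ++i)
>>     p[i] *= 2.0;
>> }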
>>
>
> I think this approach has more problems than you're currently dealing
> with. There are lots of passes that will run before loop unrolling and
> which will need to look through pointers. I'm extremely uncomfortable
> adding a *new* way to propagate a pointer from one SSA value to another;
> there are soooo many places in the compiler we will have to fix. Your patch
> already has a very large amount of duplicated code scattered about a large
> collection of passes. I think this is an indication that this design isn't
> ideal, so I'd really like to see if we can come up with a better in-IR
> representation for this which will simplify how the rest of the optimizer
> interacts with it.
>
> I see a few options off the top of my head, but I don't yet understand the
> problem domain well enough to really dig into any of them. Specifically, I
> understand the transforms you're trying to enable, and the reason they
> aren't currently possible, but I don't understand the C/C++ code patterns
> you are imagining, and where in those code patterns this invariant will be
> introduced. Maybe you can help with more examples?
>
> Ok, on with the straw-man proposals:
>
> 1) Add support for declaring a pointer parameter's *value* to be aligned:
> void foo([[clang::aligned_pointer(16)]] double *a,
>          [[clang::aligned_pointer(16)]] double *b,
>          int N) {
>   for (int i = 0; i < N; ++i)
>     a[i] = b[i] + 1;
> }
> the idea here is that these alignment constraints would be substantially
> different from either the GCC alignment attributes or the C++11 alignment
> attributes. Note that those two are themselves very different from each
> other, and Clang currently gets the C++11 one wrong.
> GCC alignment attribute: makes the *type* be aligned to the specified
> amount.
> C++11 alignment attribute: makes the *storage* be aligned to the specified
> amount. (I think... I'll check w/ Richard on Monday.)
>

The C++11 attribute can be used in two ways (neither is what you want here):
 * On the declaration of a variable or data member, to overalign the
storage of that entity
 * On the definition of a tag type, to overalign all objects of that type

I suppose you could do this:

#include <cstddef>

template<typename T, std::size_t Align>
struct alignas(Align) overaligned { T value; };

void foo(overaligned<double, 16> *a, overaligned<double, 16> *b) {
  // ...
}

...but I'm not sure whether the alignment information would make it into
the IR. Maybe this proposal could help with that?
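
For what it's worth, continuing that snippet, a caller would look something
like this (buf is just a made-up buffer), which already shows the
awkwardness: every caller has to wrap its data in the struct:

overaligned<double, 16> buf[64] = {};
foo(buf, buf);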


> Interestingly enough, C++11's definition would almost work for you if the
> inliner flattens to the point at which storage is declared. My suggestion
> is just to be able to also mark a pointer itself at an interface boundary
> as carrying this constraint. It wouldn't change the type at all. You could
> imagine this working much like GCC's non-null attribute does.
>
> The way my suggestion might be implemented in LLVM: an attribute or
> metadata on the function parameters. Then we teach instcombine and loop
> passes to detect when this constraint on alignment can be used to promote
> the alignment of loads and stores. The loop passes can detect when
> unrolling has allowed the stride to exceed the alignment, and then promote
> the load and store alignment, etc.
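>
> To illustrate the unrolling interaction (hand-unrolled here just to make
> the point, and assuming 'a' is known to be 16-byte aligned on entry): once
> each unrolled copy's offset from that base is known, the even accesses can
> be promoted to 16-byte alignment while the odd ones stay at 8:
>
> void scale(double *a, int n) {
>   for (int i = 0; i + 1 < n; i += 2) {
>     a[i]     *= 2.0;  // offset i*8 is a multiple of 16
>     a[i + 1] *= 2.0;  // offset (i+1)*8 is only 8-byte aligned
>   }
> }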
>
>
> 2) Use an invariant-based design, much as Alex suggested. The syntax for
> this could be exactly the same as the GCC builtin you mention, or the same
> as my syntax in #1. This representation is, I think, a strictly more
> general LLVM IR model than the others under discussion.
>
> I have two completely off-the-cuff ideas about how to represent this.
> Others who've been thinking about it longer may have better ideas:
> a) Dead code to produce a CmpInst that the invariant asserts is true.
> %ptrtoint = ptrtoint double* %ptr to i64
> %lowbits = and i64 %ptrtoint, 15  ; keep the low four bits (0xF)
> %invariant_cmp = icmp eq i64 %lowbits, 0
> call void @llvm.invariant(i1 %invariant_cmp)
>
> b) Dead code to produce two values which we assert are equivalent. Then
> when trying to optimize one, we can leverage things known about the other
> (such as simplify demanded bits, etc).
> %ptrtoint = ptrtoint double* %ptr to i64
> %maskedint = and i64 %ptrtoint, -16  ; clear the low four bits
> %maskedptr = inttoptr i64 %maskedint to double*
> call void @llvm.equivalent(double* %ptr, double* %maskedptr)
>
> (I realize (b) here isn't really implementable w/ our current intrinsic
> overloading, but you get the idea)
>
> The difference is -- how do optimizations use them? With (b), the
> optimization can just query N equivalent values for any property. If it
> holds for any, it holds for all. With (a), we have to specifically
> interpret the predicate which is an invariant. While (a) seems like a bit
> of extra work, I actually prefer it because it seems more general, easier
> to understand, and better able to support fairly aggressive caching of
> invariants, etc.
>
>
> 3) Your proposal, but handled more like __builtin_expect. Have a very
> early pass that converts the intrinsic into metadata attached to a
> particular pointer SSA value. Then teach passes to preserve the metadata,
> and have your pass promote alignment based on it.
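>
> For comparison, that's roughly how __builtin_expect is handled today
> (IIRC): clang emits a call to llvm.expect, and an early pass folds it into
> branch weight metadata, so nothing later in the pipeline has to know about
> the intrinsic itself. At the source level:
>
> #include <cstdio>
>
> void report(int err) {
>   // the hint becomes metadata on the branch, not a value later passes see
>   if (__builtin_expect(err != 0, 0))
>     std::fprintf(stderr, "error %d\n", err);
> }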
>
>
> Of these, my preferences in order are #2, #3, and #1. I like #2 because it
> would begin building a tremendously powerful and much needed piece of
> infrastructure in LLVM, and serve many needs in addition to alignment.
>
> Thoughts? If you're interested in working on #2, I'd love to talk through
> the design....
>