[PATCH] D31885: Remove TBAA information from LValues representing union members

Wed Apr 19 14:09:32 PDT 2017

rjmccall added a comment.

In https://reviews.llvm.org/D31885#730958, @dberlin wrote:

> In https://reviews.llvm.org/D31885#730936, @rjmccall wrote:
>
> > In https://reviews.llvm.org/D31885#730920, @dberlin wrote:
> >
> > > Just so i understand: Ignoring everything else (we can't actually make likelyalias work, i think the code in the bugs makes that very clear),
> >
> >
> > Are you under the impression that I'm proposing something new and that TBAA does not currently defer to BasicAA?
>
>
> Well, actually, chandler changed how it works, so it doesn't actually work quite the way it used to but it has the same effect ;)
>  (It used to be explicit deferral, it is not explicit, neither depends on the other any more.  It just happens to be how the default AA pipeline is ordered right now)
>  I'm quite aware of the machinations of AA in LLVM :)
>
> I'm saying "this approach does not work well now" (ie see bugs) , and i don't think continuing to try to make the contract even *more* incestuous is better than trying to make it *less*.

What I'm trying to get across is that you're not going to be able to completely eliminate the "incest" without making TBAA substantially more aggressive, in ways that aren't really acceptable and definitely not consistent with how we've always described LLVM's TBAA to users.  I certainly would not feel comfortable ever switching Clang by default to a GCC-like model which will literally miscompile something as simple as:

void foo(float *f) -> int {

  *f = /*something*/;
  return *(int*) f;

}

If you acknowledge that, then we can legitimately talk about the right way to get the behavior that we want.  I would be fine with making that conservatism opt-in somehow, for example, especially if (as you suggest) we could use that to signal to BasicAA that it should not impose artificial limits on differentiating LikelyAlias from MayAlias.  (I still think that that would be useful to allow this kind of LikelyAlias result, at least for TBAA's use.  For example, the double* in the printf loop in the union-tbaa2.cpp test doesn't definitively overlap either store, but it's clearly still related enough that conservative-TBAA should be suppressed.)

> You seem to claim better optimization opportunity.
>  I ran numbers.
>  Across all things in llvm's benchmark suite, disabling unions entirely for tbaa causes no regressions on x86 (I could try more platforms if you like).
> 
> I also  ran it across 50 third party packages and spec2006 that use strict aliasing (These are just those things i could quickly test that had perf benchmarks)

This is interesting information, thank you.  One obvious concern is that it almost certainly under-represents unions, which tend to be uncommon in benchmarks; benchmark swifts have also often be explicitly massaged to remove anything that could be described undefined behavior, which again is not consistent with the real world.  But it does suggest that we could try it.

> If it really caused any real perf loss, we could always define metadata to say these accesses are noalias if we can't prove they are mustalias.
>  We could even use that metadata to explicitly try to have basicaa try much harder than it does now to prove must-aliasing (for example, ignore depth limits), since it would likely be too expensive to try it for everything.

Yes, this seems like a direction we should be taking.

> as an aside, I do not believe you could reasonably apply this to unions  because it's not possible to tell people what to do differently when it's *llvm ir* that has the issue.
> 
> IE i can't reasonably tell users "please stop doing things that don't generate explicit enough IR".
>  Especially when those answers change with inlining decisions, etc.

Specific simple patterns in the source language will reliably turn into specific patterns in the IR that we can guarantee are explicit enough.  If you get a pointer value from some opaque source, assign it to a local variable, and then use it multiple times, the optimizer will reliably see that the pointer is the same at each use.  Similarly if the pointer is just a local variable itself.  That's basically all I would guarantee as "sufficiently obvious": the pointers are obviously derived from the same source.

In these test cases, the accesses happen from different functions; but in order to be miscompiled they have to get inlined into one (unless we added some sort of cross-function AA, I suppose), which still admits the same kind of trivial pointer analysis.

John.

Repository:
  rL LLVM

https://reviews.llvm.org/D31885