[llvm-dev] [canonicalization] GEP 0, 0

Sun Dec 25 04:40:59 PST 2016

2016-12-24 9:39 GMT+01:00 Chandler Carruth <chandlerc at google.com>:

> On Fri, Dec 23, 2016 at 10:17 PM Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
> On Fri, Dec 23, 2016 at 10:02 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
> On Fri, Dec 23, 2016 at 3:30 PM Daniel Berlin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Fri, Dec 23, 2016 at 2:31 PM, Piotr Padlewski via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> 2016-12-23 22:17 GMT+01:00 David Majnemer <david.majnemer at gmail.com>:
>
>
>
> On Fri, Dec 23, 2016 at 1:09 PM, Davide Italiano via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Fri, Dec 23, 2016 at 1:01 PM, Piotr Padlewski
> <piotr.padlewski at gmail.com> wrote:
> >
> >
> > On Dec 23, 2016 19:47, "Daniel Berlin" <dberlin at dberlin.org> wrote:
> >
> > Define soon?
> > My guess is 1 year or less.
> > (I've already seen patches to start converting most remaining memdep
> uses,
> > like memcpy opt, licm, etc)
> >
> >
> > That's good. Anyway I already have a patch that is doing invariant group
> > dependence across BBs, so I guess it make sense to push it upstream to
> push
> > the bar higher.
> > But I think we are getting a little bit of topic - should gep 0 be
> > canonicalized to bitcast?
> >
>
> Are memdep/memssa the only possible passes that could benefit from
> such a canonicalization or you can think of other cases when this can
> be useful?
>
>
> We already canonicalize.  We canonicalize in the other direction:
> https://github.com/llvm-mirror/llvm/blob/master/
> lib/Transforms/InstCombine/InstCombineCasts.cpp#L2024
>
>
> Intresting. So what is the right solution here? I can easily add handling
> of gep 0 to GVN, or maybe the code that you mentioned should be in SROA.
> If SROA is the only user of this transformation and I there are quiet a
> few passes that it hurts, then I would propose moving this logic to SROA
> and always canonicalize gep 0 to bitcast.
>
>
> +1
>
> This canonicalization seems like it's unlikely to help other passes, and
> definitely makes handling more complicated elsewhere.
>
>
> -1
> ;]
>
> I think it is very useful to be able to canonicalize on GEPs rather than
> bitcasts.
>
>
> Based on what?
>
>
> TBH, no hard data at all. Just my experience poking at passes that cared
> about this and the advice I received from others when hacking on them.
> Sorry if anything I wrote came off as some kind of firm, or unwavering
> position, it was more that I'm not sure the proposed change is good. It
> might be, but I'm not sure yet. See below for more details though.
>
>
>
>
>
> I would teach things to understand GEPs with zero indices, it's pretty
> easy (we have tools to automate it).
>
>
> So your position is that we should teach literally everything to
> understand something new, instead of canonicalize in the direction
> literally everything understands already?
>
>
> Not at all.
>
> All-zero-GEPs aren't new for better or worse. We've been canonicalizing
> toward them since before Chris refactored all of this code in 2007, so this
> is a very long-standing pattern even if it is a bad one. And several parts
> of LLVM of course handle them. They have to given that we're canonicalizing
> that direction.
>
> I learned of this years ago when Duncan Sands taught me about it to
> explain why my SROA rewrite hit so many all-zero GEPs. Since then, it
> hasn't seemed to me personally like a significant issue that we
> canonicalize towards all-zero GEPs. And I've not heard many folks hit
> issues with it since then. So my advice is typically "yep, you need to
> handle all-zero GEPs". That's essentially the default for when a pass
> doesn't handle canonical form.
>
> It isn't an endorsement though, and it doesn't mean we can't or shouldn't
> change the canonical form! If we have new evidence that makes a change
> warranted, we absolutely should. Some kinds of evidence I can remember in
> the past motivating a change:
> 1) It's really hard and/or awkward to teach things about the canonical
> form.
> 2) The canonical form doesn't express as much information or is in any
> other way a less useful/expressive representation.
> 3) Empirical evidence shows that a different canonical form despite being
> perfectly equivalent on the prior two points, is less convenient.
>
> The statement from Piotr that it is easy to teach GVN about this seems to
> indicate we're not talking about the first two issues, but if we are,
> that's a whole new discussion (and an interesting one).
>
> It seems possible that #3 is true (I suspect you think #3 is true from
> your email at least). While we should in general fix these kinds of issues
> when we find them, it does seem somewhat less urgent. And since there are
> conflicting experiences, we probably want at least a comprehensive survey
> before we make a change.
>
> In this particular case I suspect that we *used* to have issues related to
> #2 -- SROA before my rewrite relied on GEPs to understand type structures
> being decomposed. While I *think* all of the semantic issues here are gone,
> a number of people in the last couple of years have still insisted that
> bitcasting pointers blocks optimizations so we should at least investigate
> this prior to making a change.
>
> But this led me to the last paragraph in my email -- if all of this goes
> away with typeless pointers, it's not clear that it's worth pursuing a
> change for #3. Not saying we definitely shouldn't, just that we should
> weigh that against removing types from pointers entirely so that these
> issues don't come at all, regardless of how we spell the instruction.
>
>
>
> One reason why is that most things that want to reason about all-zero GEPs
> probably want to reason about all-constant GEPs as well.
>
> This is about the equivalence of gep 0,0 to bitcast. They aren't
> equivalent to other types of geps, so reasoning about them seems very
> different and orthogonal to me.
> You are saying "well things *should* understand all constant GEP anyway".
> That may or may not be true, but it's, IMHO, pretty orthogonal to how we
> choose to canonicalize a given operation.
> You are also suggesting representing an operation in something
> significantly more complicated than it's direct form.
>
>
> Sorry it seemed orthogonal. All I meant was that handling all zero GEPs
> might be unusually low cost because of handling all constant GEPs. That
> only really speaks to #3 above, and it really only lessens the cost. As you
> say, it can't eliminate it as a GEP is different from a bitcast and so we
> may end up handling both.
>
>
> Also, can you provide some data?
> You say "most" and "probably" with no explicit examples.
> Can you pull out an explicit example of how it would help existing passes,
> both optimization, and analysis, vs canonicalizing to bitcast?
>
>
> Well, my intent here was not to say it would *help* existing passes but
> only that several existing passes already handled all constant GEPs and the
> all zero case largely fell out as a consequence.
>
> The example I'm most familiar with is SROA of course. In all but one place
> it uses code to handle all constant GEPs and doesn't need to special case
> all zero GEPs. BasicAA appears to be similar. The vectorizers also seem to
> have existing code handling constant GEPs that would handle zero GEPs for
> free.
>
> As for examples of where it would *help* in a fundamental way? No, I think
> all of those are gone. It used to help SROA before the rewrite. It also
> helped DependenceAnalysis before it was rewritten. The first was crippled
> by relying on this but remaining conservatively correct. The second was
> actually buggy because it relied on this without remaining conservatively
> correct. LLVM has been moving away from this being a useful thing to
> *fundamentally* rely on.
>
> Examples of where it would help in a trivial way? Any pass that handles
> all-constant-GEPs, but not bitcasts. We could easily teach such a pass to
> handle bitcasts though. We could also teach the passes that handle bitcasts
> to handle all-zero-GEPs. In fact, we've automated this for most passes with
> stripPointerCasts and friends.
>
>
> I don't see a fundamental advantage of one over the other, so we're left
> with essentially an engineering tradeoff. If we weren't planning to remove
> pointer types entirely, this would still be an important engineering
> tradeoff, but I'm not sure which way it would go. Given that we're planning
> to remove pointer types entirely, I'd rather focus on that change rather
> than changing canonicalization strategies, and patch passes to cope with
> today's canonical form until we finish. But that is a fairly mild "rather".
> New information could quite easily change my mind. And it is just my two
> cents of course.
>
>
> Anyways, again, sorry if my previous email came off as a mandate, I just
> meant to indicate that the issue was not clear cut to me, not that it was
> some kind of definite thing one way or the other.
>

Ok I get your point.
- If every gep 0, 0 would be bitcast, then we would still need code to
handle all constant geps that would handle this case, but we also need code
to handle bitcasts that will be gone some time
- if we stay with gep 0, 0 then places like GVN that doesn't handle it
right now should probably handle all constant geps anyway, so it doesn't
matter in which form it is.

I didn't investigate how hard it would be to handle geps 0 or constant geps
in GVN, but at least for handling of invariant groups it required 2 lines
of code. I did look a little bit into SROA and I don't see a clear solution
to handle bitcasts for SROA yet, so it also might be argument against
bitcasts (but I also not familiar with SROA and didn't spend enough time on
it).
When do we expect to deploy typeless pointers?

@dannyb are you fine with me writing this small hotfix for invariant group?

Piotr
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161225/b7acb307/attachment.html>