[llvm-dev] [canonicalization] GEP 0, 0

Tue Dec 27 13:12:57 PST 2016

On Tue, Dec 27, 2016 at 12:32 PM Piotr Padlewski <piotr.padlewski at gmail.com>
wrote:

> BTW are we planning to also remove gep 0 after we would have typeless
> pointers?
> I saw the David's presentation year ago but I am not sure if it was
> mentioned.
>

Wasn't mentioned, but I would imagine gep of constant zero would go away -
in the same way add of constant zero presumably gets optimized away/is not
canonical today.

> If it would be removed, then does it mean that SROA would still need some
> new code to handle it (assuming it doesn't handle bitcasts right now)
>

Right - those sort of fixes to existing optimizations are both part of the
point of the typeless pointer work (to force those kinds of dependences out
of optimizations) and a lot of the remaining work (I got most of the
bitcode/IR changes done by the end of last year - couple more to do
(byval/inalloca at least), but beyond that, it's fixing all the
optimization passes to handle such a future, etc)

>
> Here is the patch solving my issue https://reviews.llvm.org/D28126
>
> 2016-12-25 13:40 GMT+01:00 Piotr Padlewski <piotr.padlewski at gmail.com>:
>
>
>
> 2016-12-24 9:39 GMT+01:00 Chandler Carruth <chandlerc at google.com>:
>
> On Fri, Dec 23, 2016 at 10:17 PM Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
> On Fri, Dec 23, 2016 at 10:02 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
> On Fri, Dec 23, 2016 at 3:30 PM Daniel Berlin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Fri, Dec 23, 2016 at 2:31 PM, Piotr Padlewski via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> 2016-12-23 22:17 GMT+01:00 David Majnemer <david.majnemer at gmail.com>:
>
>
>
> On Fri, Dec 23, 2016 at 1:09 PM, Davide Italiano via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Fri, Dec 23, 2016 at 1:01 PM, Piotr Padlewski
> <piotr.padlewski at gmail.com> wrote:
> >
> >
> > On Dec 23, 2016 19:47, "Daniel Berlin" <dberlin at dberlin.org> wrote:
> >
> > Define soon?
> > My guess is 1 year or less.
> > (I've already seen patches to start converting most remaining memdep
> uses,
> > like memcpy opt, licm, etc)
> >
> >
> > That's good. Anyway I already have a patch that is doing invariant group
> > dependence across BBs, so I guess it make sense to push it upstream to
> push
> > the bar higher.
> > But I think we are getting a little bit of topic - should gep 0 be
> > canonicalized to bitcast?
> >
>
> Are memdep/memssa the only possible passes that could benefit from
> such a canonicalization or you can think of other cases when this can
> be useful?
>
>
> We already canonicalize.  We canonicalize in the other direction:
> https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L2024
>
>
> Intresting. So what is the right solution here? I can easily add handling
> of gep 0 to GVN, or maybe the code that you mentioned should be in SROA.
> If SROA is the only user of this transformation and I there are quiet a
> few passes that it hurts, then I would propose moving this logic to SROA
> and always canonicalize gep 0 to bitcast.
>
>
> +1
>
> This canonicalization seems like it's unlikely to help other passes, and
> definitely makes handling more complicated elsewhere.
>
>
> -1
> ;]
>
> I think it is very useful to be able to canonicalize on GEPs rather than
> bitcasts.
>
>
> Based on what?
>
>
> TBH, no hard data at all. Just my experience poking at passes that cared
> about this and the advice I received from others when hacking on them.
> Sorry if anything I wrote came off as some kind of firm, or unwavering
> position, it was more that I'm not sure the proposed change is good. It
> might be, but I'm not sure yet. See below for more details though.
>
>
>
>
>
> I would teach things to understand GEPs with zero indices, it's pretty
> easy (we have tools to automate it).
>
>
> So your position is that we should teach literally everything to
> understand something new, instead of canonicalize in the direction
> literally everything understands already?
>
>
> Not at all.
>
> All-zero-GEPs aren't new for better or worse. We've been canonicalizing
> toward them since before Chris refactored all of this code in 2007, so this
> is a very long-standing pattern even if it is a bad one. And several parts
> of LLVM of course handle them. They have to given that we're canonicalizing
> that direction.
>
> I learned of this years ago when Duncan Sands taught me about it to
> explain why my SROA rewrite hit so many all-zero GEPs. Since then, it
> hasn't seemed to me personally like a significant issue that we
> canonicalize towards all-zero GEPs. And I've not heard many folks hit
> issues with it since then. So my advice is typically "yep, you need to
> handle all-zero GEPs". That's essentially the default for when a pass
> doesn't handle canonical form.
>
> It isn't an endorsement though, and it doesn't mean we can't or shouldn't
> change the canonical form! If we have new evidence that makes a change
> warranted, we absolutely should. Some kinds of evidence I can remember in
> the past motivating a change:
> 1) It's really hard and/or awkward to teach things about the canonical
> form.
> 2) The canonical form doesn't express as much information or is in any
> other way a less useful/expressive representation.
> 3) Empirical evidence shows that a different canonical form despite being
> perfectly equivalent on the prior two points, is less convenient.
>
> The statement from Piotr that it is easy to teach GVN about this seems to
> indicate we're not talking about the first two issues, but if we are,
> that's a whole new discussion (and an interesting one).
>
> It seems possible that #3 is true (I suspect you think #3 is true from
> your email at least). While we should in general fix these kinds of issues
> when we find them, it does seem somewhat less urgent. And since there are
> conflicting experiences, we probably want at least a comprehensive survey
> before we make a change.
>
> In this particular case I suspect that we *used* to have issues related to
> #2 -- SROA before my rewrite relied on GEPs to understand type structures
> being decomposed. While I *think* all of the semantic issues here are gone,
> a number of people in the last couple of years have still insisted that
> bitcasting pointers blocks optimizations so we should at least investigate
> this prior to making a change.
>
> But this led me to the last paragraph in my email -- if all of this goes
> away with typeless pointers, it's not clear that it's worth pursuing a
> change for #3. Not saying we definitely shouldn't, just that we should
> weigh that against removing types from pointers entirely so that these
> issues don't come at all, regardless of how we spell the instruction.
>
>
>
> One reason why is that most things that want to reason about all-zero GEPs
> probably want to reason about all-constant GEPs as well.
>
> This is about the equivalence of gep 0,0 to bitcast. They aren't
> equivalent to other types of geps, so reasoning about them seems very
> different and orthogonal to me.
> You are saying "well things *should* understand all constant GEP anyway".
> That may or may not be true, but it's, IMHO, pretty orthogonal to how we
> choose to canonicalize a given operation.
> You are also suggesting representing an operation in something
> significantly more complicated than it's direct form.
>
>
> Sorry it seemed orthogonal. All I meant was that handling all zero GEPs
> might be unusually low cost because of handling all constant GEPs. That
> only really speaks to #3 above, and it really only lessens the cost. As you
> say, it can't eliminate it as a GEP is different from a bitcast and so we
> may end up handling both.
>
>
> Also, can you provide some data?
> You say "most" and "probably" with no explicit examples.
> Can you pull out an explicit example of how it would help existing passes,
> both optimization, and analysis, vs canonicalizing to bitcast?
>
>
> Well, my intent here was not to say it would *help* existing passes but
> only that several existing passes already handled all constant GEPs and the
> all zero case largely fell out as a consequence.
>
> The example I'm most familiar with is SROA of course. In all but one place
> it uses code to handle all constant GEPs and doesn't need to special case
> all zero GEPs. BasicAA appears to be similar. The vectorizers also seem to
> have existing code handling constant GEPs that would handle zero GEPs for
> free.
>
> As for examples of where it would *help* in a fundamental way? No, I think
> all of those are gone. It used to help SROA before the rewrite. It also
> helped DependenceAnalysis before it was rewritten. The first was crippled
> by relying on this but remaining conservatively correct. The second was
> actually buggy because it relied on this without remaining conservatively
> correct. LLVM has been moving away from this being a useful thing to
> *fundamentally* rely on.
>
> Examples of where it would help in a trivial way? Any pass that handles
> all-constant-GEPs, but not bitcasts. We could easily teach such a pass to
> handle bitcasts though. We could also teach the passes that handle bitcasts
> to handle all-zero-GEPs. In fact, we've automated this for most passes with
> stripPointerCasts and friends.
>
>
> I don't see a fundamental advantage of one over the other, so we're left
> with essentially an engineering tradeoff. If we weren't planning to remove
> pointer types entirely, this would still be an important engineering
> tradeoff, but I'm not sure which way it would go. Given that we're planning
> to remove pointer types entirely, I'd rather focus on that change rather
> than changing canonicalization strategies, and patch passes to cope with
> today's canonical form until we finish. But that is a fairly mild "rather".
> New information could quite easily change my mind. And it is just my two
> cents of course.
>
>
> Anyways, again, sorry if my previous email came off as a mandate, I just
> meant to indicate that the issue was not clear cut to me, not that it was
> some kind of definite thing one way or the other.
>
>
> Ok I get your point.
> - If every gep 0, 0 would be bitcast, then we would still need code to
> handle all constant geps that would handle this case, but we also need code
> to handle bitcasts that will be gone some time
> - if we stay with gep 0, 0 then places like GVN that doesn't handle it
> right now should probably handle all constant geps anyway, so it doesn't
> matter in which form it is.
>
> I didn't investigate how hard it would be to handle geps 0 or constant
> geps in GVN, but at least for handling of invariant groups it required 2
> lines of code. I did look a little bit into SROA and I don't see a clear
> solution to handle bitcasts for SROA yet, so it also might be argument
> against bitcasts (but I also not familiar with SROA and didn't spend enough
> time on it).
> When do we expect to deploy typeless pointers?
>
> @dannyb are you fine with me writing this small hotfix for invariant
> group?
>
> Piotr
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161227/7dda6b70/attachment-0001.html>