[llvm-dev] [canonicalization] GEP 0, 0

Sat Dec 24 00:39:20 PST 2016

On Fri, Dec 23, 2016 at 10:17 PM Daniel Berlin <dberlin at dberlin.org> wrote:

On Fri, Dec 23, 2016 at 10:02 PM, Chandler Carruth <chandlerc at google.com>
wrote:

On Fri, Dec 23, 2016 at 3:30 PM Daniel Berlin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

On Fri, Dec 23, 2016 at 2:31 PM, Piotr Padlewski via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

2016-12-23 22:17 GMT+01:00 David Majnemer <david.majnemer at gmail.com>:

On Fri, Dec 23, 2016 at 1:09 PM, Davide Italiano via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

On Fri, Dec 23, 2016 at 1:01 PM, Piotr Padlewski
<piotr.padlewski at gmail.com> wrote:
>
>
> On Dec 23, 2016 19:47, "Daniel Berlin" <dberlin at dberlin.org> wrote:
>
> Define soon?
> My guess is 1 year or less.
> (I've already seen patches to start converting most remaining memdep uses,
> like memcpy opt, licm, etc)
>
>
> That's good. Anyway, I already have a patch that handles invariant group
> dependence across BBs, so I guess it makes sense to push it upstream to
> raise the bar.
> But I think we are getting a little bit off topic - should gep 0 be
> canonicalized to bitcast?
>

Are memdep/memssa the only possible passes that could benefit from
such a canonicalization, or can you think of other cases where this
would be useful?

We already canonicalize.  We canonicalize in the other direction:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L2024
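
A rough sketch of what this direction of canonicalization amounts to, as a toy model and not the InstCombine code itself. The assumption here is that a pointer bitcast is turned into a GEP when the destination pointee type can be reached from the source pointee type by repeatedly descending into the first element; the GEP then carries one leading zero index plus one zero per level descended. `ToyType` and `zeroIndicesFor` are hypothetical names made up for this illustration.

```cpp
#include <cassert>

// Toy stand-in for an IR type (hypothetical, not llvm::Type).
// firstElement is the type at index 0 of an aggregate, or nullptr for a scalar.
struct ToyType {
  const ToyType *firstElement;
};

// Number of zero indices a "gep ptr, 0, 0, ..." would need to go from a
// pointer-to-src to a pointer-to-dst by always taking element 0.
// Returns -1 when no such path exists, i.e. the bitcast would have to stay.
int zeroIndicesFor(const ToyType *src, const ToyType *dst) {
  int count = 1; // the leading zero index steps through the pointer itself
  while (src != dst && src->firstElement != nullptr) {
    src = src->firstElement;
    ++count;
  }
  return src == dst ? count : -1;
}
```

For a struct whose first field is an array of i32, a cast from a pointer to the struct to a pointer to i32 corresponds to three zero indices in this model; a cast to a pointer to an unrelated type has no all-zero-GEP spelling and remains a bitcast.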

Interesting. So what is the right solution here? I can easily add handling
of gep 0 to GVN, or maybe the code that you mentioned should be in SROA.
If SROA is the only user of this transformation and there are quite a few
passes that it hurts, then I would propose moving this logic to SROA and
always canonicalizing gep 0 to bitcast.

+1

This canonicalization seems like it's unlikely to help other passes, and
definitely makes handling more complicated elsewhere.

-1
;]

I think it is very useful to be able to canonicalize on GEPs rather than
bitcasts.

Based on what?

TBH, no hard data at all. Just my experience poking at passes that cared
about this and the advice I received from others when hacking on them.
Sorry if anything I wrote came off as some kind of firm, or unwavering
position, it was more that I'm not sure the proposed change is good. It
might be, but I'm not sure yet. See below for more details though.

I would teach things to understand GEPs with zero indices, it's pretty easy
(we have tools to automate it).

So your position is that we should teach literally everything to understand
something new, instead of canonicalizing in the direction literally
everything understands already?

Not at all.

All-zero-GEPs aren't new for better or worse. We've been canonicalizing
toward them since before Chris refactored all of this code in 2007, so this
is a very long-standing pattern even if it is a bad one. And several parts
of LLVM of course handle them. They have to, given that we're canonicalizing
in that direction.

I learned of this years ago when Duncan Sands taught me about it to explain
why my SROA rewrite hit so many all-zero GEPs. Since then, it hasn't seemed
to me personally like a significant issue that we canonicalize towards
all-zero GEPs. And I've not heard many folks hit issues with it since then.
So my advice is typically "yep, you need to handle all-zero GEPs". That's
essentially the default for when a pass doesn't handle canonical form.

It isn't an endorsement though, and it doesn't mean we can't or shouldn't
change the canonical form! If we have new evidence that makes a change
warranted, we absolutely should. Some kinds of evidence I can remember in
the past motivating a change:
1) It's really hard and/or awkward to teach things about the canonical form.
2) The canonical form doesn't express as much information or is in any
other way a less useful/expressive representation.
3) Empirical evidence shows that a different canonical form, despite being
perfectly equivalent on the prior two points, is less convenient.

The statement from Piotr that it is easy to teach GVN about this seems to
indicate we're not talking about the first two issues, but if we are,
that's a whole new discussion (and an interesting one).

It seems possible that #3 is true (I suspect you think #3 is true from your
email at least). While we should in general fix these kinds of issues when
we find them, it does seem somewhat less urgent. And since there are
conflicting experiences, we probably want at least a comprehensive survey
before we make a change.

In this particular case I suspect that we *used* to have issues related to
#2 -- SROA before my rewrite relied on GEPs to understand type structures
being decomposed. While I *think* all of the semantic issues here are gone,
a number of people in the last couple of years have still insisted that
bitcasting pointers blocks optimizations, so we should at least investigate
this prior to making a change.

But this led me to the last paragraph in my email -- if all of this goes
away with typeless pointers, it's not clear that it's worth pursuing a
change for #3. Not saying we definitely shouldn't, just that we should
weigh that against removing types from pointers entirely so that these
issues don't come up at all, regardless of how we spell the instruction.

One reason why is that most things that want to reason about all-zero GEPs
probably want to reason about all-constant GEPs as well.

This is about the equivalence of gep 0,0 to bitcast. They aren't equivalent
to other types of geps, so reasoning about them seems very different and
orthogonal to me.
You are saying "well things *should* understand all constant GEP anyway".
That may or may not be true, but it's, IMHO, pretty orthogonal to how we
choose to canonicalize a given operation.
You are also suggesting representing an operation as something
significantly more complicated than its direct form.

Sorry it seemed orthogonal. All I meant was that handling all zero GEPs
might be unusually low cost because of handling all constant GEPs. That
only really speaks to #3 above, and it really only lessens the cost. As you
say, it can't eliminate it as a GEP is different from a bitcast and so we
may end up handling both.

Also, can you provide some data?
You say "most" and "probably" with no explicit examples.
Can you pull out an explicit example of how it would help existing passes,
both optimization, and analysis, vs canonicalizing to bitcast?

Well, my intent here was not to say it would *help* existing passes but
only that several existing passes already handled all constant GEPs and the
all zero case largely fell out as a consequence.

The example I'm most familiar with is SROA of course. In all but one place
it uses code to handle all constant GEPs and doesn't need to special case
all zero GEPs. BasicAA appears to be similar. The vectorizers also seem to
have existing code handling constant GEPs that would handle zero GEPs for
free.
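
To make the "for free" point concrete, here is a minimal sketch, assuming a much-simplified model in which every GEP index is a constant paired with a byte stride (real struct indices look up a field offset instead, which this toy ignores). `ToyIndex`, `constantByteOffset` and `sameAddressAsBase` are hypothetical names, not LLVM APIs. Code that folds an all-constant GEP into a single byte offset needs no special case for the all-zero GEP: it is simply the input whose offset comes out as 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One constant GEP index together with the size in bytes of the element
// it steps over (hypothetical simplification: array-like levels only).
struct ToyIndex {
  int64_t value;
  int64_t strideBytes;
};

// Fold an all-constant GEP into a single byte offset from its base pointer.
int64_t constantByteOffset(const std::vector<ToyIndex> &indices) {
  int64_t offset = 0;
  for (const ToyIndex &idx : indices)
    offset += idx.value * idx.strideBytes;
  return offset;
}

// An all-zero GEP is just the case where the folded offset is 0, so the
// generic constant-offset path already covers it.
bool sameAddressAsBase(const std::vector<ToyIndex> &indices) {
  return constantByteOffset(indices) == 0;
}
```

A pass structured this way still has to handle bitcasts separately, which is the remaining cost discussed above.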

As for examples of where it would *help* in a fundamental way? No, I think
all of those are gone. It used to help SROA before the rewrite. It also
helped DependenceAnalysis before it was rewritten. The first was crippled
by relying on this but remaining conservatively correct. The second was
actually buggy because it relied on this without remaining conservatively
correct. LLVM has been moving away from this being a useful thing to
*fundamentally* rely on.

Examples of where it would help in a trivial way? Any pass that handles
all-constant-GEPs, but not bitcasts. We could easily teach such a pass to
handle bitcasts though. We could also teach the passes that handle bitcasts
to handle all-zero-GEPs. In fact, we've automated this for most passes with
stripPointerCasts and friends.
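
A minimal sketch of the idea behind stripPointerCasts, written against a toy value type and not LLVM's real classes: walk up through anything that leaves the address unchanged, treating a bitcast and an all-zero GEP alike. `ToyValue`, `isAllZeroGEP` and `stripToyPointerCasts` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for an IR value (hypothetical, not llvm::Value).
struct ToyValue {
  enum Kind { Base, BitCast, GEP };
  Kind kind;
  const ToyValue *operand;      // pointer operand for BitCast/GEP, else nullptr
  std::vector<int64_t> indices; // constant indices, used by GEP only
};

// True for a GEP whose indices are all zero, i.e. one that does not move
// the pointer.
bool isAllZeroGEP(const ToyValue &v) {
  if (v.kind != ToyValue::GEP)
    return false;
  for (int64_t idx : v.indices)
    if (idx != 0)
      return false;
  return true;
}

// Look through bitcasts and all-zero GEPs until reaching a value that is
// neither; a GEP with a non-zero index stops the walk.
const ToyValue *stripToyPointerCasts(const ToyValue *v) {
  while (v->kind == ToyValue::BitCast || isAllZeroGEP(*v))
    v = v->operand;
  return v;
}
```

A pass that compares pointers only after a step like this sees the same underlying value whichever of the two spellings the canonicalizer picked.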

I don't see a fundamental advantage of one over the other, so we're left
with essentially an engineering tradeoff. If we weren't planning to remove
pointer types entirely, this would still be an important engineering
tradeoff, but I'm not sure which way it would go. Given that we're planning
to remove pointer types entirely, I'd rather focus on that change rather
than changing canonicalization strategies, and patch passes to cope with
today's canonical form until we finish. But that is a fairly mild "rather".
New information could quite easily change my mind. And it is just my two
cents of course.

Anyways, again, sorry if my previous email came off as a mandate, I just
meant to indicate that the issue was not clear cut to me, not that it was
some kind of definite thing one way or the other.