[llvm-dev] [RFC] Simple GVN hoist

Momchil Velikov via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 15 12:29:00 PDT 2021


On Tue, Sep 14, 2021 at 8:12 PM Philip Reames via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Having said that, *conceptually* this pass takes the proper approach by using MemorySSA.  Your current implementation has a bunch of complexity to basically do a local renumbering of memory instructions.  Some of that can be improved (see below), but if you were doing this with MemorySSA, you wouldn't need to do anything fancy as GVN would have numbered the loads correctly.

Do you mean the NewGVN will number loads and store correctly? The old
GVN will have unique numbers for loads and stores.

> Phabricator review is at https://reviews.llvm.org/D109760.
>
> The current structure of algorithm you proposed can probably be simplified.  Here's what I'd suggest:
>
> Start by handling all of the non-memory instructions.  This should be as simple as constructing a hash map from value number to instruction for one block, and then scanning the other block.  If you find the same value number in both blocks, the (by assumption scalar) instruction must exist in both blocks and can be hoisted.  This can be reviewed, tested, submitted, etc..
>
> Then add a single pre-pass which "renumbers" all of the memory instructions.  You can either use a trivial "no clobbers seen" like EarlyCSE does, or only handle loads where MD->getDependency() is nonlocal.  (We already do this query in processLoad, so it should be cached.)  The same algorithm can be used as above, all that changes is where the valuenumber comes from.  (The "pre-pass" can probably be implemented as a helper function which renumbers memory instructions on demand.)

The issue here is that uniqueness of load numbers propagates to the
users of those loads, so just after we've hoisted and merged a pair of
loads into one instruction we can potentially get the same VN in that
load's users.
I was handling that with a followup iteration, but indeed we could
recursively traverse the users, starting from the load, and recompute
VNs.

> I would specifically *not* add an extra layer of iteration here.  If you find you need iteration to handle the cases you care about, I suspect we should actually be adjusting the value re-numbering scheme for loads described just above.
>
> A couple of assorted notes:
>
> MemDep should be responsible for handling all aliasing barriers.  If it returns a non-local dependence, reordering that instruction to the beginning of the block should be legal w.r.t. memory.  Your use of isHoistBarrier is confusing to me.  You may be trying to handle speculation safety (i.e. avoid introducing faults), but if so, the special casing around volatile and atomics seems odd?  (Also, why allow hosting of the barrier instruction?)  I suspect there's a particular case you're trying to handle.  I would strongly advice dropping that from the initial patch, then add back the complexity in a separate change which is well motivated with tests, etc..

With regard to volatile and the atomics, I guess I was being too
conservartive trying to not reorder *any* instructions above them. I
can remove those conditions. The other condition is to not
speculatively execute instructions by moving them across an
instruction that may throw, e.g. with

    VN 9:   br i1 %tobool, label %if.then, label %if.else
    if.then:
    VN 11:   call void @h()
    VN 7:   %0 = sdiv i32 %a, %b
     ...
    if.else:
    VN 7:   %1 = sdiv i32 %a, %b
    ...

we don't want to end up with:

    VN 7:   %0 = sdiv i32 %a, %b
    br i1 %tobool, label %if.then, label %if.else
    if.then:
    VN 11:   call void @h()
     ...
    if.else:
    VN 7:   %1 = sdiv i32 %a, %b
    ...


but with (`h` is `readnone`):

    VN 9:   br i1 %tobool, label %if.then, label %if.else
    if.then:
    VN 11:   call void @h()
    VN 7:   %0 = sdiv i32 %a, %b
    ...
    if.else:
    VN 11:   call void @h()
    VN 7:   %1 = sdiv i32 %a, %b
    ...

we could move both the call  and the division, while preserving their
order, which also answers "Also, why allow hosting of the barrier
instruction?")

So, the subsequent iterations of the algorithm were designed to:
a) handle renumbering of loads and their users (which could possibly
be dealt with as described above)
b) to prevent all reordering across volatiles and atomics (I'll drop
that and prevent reordering only when MemDep says there is a
dependency).
c) detect when hoisting an instruction turns it from a local
dependency into a non-local one, thus potentially allowing more
hoisting; I'll see about explicitly checking if two matching
instructions have a local dependency on
a couple if instructions that are themselves paired for hoisting.
d) prevent speculation across throwing instructions; I don't have a
nice non-iterative solution for that, e.g. in the example above there
is no dependency between the call and the division.

> In terms of the actual merge logic, I'd suggest implementing it as hoist-one, then CSE.  That would allow you to reuse the existing patching logic and avoid reimplementing a bunch of complexity.

I'm not sure I understand this. Currently I move one instruction, then
replace all uses of the other with the first one. Do you mean that or
something else?

Thanks for the comments and the suggestions, I now have a bunch of new
ideas to try!

Thanks and best regards,
~chill


More information about the llvm-dev mailing list