[llvm-dev] [RFC] Simple GVN hoist

Philip Reames via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 16 11:00:07 PDT 2021


On 9/15/21 12:29 PM, Momchil Velikov wrote:
> On Tue, Sep 14, 2021 at 8:12 PM Philip Reames via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Having said that, *conceptually* this pass takes the proper approach by using MemorySSA.  Your current implementation has a bunch of complexity to basically do a local renumbering of memory instructions.  Some of that can be improved (see below), but if you were doing this with MemorySSA, you wouldn't need to do anything fancy as GVN would have numbered the loads correctly.
> Do you mean the NewGVN will number loads and store correctly? The old
> GVN will have unique numbers for loads and stores.
It should.  This is the whole idea behind a memoryssa based GVN.
>
>> Phabricator review is at https://reviews.llvm.org/D109760.
>>
>> The current structure of algorithm you proposed can probably be simplified.  Here's what I'd suggest:
>>
>> Start by handling all of the non-memory instructions.  This should be as simple as constructing a hash map from value number to instruction for one block, and then scanning the other block.  If you find the same value number in both blocks, the (by assumption scalar) instruction must exist in both blocks and can be hoisted.  This can be reviewed, tested, submitted, etc..
>>
>> Then add a single pre-pass which "renumbers" all of the memory instructions.  You can either use a trivial "no clobbers seen" like EarlyCSE does, or only handle loads where MD->getDependency() is nonlocal.  (We already do this query in processLoad, so it should be cached.)  The same algorithm can be used as above, all that changes is where the valuenumber comes from.  (The "pre-pass" can probably be implemented as a helper function which renumbers memory instructions on demand.)
> The issue here is that uniqueness of load numbers propagates to the
> users of those loads, so just after we've hoisted and merged a pair of
> loads into one instruction we can potentially get the same VN in that
> load's users.

Your missing my point.  The renumbering scheme would explicitly consider 
the incoming memory state to the load as an input to the hash.  It would 
not uniquely number the loads.  As a result, two loads (in either 
different blocks or even the same one) would be assigned the same value 
number if all operands (including memory state!) were the same.

> I was handling that with a followup iteration, but indeed we could
> recursively traverse the users, starting from the load, and recompute
> VNs.
>
>> I would specifically *not* add an extra layer of iteration here.  If you find you need iteration to handle the cases you care about, I suspect we should actually be adjusting the value re-numbering scheme for loads described just above.
>>
>> A couple of assorted notes:
>>
>> MemDep should be responsible for handling all aliasing barriers.  If it returns a non-local dependence, reordering that instruction to the beginning of the block should be legal w.r.t. memory.  Your use of isHoistBarrier is confusing to me.  You may be trying to handle speculation safety (i.e. avoid introducing faults), but if so, the special casing around volatile and atomics seems odd?  (Also, why allow hosting of the barrier instruction?)  I suspect there's a particular case you're trying to handle.  I would strongly advice dropping that from the initial patch, then add back the complexity in a separate change which is well motivated with tests, etc..
> With regard to volatile and the atomics, I guess I was being too
> conservartive trying to not reorder *any* instructions above them. I
> can remove those conditions. The other condition is to not
> speculatively execute instructions by moving them across an
> instruction that may throw, e.g. with
>
>      VN 9:   br i1 %tobool, label %if.then, label %if.else
>      if.then:
>      VN 11:   call void @h()
>      VN 7:   %0 = sdiv i32 %a, %b
>       ...
>      if.else:
>      VN 7:   %1 = sdiv i32 %a, %b
>      ...
>
> we don't want to end up with:
>
>      VN 7:   %0 = sdiv i32 %a, %b
>      br i1 %tobool, label %if.then, label %if.else
>      if.then:
>      VN 11:   call void @h()
>       ...
>      if.else:
>      VN 7:   %1 = sdiv i32 %a, %b
>      ...
>
>
> but with (`h` is `readnone`):
>
>      VN 9:   br i1 %tobool, label %if.then, label %if.else
>      if.then:
>      VN 11:   call void @h()
>      VN 7:   %0 = sdiv i32 %a, %b
>      ...
>      if.else:
>      VN 11:   call void @h()
>      VN 7:   %1 = sdiv i32 %a, %b
>      ...
>
> we could move both the call  and the division, while preserving their
> order, which also answers "Also, why allow hosting of the barrier
> instruction?")
>
> So, the subsequent iterations of the algorithm were designed to:
> a) handle renumbering of loads and their users (which could possibly
> be dealt with as described above)
> b) to prevent all reordering across volatiles and atomics (I'll drop
> that and prevent reordering only when MemDep says there is a
> dependency).
> c) detect when hoisting an instruction turns it from a local
> dependency into a non-local one, thus potentially allowing more
> hoisting; I'll see about explicitly checking if two matching
> instructions have a local dependency on
> a couple if instructions that are themselves paired for hoisting.
The numbering scheme should handle this.
> d) prevent speculation across throwing instructions; I don't have a
> nice non-iterative solution for that, e.g. in the example above there
> is no dependency between the call and the division.
As you walk forward, if you encounter an instruction not guaranteed to 
transfer execution, only hash instructions which are safe to speculate 
into the header.  You can no longer use guarantee-to-execute fault 
safety past that point.
>
>> In terms of the actual merge logic, I'd suggest implementing it as hoist-one, then CSE.  That would allow you to reuse the existing patching logic and avoid reimplementing a bunch of complexity.
> I'm not sure I understand this. Currently I move one instruction, then
> replace all uses of the other with the first one. Do you mean that or
> something else?
I mean specifically, you should be able to reuse the patchReplacement 
logic rather than reimplementing all of the flag merging logic.
>
> Thanks for the comments and the suggestions, I now have a bunch of new
> ideas to try!
>
> Thanks and best regards,
> ~chill


More information about the llvm-dev mailing list