[LLVMdev] Fence elimination pass proposal
Renato Golin
renato.golin at linaro.org
Mon Sep 22 00:31:33 PDT 2014
Hi Robin,
I'm not an expert on fences, but I did help examine the original patch
(Reinoud's). Some comments inline.
On 11 September 2014 15:46, Robin Morisset <morisset at google.com> wrote:
> - Should I try to make the code general enough to be used for full PRE
> later, or only care about fences?
For now, I'd only care about fences. We're not even sure how to do it
yet, and you yourself have described many different ways of implementing
different levels of the algorithm. I'd choose specific patterns first,
but with PRE in mind at each step of the way.
> - Is it ok to use metadata to preserve information that would otherwise be
> lost by a transformation pass (and would help a later pass)?
I believe so. The worst that can happen is that some other pass loses it
between setting and use, but if we're conservative, we'll just miss the
optimization, not break anything.
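
For example, something like this is what I'd picture (just a sketch; the
"fence.origin" kind is made up and the MDNode/MDString plumbing has changed
between releases, so take the exact calls with a pinch of salt):

#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

// Tag a fence emitted by AtomicExpandPass so a later pass can tell it apart
// from a fence the user wrote. "fence.origin" is a made-up metadata kind.
static void tagExpandedFence(FenceInst *FI) {
  LLVMContext &Ctx = FI->getContext();
  Metadata *Origin = MDString::get(Ctx, "atomic-expand");
  FI->setMetadata("fence.origin", MDNode::get(Ctx, Origin));
}

static bool isExpandedFence(const FenceInst *FI) {
  return FI->getMetadata("fence.origin") != nullptr;
}

As long as the later pass treats a missing tag as "don't know", losing the
metadata somewhere in between only costs us the optimization.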
> - If you have read the text below, does the approach make sense?
I have to say it was a bit dense, but I think it makes sense. I'd
encourage other people to read it with care and chime in, but overall it
seems like it should work. Maybe Tim Northover or David Chisnall could be
of more help here.
> For classical PRE (i.e. conservative, static), the extra property is that no
> path through the CFG goes through more instances of the computation than
> before. For speculative PRE (SPRE), it is that the cut is a min-cut, when
> all edges in the CFG are annotated with their frequency (which requires
> profiling-based information in general). The same distinction can be made in
> our case: using profiling-based information allows more aggressive
> optimization but with a risk of mis-optimization if the information is
> wrong.
I'd focus on simple PRE first and get it working. Profile information
can be added later.
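
When profile data does come into the picture, I'd expect the only difference
to be in how the edges are weighted before running the cut, roughly along
these lines (illustrative only; a real version would also want branch
probabilities, and unit weights alone don't capture the classical-PRE "no
path gets worse" constraint):

#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/BasicBlock.h"
#include <cstdint>

using namespace llvm;

// Weight for a CFG edge Src -> Dst in the cut graph: unit weights as a simple
// static default, block frequencies when profile-based SPRE is wanted.
static uint64_t edgeWeight(const BasicBlock *Src, const BasicBlock *Dst,
                           const BlockFrequencyInfo *BFI) {
  (void)Dst; // a real version would fold in branch probabilities as well
  if (!BFI)
    return 1; // static case: every edge counts the same
  return BFI->getBlockFreq(Src).getFrequency();
}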
> On every path going down from the Fence, find the first memory
> access/return/call, and connect the beginning of this instruction to the
> sink (with infinite weight). If instead we first reach a stronger Fence,
> then do nothing for that path (do not connect it to the sink).
So you'd end up with a sort of bipartite graph between source and sink?
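
Just to check my reading of the construction, I'd picture the downward walk
roughly like this (CutGraph and its addEdgeToSink are made-up stand-ins for
whatever feeds the min-cut, the strength check is a placeholder, and plain
FenceInst stands in for whatever AtomicExpandPass actually leaves behind):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Instructions.h"
#include <iterator>
#include <limits>

using namespace llvm;

// Hypothetical stand-in for the graph handed to the min-cut.
struct CutGraph {
  void addEdgeToSink(Instruction *, uint64_t) {} // record an edge to the sink
};

// Placeholder strength check; real code would compare the orderings properly
// (and handle the target-specific fences present after AtomicExpandPass).
static bool isAtLeastAsStrongAs(const FenceInst *A, const FenceInst *B) {
  return A->getOrdering() >= B->getOrdering();
}

// Walk down from Root: on each path, the first memory access/call/return is
// connected to the sink with infinite weight; a stronger fence ends the path.
static void connectSinks(FenceInst *Root, CutGraph &G) {
  const uint64_t Inf = std::numeric_limits<uint64_t>::max();
  SmallPtrSet<BasicBlock *, 16> Visited;
  SmallVector<BasicBlock *, 8> Worklist;

  // Scan [I, E); return true if the path falls through into the successors.
  auto ScanFrom = [&](BasicBlock::iterator I, BasicBlock::iterator E) {
    for (; I != E; ++I) {
      if (auto *F = dyn_cast<FenceInst>(&*I)) {
        if (isAtLeastAsStrongAs(F, Root))
          return false;              // path already covered, no sink edge
        continue;                    // weaker fence: keep looking
      }
      if (I->mayReadOrWriteMemory() || isa<CallInst>(*I) || isa<ReturnInst>(*I)) {
        G.addEdgeToSink(&*I, Inf);   // this path must keep a fence above it
        return false;
      }
    }
    return true;
  };

  if (ScanFrom(std::next(Root->getIterator()), Root->getParent()->end()))
    for (BasicBlock *Succ : successors(Root->getParent()))
      if (Visited.insert(Succ).second)
        Worklist.push_back(Succ);

  while (!Worklist.empty()) {
    BasicBlock *BB = Worklist.pop_back_val();
    if (ScanFrom(BB->begin(), BB->end()))
      for (BasicBlock *Succ : successors(BB))
        if (Visited.insert(Succ).second)
          Worklist.push_back(Succ);
  }
}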
> Execute the cut algorithm on this graph (only keeping the nodes/edges
> traversed above, merging all adjacent nodes that are not branches or memory
> accesses/call/return), and insert a Fence on every edge that is part of the
> cut.
I'd avoid any obscure cut algorithm, at least in the first
implementation. But this sounds correct.
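
For the insertion side, I'd expect something as simple as splitting the edge
when needed and dropping a fence at the top of the target block. SplitEdge and
CreateFence are real utilities, but CutEdge and the overall shape below are
just my guess at what the pass would do:

#include "llvm/IR/IRBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"

using namespace llvm;

// Hypothetical result of the min-cut: a CFG edge that needs a fence.
struct CutEdge {
  BasicBlock *Src;
  BasicBlock *Dst;
};

static void insertFenceOnEdge(const CutEdge &E, AtomicOrdering Ord) {
  // Give the edge its own block when Dst has other predecessors, so the new
  // fence only runs on paths that actually go Src -> Dst.
  BasicBlock *BB = E.Dst->getSinglePredecessor() == E.Src
                       ? E.Dst
                       : SplitEdge(E.Src, E.Dst);
  IRBuilder<> Builder(&*BB->getFirstInsertionPt());
  Builder.CreateFence(Ord);
}

And for the cut itself, since the graph should be small after the merging
step, a textbook max-flow (Edmonds-Karp or similar) ought to be plenty to
start with.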
> Pass Ordering
>
> I am currently planning on running this pass just after AtomicExpandPass,
> i.e. just before lowering to SelectionDAG. It could in theory be executed
> later in the backend, but I would rather avoid doing a global, control-flow
> related optimization on SelectionDAGs if at all possible. It cannot be run
> earlier, because the target-specific fences do not exist yet.
I agree. Though Chandler might be a better person to respond here.
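
Concretely, I'd picture it hanging off the target's TargetPassConfig,
something like the sketch below (createFenceEliminationPass() is hypothetical,
and the exact hook and header names are from memory):

#include "llvm/CodeGen/TargetPassConfig.h"

using namespace llvm;

namespace {
// Hypothetical target pass config; createFenceEliminationPass() does not
// exist, it stands for the proposed pass.
class MyTargetPassConfig : public TargetPassConfig {
public:
  void addIRPasses() override {
    TargetPassConfig::addIRPasses();
    // Wherever this target adds AtomicExpandPass, the new pass would go
    // right after it, while we are still on IR:
    //   addPass(createAtomicExpandPass(...));   // existing expansion
    //   addPass(createFenceEliminationPass());  // proposed (hypothetical)
  }
};
} // end anonymous namespace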
cheers,
--renato