<div dir="ltr">Hello everyone,<div><br></div><div>I am hitting a bit of a design problem and I would like your advice before jumping to the implementation.</div><div><br></div><div>I am planning to implement a fence elimination/insertion algorithm that would be more general than the current one (inter-basic block, at the IR level, could be adapted to other architectures such as Power fairly easily). I will explain the details of that algorithm later if you want but they are not relevant to this issue. My problem is that when AtomicExpand calls emitLeading/TrailingFence (hooks from TargetLowering), the target greedily inserts appropriate barriers, losing information which would be required for good fence elimination.</div>

<div>For example, on ARMv7 an acquire load trailing fence can either be a dmb (what is currently emitted), or an isb fence after a control-dependent branch. Ideally, the code would first emit and optimize the dmb barriers, then for each acquire load see whether there is already a dmb (nothing to do), a control-dependent branch (emit an isb) or something else (emit a dmb). But once a load acquire has been lowered to a dmb it is too late, that information is lost. There are other similar cases where useful information can be lost (for example the leading fence of a release store is a dmb that can commute with other accesses to the same location.. but not with accesses to other locations. Once the dmb is emitted, we forget which access spawned it).</div>

<div><br></div><div>I see mostly two different approaches for fixing this, and don't know which to prefer (and maybe you can see a third way of doing it ?).</div><div><br></div><div>1) The first idea is to have an IR-level pass per backend doing the elimination for each architecture (they can share code through some target-independent helper functions), and to preserve the information through some special pseudo-instruction (a new target-dependent intrinsic ? a normal dmb/sync but with special metadata ?). So for example there would be DMBAfterAcquireLoad, which would take the load as an operand and could be turned into a control dependency + isb later. In case where fence elimination is disabled, there would be a basic pass to lower those in the trivial way.</div>

<div>Drawback: will require adding a bunch of pseudo-instructions to the IR. Even though they probably won't be seen by any other pass, it could be seen as a source of complexity.</div><div><br></div><div>2) Another possibility is to have the ExpandAtomicPass keep a vector of the places where it currently calls emitLeading/TrailingFence, and give it at the end of runOnFunction to another target hook. The target could then do the fence elimination/insertion, having all the information present. Code could be shared between the targets in the same way as above through helper functions.</div>

<div>Drawback: I suspect it might be a bit involved to keep a vector of insertion points in a function while mutating the control-flow of the function. Is there any simple way of doing that without a risk of invalidation ?</div>

<div><br></div><div>Thank you,</div><div>Robin Morisset</div></div>