[LLVMdev] Design question about a fence elimination algorithm

Robin Morisset morisset at google.com
Fri Aug 29 16:34:29 PDT 2014


Hello everyone,

I am hitting a bit of a design problem and I would like your advice before
jumping to the implementation.

I am planning to implement a fence elimination/insertion algorithm that
would be more general than the current one (inter-basic block, at the IR
level, could be adapted to other architectures such as Power fairly
easily). I will explain the details of that algorithm later if you want but
they are not relevant to this issue. My problem is that when AtomicExpand
calls emitLeading/TrailingFence (hooks from TargetLowering), the target
greedily inserts appropriate barriers, losing information which would be
required for good fence elimination.
For example, on ARMv7 an acquire load trailing fence can either be a dmb
(what is currently emitted), or an isb fence after a control-dependent
branch. Ideally, the code would first emit and optimize the dmb barriers,
then for each acquire load see whether there is already a dmb (nothing to
do), a control-dependent branch (emit an isb) or something else (emit a
dmb). But once a load acquire has been lowered to a dmb it is too late,
that information is lost. There are other similar cases where useful
information can be lost (for example the leading fence of a release store
is a dmb that can commute with other accesses to the same location.. but
not with accesses to other locations. Once the dmb is emitted, we forget
which access spawned it).

I see mostly two different approaches for fixing this, and don't know which
to prefer (and maybe you can see a third way of doing it ?).

1) The first idea is to have an IR-level pass per backend doing the
elimination for each architecture (they can share code through some
target-independent helper functions), and to preserve the information
through some special pseudo-instruction (a new target-dependent intrinsic ?
a normal dmb/sync but with special metadata ?). So for example there would
be DMBAfterAcquireLoad, which would take the load as an operand and could
be turned into a control dependency + isb later. In case where fence
elimination is disabled, there would be a basic pass to lower those in the
trivial way.
Drawback: will require adding a bunch of pseudo-instructions to the IR.
Even though they probably won't be seen by any other pass, it could be seen
as a source of complexity.

2) Another possibility is to have the ExpandAtomicPass keep a vector of the
places where it currently calls emitLeading/TrailingFence, and give it at
the end of runOnFunction to another target hook. The target could then do
the fence elimination/insertion, having all the information present. Code
could be shared between the targets in the same way as above through helper
functions.
Drawback: I suspect it might be a bit involved to keep a vector of
insertion points in a function while mutating the control-flow of the
function. Is there any simple way of doing that without a risk of
invalidation ?

Thank you,
Robin Morisset
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140829/9f12ba98/attachment.html>


More information about the llvm-dev mailing list