[llvm-dev] _mm_lfence in both paths of an if/else are hoisted by SimplifyCFG potentially breaking use as a speculation barrier

Mon Aug 10 12:25:46 PDT 2020

Thanks Nicolai. I'll try to take a look at the review.

The user is the one calling _mm_lfence on a particular path. Would we need
some IR transform to turn it into the IR you showed if it is used on two
paths?

~Craig

On Sun, Aug 9, 2020 at 8:15 AM Nicolai Hähnle <nhaehnle at gmail.com> wrote:

> Hi Craig,
>
> The review for the similar GPU problem is now up here:
> https://reviews.llvm.org/D85603 (+ some other patches on the
> Phabricator stack).
>
> From a pragmatic perspective, the constraints added to program
> transforms there are sufficient for what you need. You'd produce IR
> such as:
>
>     %token = call token @llvm.experimental.convergence.anchor()
>     br i1 %c, label %then, label %else
>
>   then:
>     call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ]
>      ...
>
>   else:
>     call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ]
>     ...
>
> ... and this would prevent the hoisting of the lfences.
>
> The puzzle to me is whether one can justify this use of the
> convergence tokens from a theoretical point of view. We describe
> convergence control in terms of threads that communicate, which is a
> faithful description of what's happening in the GPU use case. I wonder
> whether for the speculative execution problem, one could justify the
> use of the same convergence control machinery by arguing about the
> existence of "potential speculative threads of execution" and
> communication between them. Basically, the argument would be somewhere
> along the lines that the lfence can only proceed with execution once all
> speculative threads of execution that it _cannot_ communicate with
> according to the convergence token are killed off. I suspect that
> somebody would have to go off and do some deep thinking for a while to
> figure out whether that really makes sense.
>
> Cheers,
> Nicolai
>
> On Wed, Jul 29, 2020 at 11:14 AM Nicolai Hähnle <nhaehnle at gmail.com>
> wrote:
> >
> > Hi Craig,
> >
> > that's an interesting problem.
> >
> > We have a superficially similar problem in GPU programming models
> > where there are cross-thread communication operations that are
> > sensitive to control flow, as in:
> >
> >   if (c) {
> >     b = subgroupAdd(a);
> >     bar(b);
> >   } else {
> >     b = subgroupAdd(a);
> >     baz(b);
> >   }
> >
> > LLVM will merge those, even though it changes the behavior
> > (potentially summing over a larger set of threads than in the original
> > program). Merging them is inherently correct for LLVM's semantics.
> > It's the same underlying problem as what you describe: LLVM IR simply
> > doesn't have a way of describing these semantics that fall somewhat
> > outside of a purely deterministic single-threaded execution model. For
> > our needs, we're currently working around this by essentially adding a
> > unique ID to each of these operations so that they all appear
> > different to LLVM. I suspect that the same could work for you.
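
A minimal sketch of how the unique-ID workaround might look for the lfence
case, assuming an x86 target and a GCC-compatible compiler. LFENCE_UNIQUE and
the FENCE_STR helpers are hypothetical names, and the bodies of bar() and
baz() are stand-ins. Each expansion embeds a different __COUNTER__ value in
an assembler comment, so the two asm statements are no longer textually
identical. Whether that is enough to stop every transform has not been
verified here.

```cpp
// Two-step stringification so __COUNTER__ is expanded before being quoted.
#define FENCE_STR2(x) #x
#define FENCE_STR(x) FENCE_STR2(x)

// Hypothetical helper: the ID only appears in an assembler comment ("#" starts
// a comment in x86 GNU syntax), so the emitted instruction is a plain lfence.
#define LFENCE_UNIQUE() \
  __asm__ __volatile__("lfence # site " FENCE_STR(__COUNTER__) : : : "memory")

// Stand-in bodies so the sketch is self-contained.
static int bar_calls = 0;
static int baz_calls = 0;
void bar() { ++bar_calls; }
void baz() { ++baz_calls; }

void foo(int c) {
  if (c) {
    LFENCE_UNIQUE();  // distinct asm string from the one in the else branch
    bar();
  } else {
    LFENCE_UNIQUE();
    baz();
  }
}
```

The ID is placed in the asm string rather than passed as an operand so that
the two statements differ in the asm text itself.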
> >
> > Still, it's a bit of an awkward workaround and a better solution would
> > be great. I've been wondering whether we could perhaps have token
> > values produced by branch instructions to express certain kinds of
> > dependencies. In your case, you'd end up with something like:
> >
> >     %token = br i1 %c, label %then, label %else
> >
> >   then:
> >     call void @llvm.x86.sse2.lfence() [ "some-bundle"(%token) ]
> >      ...
> >
> >   else:
> >     call void @llvm.x86.sse2.lfence() [ "some-bundle"(%token) ]
> >      ...
> >
> > The token indicates an essential control dependency on the branch
> > instruction. I've previously rejected this idea as too invasive, and
> > there are alternatives for our particular use case, but if there are
> > multiple use cases for this kind of dependency -- and it kind of looks
> > like it from where I stand -- then perhaps this is something to
> > consider more seriously?
> >
> > Cheers,
> > Nicolai
> >
> > On Wed, Jul 29, 2020 at 1:30 AM Craig Topper via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > >
> > > > _mm_lfence was originally documented as a load fence. But in light of
> > > > speculative execution vulnerabilities it has started being advertised as a
> > > > way to prevent speculative execution. The current Intel Software Development
> > > > Manual documents it as "Specifically, LFENCE does not execute until all
> > > > prior instructions have completed locally, and no later instruction begins
> > > > execution until LFENCE completes".
> > >
> > > > For the following test, my intention was to ensure that the body of
> > > > either the if or the else would not proceed until any speculation of the
> > > > branch had resolved. But SimplifyCFG saw that both control paths started
> > > > with an lfence, so it hoisted them into a single lfence intrinsic before the
> > > > branch. https://godbolt.org/z/qMc446 The intrinsic in IR has no
> > > > properties, so it should be assumed to read/write any memory. But that's not
> > > > enough to specify this control flow dependency. gcc exhibits similar
> > > > behavior.
> > >
> > > #include <x86intrin.h>
> > >
> > > void bar();
> > > void baz();
> > >
> > > void foo(int c) {
> > >   if (c) {
> > >       _mm_lfence();
> > >       bar();
> > >   } else {
> > >       _mm_lfence();
> > >       baz();
> > >   }
> > > }
> > >
> > >
> > > > Alternatively, I also tried replacing the intrinsics with inline
> > > > assembly. SimplifyCFG still merged those. But gcc did not.
> > > > https://godbolt.org/z/acnPxY
> > >
> > > void bar();
> > > void baz();
> > >
> > > void foo(int c) {
> > >   if (c) {
> > >       __asm__ __volatile ("lfence");
> > >       bar();
> > >   } else {
> > >       __asm__ __volatile ("lfence");
> > >       baz();
> > >   }
> > > }
> > >
> > > > I believe the [[clang::nomerge]] attribute was recently extended to
> > > > inline assembly, which can be used to prevent the inline assembly from being
> > > > hoisted by SimplifyCFG: https://reviews.llvm.org/D84225 It also appears
> > > > to work for the intrinsic version, but I think it's limited to C++ only.
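
A minimal sketch of the nomerge approach applied to the intrinsic version,
assuming an x86 target and a Clang recent enough to accept the attribute on
statements. The counter bodies of bar() and baz() are hypothetical stand-ins.
Other compilers are expected to ignore the unknown attribute with a warning,
in which case it gives no protection.

```cpp
#include <x86intrin.h>

// Stand-in bodies so the sketch is self-contained.
static int bar_calls = 0;
static int baz_calls = 0;
void bar() { ++bar_calls; }
void baz() { ++baz_calls; }

void foo(int c) {
  if (c) {
    // Statement attribute: asks Clang not to merge the call in this
    // statement with an identical call elsewhere.
    [[clang::nomerge]] _mm_lfence();
    bar();
  } else {
    [[clang::nomerge]] _mm_lfence();
    baz();
  }
}
```

The attribute is attached to each fence statement individually, so both
branches need it.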
> > >
> > > > Is there some existing property we can put on the intrinsic to prevent
> > > > SimplifyCFG from hoisting like this? Are we more aggressive than we should
> > > > be about hoisting inline assembly?
> > >
> > > Thanks,
> > > ~Craig
> >
> >
> >
> > --
> > Learn what the world is really like,
> > but never forget what it should be.