[llvm-dev] Memory barrier problem

Kaylor, Andrew via llvm-dev llvm-dev at lists.llvm.org
Wed Jan 27 09:50:07 PST 2021


Hi everyone,

I have a problem with multi-threaded memory synchronization that I'd like to get some input on.

Consider the following IR:

------------

define void @bar() convergent {
  fence acq_rel
  ret void
}

define i32 @foo(i32* noalias %p, i32 %flag) {
entry:
  store i32 0, i32* %p
  call void @bar()
  %cmp = icmp eq i32 %flag, 0
  br i1 %cmp, label %if.then, label %if.end

if.then:
  store i32 1, i32* %p
  br label %if.end

if.end:
  call void @bar()
  %x = load i32, i32* %p
  ret i32 %x
}

------------

I have an argument (%p) which is marked with the 'noalias' attribute. The memory pointed to by this argument is written, conditionally written again, and then read within the function. Between these accesses, I am calling a function that contains a fence instruction. If that call with the fence is not inlined, GVN will eliminate the load and replace it with a phi of the two stored values:

------------

define i32 @foo(i32* noalias %p, i32 %flag) {
entry:
  store i32 0, i32* %p, align 4
  call void @bar()
  %cmp = icmp eq i32 %flag, 0
  br i1 %cmp, label %if.then, label %if.end

if.then:
  store i32 1, i32* %p, align 4
  br label %if.end

if.end:
  %x = phi i32 [ 1, %if.then ], [ 0, %entry ]   ; <============== Incorrect
  call void @bar()
  ret i32 %x
}

------------

https://godbolt.org/z/14o8oY

This is a reduction of a scenario I've come across in a SYCL program. The bar() function corresponds to a work group barrier that is meant to have the memory synchronizing effect described by the fence instruction in my example. I'm trying to figure out how to construct LLVM IR that will represent the semantics I need.

If I remove the 'noalias' attribute from the argument, GVN won't make this optimization because it conservatively assumes that the memory might be modified within the called function. That's fine, but I think it fixes the problem for the wrong reason. In fact, the memory location is not modified in the called function and as I understand it the 'noalias' attribute only guarantees that the memory won't be accessed *in the current thread* using pointers that aren't based on the 'noalias' pointer. So, the fact that it might be modified by another thread shouldn't invalidate the 'noalias' attribute. Is that correct?
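
For reference, a sketch of that variant. The only change from the original is the dropped 'noalias' on %p; the claim that GVN keeps the load here comes from the behavior described above, not from anything specific to this listing.

------------

define void @bar() convergent {
  fence acq_rel
  ret void
}

; Same as the original @foo, except that %p is no longer 'noalias'.
define i32 @foo(i32* %p, i32 %flag) {
entry:
  store i32 0, i32* %p
  call void @bar()
  %cmp = icmp eq i32 %flag, 0
  br i1 %cmp, label %if.then, label %if.end

if.then:
  store i32 1, i32* %p
  br label %if.end

if.end:
  call void @bar()
  ; Without 'noalias', the call is conservatively assumed to possibly modify %p.
  %x = load i32, i32* %p
  ret i32 %x
}

------------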

I can also block the GVN optimization by putting the fence instruction directly in the foo() function, such as by inlining the call to bar(). But, of course, the semantics of the IR should not depend on whether or not I've inlined functions. In this case the inlining is trivial, but the problem potentially exists for a called function that uses a barrier in a way that is not so immediately visible.
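
A sketch of the inlined form, assuming bar() is replaced by its body at both call sites. The 'noalias' attribute is kept, and the fences now appear directly in foo():

------------

define i32 @foo(i32* noalias %p, i32 %flag) {
entry:
  store i32 0, i32* %p
  fence acq_rel                      ; was: call void @bar()
  %cmp = icmp eq i32 %flag, 0
  br i1 %cmp, label %if.then, label %if.end

if.then:
  store i32 1, i32* %p
  br label %if.end

if.end:
  fence acq_rel                      ; was: call void @bar()
  %x = load i32, i32* %p             ; the load that should survive GVN in this form
  ret i32 %x
}

------------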

I put the 'convergent' attribute on my bar() function mostly to demonstrate that this doesn't solve the problem. As I understand it, the 'convergent' attribute describes control flow constraints and says nothing about memory access synchronization. Is that correct?


Is there a way to handle this case? I have some ideas, but I'd like to start by just posing the question to see if there are better avenues available than I've considered.

Thanks,
Andy
