[PATCH] D37205: AMDGPU: Make worst-case assumption about the wait states in inline assembly

Wed Sep 6 06:46:47 PDT 2017

nhaehnle added a comment.

In https://reviews.llvm.org/D37205#855583, @arsenm wrote:

> LGTM.

Thanks.

> What is the barrier for? That sounds disturbing

I agree. It's a hack for the "ballot" instruction, which takes a per-thread boolean value and returns a bitmask with one bit per live thread, set to that thread's value. The Mesa frontend currently translates this to a llvm.amdgcn.icmp.i32 (a dedicated intrinsic would be nicer, as it would allow better codegen, but that hasn't been a high priority). The issue is code like:

  if (if_cond) {
      threadmask_true = ballot(ballot_cond)
      do something...
  } else {
      threadmask_false = ballot(ballot_cond)
      do something...
  }

... which LLVM will happily hoist to

  threadmask_all = ballot(ballot_cond)
  if (if_cond) {
      do something...
  } else {
      do something...
  }

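which is incorrect. Each ballot only sees the threads that are live in its own branch, so the following hold, and threadmask_all is in general equal to neither per-branch mask:

1. `(threadmask_true & threadmask_false) == 0`
2. `threadmask_all == (threadmask_true | threadmask_false)`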
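
To make the two properties concrete, here is a rough Python simulation of a ballot over a single wave. It is only a sketch: the 8-lane wave size, the `ballot(cond, exec_mask)` helper, and the sample condition values are made up for illustration and are not how the hardware or Mesa implements it.

```python
WAVE_SIZE = 8  # assumption: a tiny wave to keep the masks readable


def ballot(cond, exec_mask):
    """Hypothetical model of ballot: bit i is set iff lane i is live
    in exec_mask and cond[i] is true."""
    mask = 0
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1 and cond[lane]:
            mask |= 1 << lane
    return mask


# Made-up per-lane inputs (index = lane number).
if_cond = [True, True, False, False, True, False, True, False]
ballot_cond = [True, False, True, True, True, False, False, True]

full_exec = (1 << WAVE_SIZE) - 1
exec_true = ballot(if_cond, full_exec)   # lanes that take the "if" branch
exec_false = full_exec & ~exec_true      # lanes that take the "else" branch

# Ballot evaluated inside each branch: only that branch's lanes are live.
threadmask_true = ballot(ballot_cond, exec_true)
threadmask_false = ballot(ballot_cond, exec_false)

# Ballot after hoisting above the branch: all lanes are live.
threadmask_all = ballot(ballot_cond, full_exec)

assert (threadmask_true & threadmask_false) == 0
assert threadmask_all == (threadmask_true | threadmask_false)
# With lanes live on both sides, the hoisted result matches neither branch.
assert threadmask_all != threadmask_true
assert threadmask_all != threadmask_false
```

With these inputs both branches have live lanes whose ballot_cond is true, so the hoisted value differs from what either branch was supposed to see.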

To keep LLVM from hoisting the ballot, we pass the input argument of llvm.amdgcn.icmp.i32 through a no-op inline assembly statement that is marked as having side effects and that pretends to be different in each of the two branches, so the two calls no longer look identical.

It's ugly all around and we should really fix it, but adding the corresponding semantics to LLVM IR runs against the wall of people who work on compilers for normal machines...

https://reviews.llvm.org/D37205