[all-commits] [llvm/llvm-project] 1143da: [libc][gpu] Thread divergence fix on volta

Thu Aug 31 06:34:21 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 1143da22450a1f8de510ef75613648fb614faee7
      https://github.com/llvm/llvm-project/commit/1143da22450a1f8de510ef75613648fb614faee7
  Author: Jon Chesterfield <jonathanchesterfield at gmail.com>
  Date:   2023-08-31 (Thu, 31 Aug 2023)

  Changed paths:
    M libc/src/__support/GPU/amdgpu/utils.h
    M libc/src/__support/GPU/generic/utils.h
    M libc/src/__support/GPU/nvptx/utils.h
    M libc/src/__support/RPC/rpc.h
    M libc/test/src/__support/RPC/rpc_smoke_test.cpp

  Log Message:
  -----------
  [libc][gpu] Thread divergence fix on volta

The inbox/outbox loads are performed by the current warp, not a single thread.

The outbox load indicates whether a port has been successfully opened. If some
lanes in the warp think it has and others think the port open failed, as the
warp happened to be diverged when the load occurred, all the subsequent control
flow will be incorrect.

The inbox load indicates whether the machine on the other side of the RPC channel
has progressed. If lanes in the warp have different ideas about that, some will
try to progress their state transition while others won't. As far as the RPC layer
is concerned this is a performance problem and not a correctness one - none of the lanes
can start the transition early, only miss it and start late - but in practice the calls
layered on top of RPC do not have the interface required to detect this event and retry
the load on the stalled lanes, so the calls layered on top will be broken.
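
The two failure modes above can be illustrated with a small host-side model. This
is only a sketch under stated assumptions, not the code from this commit: every
name in it (kWarpSize, per_lane_load, broadcast_first, is_uniform) is hypothetical,
and the warp is simulated with an array rather than run on a GPU. It shows lanes
of a diverged warp observing different values from the same word, and how adopting
the first active lane's value makes the result uniform again.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of one warp; not the libc RPC implementation.
constexpr int kWarpSize = 32;
using LaneValues = std::array<uint32_t, kWarpSize>;

// Index of the lowest set bit in a non-empty lane mask.
inline int first_active_lane(uint32_t lane_mask) {
  assert(lane_mask != 0 && "at least one lane must be active");
  int lane = 0;
  while (((lane_mask >> lane) & 1u) == 0)
    ++lane;
  return lane;
}

// Models a diverged warp loading the same word at two different times.
// Lanes in `early_mask` load before the other side writes and see `before`;
// all remaining lanes load afterwards and see `after`.
inline LaneValues per_lane_load(uint32_t early_mask, uint32_t before,
                                uint32_t after) {
  LaneValues seen{};
  for (int lane = 0; lane < kWarpSize; ++lane)
    seen[lane] = ((early_mask >> lane) & 1u) ? before : after;
  return seen;
}

// Models the fix: every active lane adopts the value held by the first
// active lane, so the warp agrees on a single result. Inactive lanes are
// left untouched.
inline LaneValues broadcast_first(const LaneValues &seen, uint32_t lane_mask) {
  const uint32_t value = seen[first_active_lane(lane_mask)];
  LaneValues out = seen;
  for (int lane = 0; lane < kWarpSize; ++lane)
    if ((lane_mask >> lane) & 1u)
      out[lane] = value;
  return out;
}

// True if all active lanes hold the same value.
inline bool is_uniform(const LaneValues &values, uint32_t lane_mask) {
  const uint32_t value = values[first_active_lane(lane_mask)];
  for (int lane = 0; lane < kWarpSize; ++lane)
    if (((lane_mask >> lane) & 1u) && values[lane] != value)
      return false;
  return true;
}
```

In this model, a warp whose lower half loads early and whose upper half loads late
fails is_uniform until broadcast_first is applied. After the broadcast, every lane
takes the same branch on the loaded value.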

None of this is broken on amdgpu, but it's likely that the readfirstlane will have
beneficial performance properties there. Possibly significant enough that it's
worth landing this ahead of fixing gpu::broadcast_value on volta.

Essentially volta wasn't adequately considered when writing this part of the protocol.
It's a bug present in the initial prototype and propagated thus far, because none of
the test cases push volta into a warp-diverged state in the middle of the RPC sequence.

We should have some test cases for volta where port_open and equivalent are called
from diverged warps.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D159276