[clang] [llvm] AMDGPU: Add builtin/intrinsic global_(load|store)_b128 (PR #172090)

Tue Dec 23 19:45:55 PST 2025

ssahasra wrote:

@Pierre-vh:

> I think a better way to do this would be to just forward the syncscope as a
> string to the backend without checking it any further, and let the backend
> determine if that is lowerable or not. Then document the supported syncscopes.
> This is what the other builtins do.

@arsenm:

> Unfortunately we're stuck with a mixed set of atomic builtins with an enum and others with a string scope name as it is. I'm not sure if it's better to double down on one direction or the other.

I believe this is the wrong direction to take. Some of those builtins may have been introduced way back when Clang support for language-agnostic scopes was not implemented.

@shiltian:

> IMO, Clang builtins are high-level user-facing, unlike intrinsics that are not supposed to be called directly in high level languages, so we should make them robust instead of pushing the responsibility onto users or lower levels in the pipeline.

I agree wholeheartedly. The idea of taking an arbitrary string and then checking at the time of lowering might apply to an LLVM intrinsic, but does not make sense to a Clang builtin. It is reasonable for a user to expect that the frontend will catch every mistake at the source level itself.

https://clang.llvm.org/docs/LanguageExtensions.html#scoped-atomic-builtins

Clang already has support for a well-defined enum of scopes, and we should double down on that. To confirm, the Clang builtins should take a standard scope enum just like the scoped atomics, but the LLVM intrinsic can take a string literal and parse it in the same way as other scoped intrinsics.

@carlobertolli:

>Agreed. This PR provides a solution to a user request. It is not meant to be used as a design point: using existing infrastructure/syntax/etc. is deliberate and the goal is not to set a strategic direction for future more comprehensive designs.

Yes, the downstream solution was delivered in a hurry. But an upstream review is the right place to revisit some of the design decisions for the sake of long-term programmability. In this case, existing infrastructure in Clang has both variants --- quick hacks in AMDGPU builtins, and the standardized solution for Clang builtins. Clearly, we should choose the latter wherever we can.

https://github.com/llvm/llvm-project/pull/172090