[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

Fri Mar 22 08:17:04 PDT 2024

changpeng wrote:

[AMD Official Use Only - General]

I am fine to remove f16/bf16 versions. Enumerating all possible types could be very painful. For example we gave up enumerating for B64, and ended up using v2i32 only. What do others think removing f16/bf16 versions? Thanks

Get Outlook for iOS<https://aka.ms/o0ukef>
________________________________
From: Matt Arsenault ***@***.***>
Sent: Friday, March 22, 2024 3:45:53 AM
To: llvm/llvm-project ***@***.***>
Cc: Fang, Changpeng ***@***.***>; Author ***@***.***>
Subject: Re: [llvm/llvm-project] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.

@arsenm commented on this pull request.

________________________________

In clang/include/clang/Basic/BuiltinsAMDGPU.def<https://github.com/llvm/llvm-project/pull/86202#discussion_r1535389287>:

> -TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", "gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", "gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", "gfx12-insts,wavefrontsize32")
-
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", "gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", "gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", "gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", "gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", "gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", "gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", "gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", "gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", "gfx12-insts,wavefrontsize64")

Do we really need the f16/bf16 versions? You can always bitcast the i16 versions.

—
Reply to this email directly, view it on GitHub<https://github.com/llvm/llvm-project/pull/86202#pullrequestreview-1954492883>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABALCLLYCW6QUD3CELVLSKDYZQDWDAVCNFSM6AAAAABFCIDLAKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSNJUGQ4TEOBYGM>.
You are receiving this because you authored the thread.Message ID: ***@***.***>

https://github.com/llvm/llvm-project/pull/86202