[PATCH] D92483: AMDGPU - Use MUBUF instructions for global address space access
Tony Tye via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 23 22:51:30 PST 2020
t-tye added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:92
if (isAmdHsaOS()) // Turn on FlatForGlobal for HSA.
FullFS += "+flat-for-global,+unaligned-access-mode,+trap-handler,";
----------------
Suggest:
// Turn on features that HSA ABI requires. Also turn on FlatForGlobal by
// default.
if (isAmdHsaOS())
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:97-105
// Disable mutually exclusive bits.
if (FS.find_lower("+wavefrontsize") != StringRef::npos) {
if (FS.find_lower("wavefrontsize16") == StringRef::npos)
FullFS += "-wavefrontsize16,";
if (FS.find_lower("wavefrontsize32") == StringRef::npos)
FullFS += "-wavefrontsize32,";
if (FS.find_lower("wavefrontsize64") == StringRef::npos)
----------------
I do not understand how this is supposed to work. The code does not seem to match the comment. The parser seems to convert all these features into a single WavefrontSizeLog2 so last + one would win. So why is this even needed?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:111-113
+ // Implement the "generic" processors, which acts as the default when no
+ // generation features are enabled (e.g for -mcpu=''). There are two variants
+ // one for hsa and one for everything else.
----------------
Suggest:
// Implement the "generic" processors, which acts as the default when no
// generation features are enabled (e.g for -mcpu=''). HSA OS defaults to the
// first amdgcn target that supports flat addressing. Other OSes default to
// the first amdgcn target.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:122-132
// Unless +-flat-for-global is specified, turn on FlatForGlobal for all OS-es
// on VI and newer hardware to avoid assertion failures due to missing ADDR64
// variants of MUBUF instructions.
if (!hasAddr64() && !FS.contains("flat-for-global")) {
FlatForGlobal = true;
}
----------------
Suggest:
// Targets must either support 64-bit offsets for MUBUF instructions, and/or
// support flat operations, otherwise they cannot access a 64-bit global
// address space.
assert(hasAddr64() || hasFlat());
// Unless +-flat-for-global is specified, turn on FlatForGlobal for targets
// that do not support ADDR64 variants of MUBUF instructions. Such targets
// cannot use a 64 bit offset with a MUBUF instruction to access the global
// address space.
if (!hasAddr64() && !FS.contains("flat-for-global") && !FlatForGlobal) {
ToggleFeature(AMDGPU::FeatureFlatForGlobal);
FlatForGlobal = true;
}
// Unless +-flat-for-global is specified, use MUBUF instructions for global
// address space access if flat operations are not available.
if (!hasFlat() && !FS.contains("flat-for-global") && FlatForGlobal) {
ToggleFeature(AMDGPU::FeatureFlatForGlobal);
FlatForGlobal = false;
}
Also add the following to llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h since that is the way other similar predicates are defined:
bool hasFlat() const {
return (getGeneration() > AMDGPUSubtarget::SOUTHERN_ISLANDS);
}
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D92483/new/
https://reviews.llvm.org/D92483
More information about the llvm-commits
mailing list