[PATCH] D76567: AMDGPU: Implement getMemcpyLoopLoweringType
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 24 02:07:53 PDT 2020
foad added a comment.
I still don't understand the logic for when to use 2-byte accesses. Is it something like: use 1, 4, 8 and 16-byte accesses unconditionally, but 2-byte accesses only when we know source and destination are at least 2-byte aligned? Why is the implementation of this different depending on whether the //length// is a known constant or not?
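A minimal sketch of the policy guessed at above, to make the question concrete. It is not the code under review: `pickAccessBytes` is a hypothetical helper, sizes and alignments are in bytes, and `Size` is assumed to be at least 1.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): picks the access width in bytes
// for one step of a memcpy expansion, following the reading described above.
unsigned pickAccessBytes(uint64_t Size, unsigned SrcAlign, unsigned DstAlign) {
  // 16-, 8- and 4-byte accesses are used regardless of alignment.
  if (Size >= 16)
    return 16;
  if (Size >= 8)
    return 8;
  if (Size >= 4)
    return 4;
  // 2-byte accesses only when both pointers are known to be 2-byte aligned.
  if (Size >= 2 && SrcAlign >= 2 && DstAlign >= 2)
    return 2;
  // Otherwise fall back to single bytes.
  return 1;
}
```

Under this reading, `pickAccessBytes(3, 2, 2)` picks a 2-byte access while `pickAccessBytes(3, 1, 2)` falls back to 1 byte, and nothing in the rule depends on whether the length is a compile-time constant.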
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:344
+
+ // Global memory works with 16-bit accesses. Private memory will also hit
+ // this, although they'll be decomposed.
----------------
Should this be "16-byte"?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:359
+
+ if (Size >= 2 && SrcAlign == 2 && DstAlign == 2)
+ return Type::getInt16Ty(Context);
----------------
Should those `==` be `>=`?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:384
+
+ if (SrcAlign > 2 && DestAlign > 2) {
+ Type *I16Ty = Type::getInt16Ty(Context);
----------------
Should those `>` be `>=`?
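For illustration, the three comparisons in question can be checked in isolation. This is only a sketch: the predicate names are hypothetical, alignments are in bytes, and whether other paths in the patch already handle larger alignments is not visible in the quoted hunks.

```cpp
// Hypothetical predicates (not from the patch) mirroring the comparisons
// quoted at lines 359 and 384, plus the suggested `>=` form.
constexpr bool exactlyTwo(unsigned SrcAlign, unsigned DstAlign) {
  return SrcAlign == 2 && DstAlign == 2;
}
constexpr bool moreThanTwo(unsigned SrcAlign, unsigned DstAlign) {
  return SrcAlign > 2 && DstAlign > 2;
}
constexpr bool atLeastTwo(unsigned SrcAlign, unsigned DstAlign) {
  return SrcAlign >= 2 && DstAlign >= 2;
}

// `==` rejects pointers that are more strictly aligned than 2 bytes.
static_assert(!exactlyTwo(4, 4), "== 2 excludes 4-byte-aligned pointers");
// `>` rejects pointers that are exactly 2-byte aligned.
static_assert(!moreThanTwo(2, 2), "> 2 excludes 2-byte-aligned pointers");
// `>=` accepts every case in which a 2-byte access is naturally aligned.
static_assert(atLeastTwo(2, 2) && atLeastTwo(4, 4) && atLeastTwo(2, 8),
              ">= 2 covers all sufficiently aligned pairs");
// All three reject a byte-aligned pointer.
static_assert(!atLeastTwo(1, 4), "a 1-byte-aligned pointer is rejected");
```

If the intent is "both pointers are at least 2-byte aligned", only the `>=` form expresses it directly.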
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D76567/new/
https://reviews.llvm.org/D76567