[PATCH] D76567: AMDGPU: Implement getMemcpyLoopLoweringType

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 24 02:07:53 PDT 2020


foad added a comment.

I still don't understand the logic for when to use 2-byte accesses. Is it something like: use 1, 4, 8 and 16-byte accesses unconditionally, but 2-byte accesses only when we know source and destination are at least 2-byte aligned? Why is the implementation of this different depending on whether the //length// is a known constant or not?
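Here is one reading of the rule asked about above, written as a minimal C++ sketch under that assumption. It takes 16-, 8- and 4-byte accesses greedily regardless of alignment. It uses a 2-byte access only when both source and destination are known to be at least 2-byte aligned, and it copies any remainder bytewise. `residualAccessWidths` is a hypothetical helper that returns access widths in bytes. It is not the code in D76567 and does not use LLVM's `Type` API.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical model of the rule asked about in the review, not the patch's code.
// Returns the access widths (in bytes) used to copy `remaining` bytes.
std::vector<unsigned> residualAccessWidths(uint64_t remaining, unsigned srcAlign,
                                           unsigned dstAlign) {
  std::vector<unsigned> widths;
  // Assumption: 16-, 8- and 4-byte accesses are used regardless of alignment.
  for (unsigned w : {16u, 8u, 4u}) {
    while (remaining >= w) {
      widths.push_back(w);
      remaining -= w;
    }
  }
  // 2-byte accesses only when both sides are at least 2-byte aligned.
  if (srcAlign >= 2 && dstAlign >= 2) {
    while (remaining >= 2) {
      widths.push_back(2);
      remaining -= 2;
    }
  }
  // Whatever is left is copied one byte at a time.
  while (remaining > 0) {
    widths.push_back(1);
    --remaining;
  }
  return widths;
}
```

For example, a 23-byte copy splits into 16 + 4 + 2 + 1 when both sides are 2-byte aligned, and into 16 + 4 + 1 + 1 + 1 when either side is only byte aligned. Nothing in this model depends on whether the length is a compile-time constant, which is the asymmetry the question is about.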



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:344
+
+    // Global memory works with 16-bit accesses. Private memory will also hit
+    // this, although they'll be decomposed.
----------------
"16-byte"?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:359
+
+  if (Size >= 2 && SrcAlign == 2 && DstAlign == 2)
+    return Type::getInt16Ty(Context);
----------------
Should those `==` be `>=`?
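Alignment is a lower bound, so a pointer known to be 4- or 8-byte aligned is also 2-byte aligned. The hypothetical pair of functions below models only the quoted branch and returns an access width in bytes, with every other case falling back to 1. It shows the difference: with `==`, a 2-byte copy between 4-byte-aligned pointers misses the i16 path. Whether earlier branches in the real function already catch that case is not visible from this hunk.

```cpp
#include <cstdint>

// Hypothetical: the quoted condition as written (exact match on alignment 2).
unsigned narrowWidthEq(uint64_t size, unsigned srcAlign, unsigned dstAlign) {
  if (size >= 2 && srcAlign == 2 && dstAlign == 2)
    return 2;
  return 1; // falls back to byte accesses
}

// Hypothetical: the same condition with the suggested `>=`.
unsigned narrowWidthGe(uint64_t size, unsigned srcAlign, unsigned dstAlign) {
  if (size >= 2 && srcAlign >= 2 && dstAlign >= 2)
    return 2;
  return 1; // falls back to byte accesses
}
```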


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:384
+
+  if (SrcAlign > 2 && DestAlign > 2) {
+    Type *I16Ty = Type::getInt16Ty(Context);
----------------
Should those `>` be `>=`?
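The strict comparison has the opposite effect: it excludes exactly the 2-byte-aligned case that an i16 access is meant for. In this hypothetical model of a tail shorter than 4 bytes, `splitTail` and its `strict` flag are illustrative and not from the patch. `strict` selects the quoted `>`, and otherwise the suggested `>=` is used.

```cpp
#include <vector>

// Hypothetical model of splitting a short tail into i16/i8 accesses.
// strict == true mirrors `SrcAlign > 2 && DestAlign > 2`;
// strict == false mirrors the suggested `>=`.
std::vector<unsigned> splitTail(unsigned remaining, unsigned srcAlign,
                                unsigned dstAlign, bool strict) {
  std::vector<unsigned> ops;
  bool useI16 = strict ? (srcAlign > 2 && dstAlign > 2)
                       : (srcAlign >= 2 && dstAlign >= 2);
  if (useI16) {
    for (; remaining >= 2; remaining -= 2)
      ops.push_back(2); // 2-byte access
  }
  for (; remaining > 0; --remaining)
    ops.push_back(1); // 1-byte access
  return ops;
}
```

With `SrcAlign == DestAlign == 2`, a 3-byte tail becomes three byte accesses under `>` but one 2-byte and one 1-byte access under `>=`. At alignment 4 both variants agree.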


https://reviews.llvm.org/D76567