[PATCH] D35146: AMDGPU : Widen extending scalar loads to 32-bits

Fri Jul 21 09:55:16 PDT 2017

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:449
+bool AMDGPUCodeGenPrepare::visitLoadInst(LoadInst &I) {
+  Type * Ty = I.getType();
+  VectorType *VT = dyn_cast<VectorType>(Ty);
----------------
No space after *

================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:451
+  VectorType *VT = dyn_cast<VectorType>(Ty);
+  const DataLayout &DL = I.getModule()->getDataLayout();
+  int TySize = DL.getTypeSizeInBits(Ty);
----------------
M->getDataLayout()

================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:462-475
+    Value *WidenLoad = Builder.CreateLoad(BitCast);;
+    if (VT) {
+      Type *IntNTy = Builder.getIntNTy(TySize);
+      Value *ValTrunc = Builder.CreateTrunc(WidenLoad, IntNTy);
+      Value *ValOrig = Builder.CreateBitCast(ValTrunc, I.getType());
+      I.replaceAllUsesWith(ValOrig);
+      I.eraseFromParent();
----------------
You don't need an entire block of code that is mostly the same for the vector case. The non-vector case requires a bitcast as well.

================
Comment at: test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll:3
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s | FileCheck -check-prefix=GCN-HSA -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN-NOHSA,FUNC %s
+
----------------
These should be running just this pass with opt

================
Comment at: test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll:12
+define amdgpu_kernel void @constant_load_i8(i8 addrspace(1)* %out, i8 addrspace(2)* %in) #0 {
+  %ld = load i8, i8 addrspace(2)* %in
+  store i8 %ld, i8 addrspace(1)* %out
----------------
This should not be converted because you don't know if it's 4 byte aligned

================
Comment at: test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll:48
+}
+
+attributes #0 = { nounwind }
----------------
Needs tests with half and <2 x half>, i1, and maybe another exotic size. I'm pretty sure this will assert for half  as is now. Also need tests with various alignments and volatile

Repository:
  rL LLVM

https://reviews.llvm.org/D35146