[PATCH] D35146: AMDGPU : Widen extending scalar loads to 32-bits
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 21 09:55:16 PDT 2017
arsenm added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:449
+bool AMDGPUCodeGenPrepare::visitLoadInst(LoadInst &I) {
+ Type * Ty = I.getType();
+ VectorType *VT = dyn_cast<VectorType>(Ty);
----------------
No space after *
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:451
+ VectorType *VT = dyn_cast<VectorType>(Ty);
+ const DataLayout &DL = I.getModule()->getDataLayout();
+ int TySize = DL.getTypeSizeInBits(Ty);
----------------
M->getDataLayout()
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:462-475
+ Value *WidenLoad = Builder.CreateLoad(BitCast);;
+ if (VT) {
+ Type *IntNTy = Builder.getIntNTy(TySize);
+ Value *ValTrunc = Builder.CreateTrunc(WidenLoad, IntNTy);
+ Value *ValOrig = Builder.CreateBitCast(ValTrunc, I.getType());
+ I.replaceAllUsesWith(ValOrig);
+ I.eraseFromParent();
----------------
You don't need an entire block of code that is mostly the same for the vector case. The non-vector case requires a bitcast as well.
================
Comment at: test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll:3
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s | FileCheck -check-prefix=GCN-HSA -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN-NOHSA,FUNC %s
+
----------------
These should be running just this pass with opt
================
Comment at: test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll:12
+define amdgpu_kernel void @constant_load_i8(i8 addrspace(1)* %out, i8 addrspace(2)* %in) #0 {
+ %ld = load i8, i8 addrspace(2)* %in
+ store i8 %ld, i8 addrspace(1)* %out
----------------
This should not be converted because you don't know if it's 4 byte aligned
================
Comment at: test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll:48
+}
+
+attributes #0 = { nounwind }
----------------
Needs tests with half and <2 x half>, i1, and maybe another exotic size. I'm pretty sure this will assert for half as is now. Also need tests with various alignments and volatile
Repository:
rL LLVM
https://reviews.llvm.org/D35146
More information about the llvm-commits
mailing list