[PATCH] D94648: [amdgpu] Implement lower function LDS pass
Mahesha S via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 19 18:48:11 PDT 2021
hsmhsm added a comment.
In D94648#2770088 <https://reviews.llvm.org/D94648#2770088>, @hsmhsm wrote:
> In D94648#2769780 <https://reviews.llvm.org/D94648#2769780>, @JonChesterfield wrote:
>
>> In D94648#2758386 <https://reviews.llvm.org/D94648#2758386>, @arsenm wrote:
>>
>>> In D94648#2757793 <https://reviews.llvm.org/D94648#2757793>, @JonChesterfield wrote:
>>>
>>>>
>>>
>>>
>>>
>>>> edit: lowerFormalArguments is not called if there are no formal arguments to the kernel. Test case I started from does pass arguments to the kernel, but they were unused and eliminated.
>>>
>>> Not seeing how it isn't called with no arguments? It should still be called anyway
>>
>> I expected that too, but debug statements around allocateLDSGlobal didn't fire, and looking at the control flow around the sdag and globalisel paths lowerFormalArguments is called from within a loop `for (const Argument &Arg : F.args()) {}`. I may be missing something of course (didn't put a lot of time into chasing this), but it definitely looks like lowerFormalArguments doesn't get called when there are no arguments.
>
> Hi Jon,
>
> For the 0 argument kernel, did you put debug statement within SITargetLowering::LowerFormalArguments() and test whether it will hit or not? My experimentation shows that it is indeed hit for 0 arg kernels too. So it is not problem with 0 arg kernel.
>
> Problem is within the function - AMDGPUMachineFunction::allocateModuleLDSGlobal() that you wrote.
>
> The statement
>
> GlobalVariable *GV = M->getGlobalVariable("llvm.amdgcn.module.lds");
>
> is suppose to be replaced by
>
> GlobalVariable *GV = M->getGlobalVariable("llvm.amdgcn.module.lds", true);
>
> My understanding is - For Module to search for a internal global variable successfully within the symbol table, we need to explicitly tell it to do so by passing "true" as an additional arg to getGlobalVariable(). Otherwise, the internal symbol won't be find.
Further, perhaps, you also need to add one lit test like below, call llc and make sure that @llvm.amdgcn.module.lds is allocated at address 0.
@llvm.amdgcn.module.lds = internal unnamed_addr addrspace(3) global [16 x i8] undef, align 16
define amdgpu_kernel void @kern() {
%llvm.amdgcn.module.lds.bc = bitcast [16 x i8] addrspace(3)* @llvm.amdgcn.module.lds to i8 addrspace(3)*
store i8 6, i8 addrspace(3)* %llvm.amdgcn.module.lds.bc, align 16
ret void
}
CMD: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 --amdhsa-code-object-version=4 lds-allocation2.ll -o tmp.s
assembly generated:
v_mov_b32_e32 v0, 0
v_mov_b32_e32 v1, 6
ds_write_b8 v0, v1
s_endpgm
We should probably CHECK for the instruction pattern - ```v_mov_b32_e32 v0, 0```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94648/new/
https://reviews.llvm.org/D94648
More information about the llvm-commits
mailing list