[PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Scott Linder via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 21 15:28:56 PDT 2020
scott.linder requested changes to this revision.
scott.linder added inline comments.
This revision now requires changes to proceed.
================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1484
+
+ case 4: // 0 + 4
+ FourByteBuffer = DE.getU32(Cursor);
----------------
scott.linder wrote:
> Rather than have these comments, which are still just filled with magic numbers, could we define these offsets more explicitly somewhere, as e.g. `amdhsa::GROUP_SEGMENT_FIXED_SIZE_OFFSET`? For example in `llvm/include/llvm/Support/AMDHSAKernelDescriptor.h` alongside the other definitions needed by the compiler? We could then also update the `static_assert`s there to use those definitions, so we aren't relying on inspection to know the same offsets are used everywhere. I.e. the following:
>
> ```
> 165 static_assert(
> 1 sizeof(kernel_descriptor_t) == 64,
> 2 "invalid size for kernel_descriptor_t");
> 3 static_assert(
> 4 offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0,
> 5 "invalid offset for group_segment_fixed_size");
> 6 static_assert(
> 7 offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4,
> 8 "invalid offset for private_segment_fixed_size");
> 9 static_assert(
> 10 offsetof(kernel_descriptor_t, reserved0) == 8,
> 11 "invalid offset for reserved0");
> ...
> ```
>
> Becomes:
>
> ```
> 165 static_assert(
> 1 sizeof(kernel_descriptor_t) == 64,
> 2 "invalid size for kernel_descriptor_t");
> 3 static_assert(
> 4 offsetof(kernel_descriptor_t, group_segment_fixed_size) == GROUP_SEGMENT_FIXED_SIZE_OFFSET,
> 5 "invalid offset for group_segment_fixed_size");
> 6 static_assert(
> 7 offsetof(kernel_descriptor_t, private_segment_fixed_size) == PRIVATE_SEGMENT_FIXED_SIZE_OFFSET,
> 8 "invalid offset for private_segment_fixed_size");
> 9 static_assert(
> 10 offsetof(kernel_descriptor_t, reserved0) == RESERVED0_OFFSET,
> 11 "invalid offset for reserved0");
> ...
> ```
I would still like to see this done, the magic numbers here could lead to a few problems down the line, and presently they just make the code harder to read.
================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1243-1250
+ uint32_t NextFreeVGPR = 0;
+ if (isGFX9()) {
+ NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * 4;
+ } else if (isGFX10()) {
+ bool IsWave64 = STI.getFeatureBits()[AMDGPU::FeatureWavefrontSize64];
+ uint32_t Tmp = GranulatedWorkitemVGPRCount + 1;
+ NextFreeVGPR = IsWave64 ? Tmp * 4 : Tmp * 8;
----------------
I think this can just be replaced with:
```
NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * getVGPRAllocGranule(STI);
```
Or we could add another function called `getNumVGPRs(const MCSubtargetInfo *STI, unsigned NumVGPRBlocks, Optional<bool> EnableWavefrontSize32)` and put the definition directly next to `getNumVGPRBlocks(const MCSubtargetInfo *STI, unsigned NumVGPRs, Optional<bool> EnableWavefrontSize32)` so any future changes affecting one also affect the other. I would lean towards this, and documenting that they are the inverse of one another.
================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1278-1285
+ if (isGFX9()) {
+ NextFreeSGPR = ((GranulatedWavefrontSGPRCount / 2) + 1) * 16;
+ } else if (isGFX10() && GranulatedWavefrontSGPRCount) {
+ return MCDisassembler::Fail;
+ } else {
+ // GFX 6-8
+ NextFreeSGPR = (GranulatedWavefrontSGPRCount + 1) * 8;
----------------
Same as above, I think a new `getNumSGPRs` to complement `getNumSGPRBlocks` would make this easier to read.
For the GFX10 case we could either leave the check here, or have the new function return `Optional` to indicate when there is an error.
================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1290
+ KdStream << Indent << ".amdhsa_reserve_xnack_mask " << 0 << '\n';
+ // KdStream << "GS = " << GranulatedWavefrontSGPRCount << "\n";
+ KdStream << Indent << ".amdhsa_next_free_sgpr " << NextFreeSGPR << "\n";
----------------
What is "GS" and why is this commented out?
================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1572-1576
+ if (decodeKernelDescriptor(Name.drop_back(3), Bytes, Size, Address) ==
+ MCDisassembler::Success)
+ return MCDisassembler::Success;
+
+ return MCDisassembler::Fail;
----------------
Can this just be `return decodeKernelDescriptor(...);`?
================
Comment at: llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s:6
+
+.amdhsa_kernel my_kernel
+.amdhsa_next_free_vgpr 2
----------------
Can you implement more tests? I don't know if it is feasible to include them all in the same .s file, as you have to work around the output not being re-assembleable in general, but even just copy-pasting and editing is fine with me. Just having some more test cases to cover at least a reasonable sample of the different branches, failure modes, etc.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80713/new/
https://reviews.llvm.org/D80713
More information about the llvm-commits
mailing list