[PATCH] D80713: [AMDGPU] Support disassembly for AMDGPU kernel descriptors

Tue Jul 21 15:28:56 PDT 2020

scott.linder requested changes to this revision.
scott.linder added inline comments.
This revision now requires changes to proceed.

================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1484
+
+  case 4: // 0 + 4
+    FourByteBuffer = DE.getU32(Cursor);
----------------
scott.linder wrote:
> Rather than have these comments, which are still just filled with magic numbers, could we define these offsets more explicitly somewhere, as e.g. `amdhsa::GROUP_SEGMENT_FIXED_SIZE_OFFSET`? For example in `llvm/include/llvm/Support/AMDHSAKernelDescriptor.h` alongside the other definitions needed by the compiler? We could then also update the `static_assert`s there to use those definitions, so we aren't relying on inspection to know the same offsets are used everywhere. I.e. the following:
> 
> ```
> 165 static_assert(
>   1     sizeof(kernel_descriptor_t) == 64,
>   2     "invalid size for kernel_descriptor_t");
>   3 static_assert(
>   4     offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0,
>   5     "invalid offset for group_segment_fixed_size");
>   6 static_assert(
>   7     offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4,
>   8     "invalid offset for private_segment_fixed_size");
>   9 static_assert(
>  10     offsetof(kernel_descriptor_t, reserved0) == 8,
>  11     "invalid offset for reserved0");
> ...
> ```
> 
> Becomes:
> 
> ```
> 165 static_assert(
>   1     sizeof(kernel_descriptor_t) == 64,
>   2     "invalid size for kernel_descriptor_t");
>   3 static_assert(
>   4     offsetof(kernel_descriptor_t, group_segment_fixed_size) == GROUP_SEGMENT_FIXED_SIZE_OFFSET,
>   5     "invalid offset for group_segment_fixed_size");
>   6 static_assert(
>   7     offsetof(kernel_descriptor_t, private_segment_fixed_size) == PRIVATE_SEGMENT_FIXED_SIZE_OFFSET,
>   8     "invalid offset for private_segment_fixed_size");
>   9 static_assert(
>  10     offsetof(kernel_descriptor_t, reserved0) == RESERVED0_OFFSET,
>  11     "invalid offset for reserved0");
> ...
> ```
I would still like to see this done, the magic numbers here could lead to a few problems down the line, and presently they just make the code harder to read.

================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1243-1250
+  uint32_t NextFreeVGPR = 0;
+  if (isGFX9()) {
+    NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * 4;
+  } else if (isGFX10()) {
+    bool IsWave64 = STI.getFeatureBits()[AMDGPU::FeatureWavefrontSize64];
+    uint32_t Tmp = GranulatedWorkitemVGPRCount + 1;
+    NextFreeVGPR = IsWave64 ? Tmp * 4 : Tmp * 8;
----------------
I think this can just be replaced with:

```
NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * getVGPRAllocGranule(STI);
```

Or we could add another function called `getNumVGPRs(const MCSubtargetInfo *STI, unsigned NumVGPRBlocks, Optional<bool> EnableWavefrontSize32)` and put the definition directly next to `getNumVGPRBlocks(const MCSubtargetInfo *STI, unsigned NumVGPRs, Optional<bool> EnableWavefrontSize32)` so any future changes affecting one also affect the other. I would lean towards this, and documenting that they are the inverse of one another.

================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1278-1285
+  if (isGFX9()) {
+    NextFreeSGPR = ((GranulatedWavefrontSGPRCount / 2) + 1) * 16;
+  } else if (isGFX10() && GranulatedWavefrontSGPRCount) {
+    return MCDisassembler::Fail;
+  } else {
+    // GFX 6-8
+    NextFreeSGPR = (GranulatedWavefrontSGPRCount + 1) * 8;
----------------
Same as above, I think a new `getNumSGPRs` to complement `getNumSGPRBlocks` would make this easier to read.

For the GFX10 case we could either leave the check here, or have the new function return `Optional` to indicate when there is an error.

================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1290
+  KdStream << Indent << ".amdhsa_reserve_xnack_mask " << 0 << '\n';
+  // KdStream << "GS = " << GranulatedWavefrontSGPRCount << "\n";
+  KdStream << Indent << ".amdhsa_next_free_sgpr " << NextFreeSGPR << "\n";
----------------
What is "GS" and why is this commented out?

================
Comment at: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1572-1576
+    if (decodeKernelDescriptor(Name.drop_back(3), Bytes, Size, Address) ==
+        MCDisassembler::Success)
+      return MCDisassembler::Success;
+
+    return MCDisassembler::Fail;
----------------
Can this just be `return decodeKernelDescriptor(...);`?

================
Comment at: llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s:6
+
+.amdhsa_kernel my_kernel
+.amdhsa_next_free_vgpr 2
----------------
Can you implement more tests? I don't know if it is feasible to include them all in the same .s file, as you have to work around the output not being re-assembleable in general, but even just copy-pasting and editing is fine with me. Just having some more test cases to cover at least a reasonable sample of the different branches, failure modes, etc.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80713/new/

https://reviews.llvm.org/D80713