[PATCH] D88291: [AMDGPU] Insert waterfall loops for divergent calls

Tue Sep 29 02:48:28 PDT 2020

Flakebi added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:4897
+  MI.dump();
+  Begin->dump();
+  MachineBasicBlock::iterator I(&MI);
----------------
madhur13490 wrote:
> Dump() is not required.
Thanks, I forgot to remove them.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:5127
+    MachineOperand *Dest = &MI.getOperand(0);
+    if (!RI.isSGPRClass(MRI.getRegClass(Dest->getReg()))) {
+      // Also move the copies to physical registers into the loop block
----------------
madhur13490 wrote:
> Should this block be executed for AGPRs too? If this is meant only for VGPRs then !SGPR is not correct.
I don’t know how AGPRs work, and documentation seems to be scarce. The check for calls is the same as the one used for image operations above.
It seems that AGPRs can be copied to VGPRs but not to SGPRs (https://reviews.llvm.org/D68358#change-cnuV0NuhUhzL), so I think calling a function pointer that is stored in AGPRs should copy it to VGPRs and insert a waterfall loop.
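
For context, here is a minimal host-side sketch of what a waterfall loop does once the call target is in VGPRs. It is a model only, not the code in this patch: the function name `waterfallTargets`, the 64-bit `Exec` mask, and the per-lane vector are assumptions made for illustration, and the instruction names in the comments only indicate roughly which operation each step corresponds to.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a waterfall loop. LaneTarget[i] is the (possibly
// divergent) call target held by lane i; Exec is the mask of active lanes.
// Returns the uniform target used in each loop iteration, in order.
std::vector<uint64_t> waterfallTargets(const std::vector<uint64_t> &LaneTarget,
                                       uint64_t Exec) {
  assert(LaneTarget.size() <= 64 && "model supports at most 64 lanes");
  // Ignore mask bits that have no corresponding lane in the model.
  if (LaneTarget.size() < 64)
    Exec &= (uint64_t(1) << LaneTarget.size()) - 1;

  std::vector<uint64_t> Issued;
  while (Exec != 0) {
    // Roughly v_readfirstlane: take the value from the lowest active lane.
    unsigned First = 0;
    while (((Exec >> First) & 1) == 0)
      ++First;
    uint64_t Uniform = LaneTarget[First];

    // Roughly v_cmp_eq + s_and_saveexec: select the active lanes that hold
    // the same value as the first lane.
    uint64_t Match = 0;
    for (unsigned L = 0; L < LaneTarget.size(); ++L)
      if (((Exec >> L) & 1) && LaneTarget[L] == Uniform)
        Match |= uint64_t(1) << L;

    // The call would be issued here with a uniform (SGPR) target, executed
    // only by the lanes in Match.
    Issued.push_back(Uniform);

    // Remove the handled lanes and repeat until no lane is left.
    Exec &= ~Match;
  }
  return Issued;
}
```

In this model, every distinct target among the active lanes costs one loop iteration, so a call whose target is already uniform goes through the loop exactly once.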

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88291/new/

https://reviews.llvm.org/D88291