[PATCH] D128252: [AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable.

Wed Jul 13 10:46:07 PDT 2022

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp:946
+  // Next unique ID to use while new instance created.
+  static unsigned NextID;
+
----------------
You probably need to reset NextID to zero on each run of the pass. Better still, make it a normal member of the pass class itself.
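
A minimal sketch of the second option, assuming a hypothetical stand-alone class (`SketchPass` and `run` are illustrative names, not the actual SIFixSGPRCopies code). The counter is an ordinary member that is reset at the start of each run, so IDs do not keep growing across functions as they would with a `static`:

```cpp
// Hypothetical stand-in for the pass; names are illustrative only.
class SketchPass {
  // Next unique ID, held as an ordinary member instead of a static.
  unsigned NextID = 0;

public:
  // Models one run of the pass that hands out NumInfos (> 0) unique IDs.
  // Returns the last ID handed out during this run.
  unsigned run(unsigned NumInfos) {
    NextID = 0; // reset at the start of every run
    unsigned Last = 0;
    for (unsigned I = 0; I < NumInfos; ++I)
      Last = NextID++;
    return Last;
  }
};
```

With this shape, two consecutive runs that each allocate three IDs both end at ID 2, whereas a static counter would end the second run at ID 5.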

================
Comment at: llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp:1148
+            TII->buildExtractSubReg(Result, *MRI, MI->getOperand(1), SrcRC,
+                                   TRI->getSubRegFromChannel(i), &AMDGPU::VGPR_32RegClass);
+        Register PartialDst =
----------------
Run clang-format again?

================
Comment at: llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp:1134
+      else
+        MIB.addReg(SrcReg);
+    } else {
----------------
alex-t wrote:
> rampitec wrote:
> > What happens to 16 bit subregs?
> VGPR to SGPR copies are inserted by InstrEmitter to adjust the VALU result to the SALU consumer.
> The 16 bits in the VGPR result are packed and adjusted to the consumer by inserting the EXTRACT_ELEMENT, which is lowered in another place.
> What kind of adjustment would you recommend if we have a 16bit VGPR source?
> Zero-extend it to 32bit?
> 
Assume an input like:
```
%0:SGPR_LO16 = COPY %1.lo16:VGPR_32
```
If I read it right, it will produce a V_READFIRSTLANE_B32 with a 16-bit destination and source, which does not work. Assuming that selection managed to produce such input, which path will it take here?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128252/new/

https://reviews.llvm.org/D128252