[PATCH] D128252: [AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable.

Wed Jul 13 11:05:10 PDT 2022

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp:1134
+      else
+        MIB.addReg(SrcReg);
+    } else {
----------------
rampitec wrote:
> alex-t wrote:
> > rampitec wrote:
> > > What happens to 16 bit subregs?
> > VGPR to SGPR copies are inserted by InstrEmitter to adjust the VALU result to the SALU consumer.
> > The 16 bits in the VGPR result are packed and adjusted to the consumer by an inserted EXTRACT_ELEMENT, which is lowered in another place.
> > What kind of adjustment would you recommend if we have a 16bit VGPR source?
> > Zero-extend it to 32bit?
> > 
> Assume an input like:
> ```
> %0:SGPR_LO16 = COPY %1.lo16:VGPR_32
> ```
> If I read it right, it will produce V_READFIRSTLANE_B32 with a 16 bit destination and source, which does not work. Assume that selection managed to produce such input: which path will it take here?
BTW, right now it seems to go via moveToVALU:

```
# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=si-fix-sgpr-copies -verify-machineinstrs -o - %s

---
name:            v16_to_s16
body:             |
  bb.0:
    %0:vgpr_32 = IMPLICIT_DEF
    %1:sgpr_lo16 = COPY %0.lo16:vgpr_32
    %2:vgpr_lo16 = COPY %1
    S_ENDPGM 0, implicit %1, implicit %2
...
```
Results in:
```
    %0:vgpr_32 = IMPLICIT_DEF
    %3:vgpr_lo16 = COPY %0.lo16
    %2:vgpr_lo16 = COPY %3
    S_ENDPGM 0, implicit %3, implicit %2
```
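
To make the concern concrete, here is a minimal sketch of the kind of width check being asked about. It is a toy model, not code from the patch: `CopyLowering` and `chooseLowering` are hypothetical names, and the rule that only whole 32-bit pieces may use V_READFIRSTLANE_B32 is an assumption drawn from the discussion above. Anything else falls back to moveToVALU, mirroring what the test shows today.

```cpp
#include <cassert>

// Hypothetical toy model, not LLVM code: none of these names exist in
// SIFixSGPRCopies.cpp.
enum class CopyLowering { ReadFirstLane, ReadFirstLanePerDword, MoveToVALU };

// Pick a lowering for a VGPR-to-SGPR copy from the operand sizes in bits.
// Assumption: V_READFIRSTLANE_B32 only accepts a 32-bit source and
// destination, so wider copies would need one instruction per 32-bit piece.
CopyLowering chooseLowering(unsigned DstBits, unsigned SrcBits) {
  // Mismatched sizes or sizes that are not whole dwords (e.g. a lo16
  // subregister copy) are not lowered to readfirstlane in this model.
  if (DstBits != SrcBits || SrcBits == 0 || SrcBits % 32 != 0)
    return CopyLowering::MoveToVALU;
  return SrcBits == 32 ? CopyLowering::ReadFirstLane
                       : CopyLowering::ReadFirstLanePerDword;
}
```

With this model, the `v16_to_s16` case above (16-bit destination and source) would select `CopyLowering::MoveToVALU` rather than emitting a readfirstlane with 16-bit operands.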

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128252/new/

https://reviews.llvm.org/D128252