[PATCH] D55163: AMDGPU: Add optimization patterns to combine fp32->fp16 conversions

Mon Dec 17 13:24:48 PST 2018

pendingchaos added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2154-2159
+  unsigned shiftOpcode = hi ? ISD::SHL : ISD::SRL;
+  int shiftOperand = hi ? 0 : 1;
+  uint32_t andMask = hi ? 0xffff0000u : 0xffffu;
+  int andOperand = hi ? 1 : 0;
+
+  if (In.getOpcode() == ISD::AND) {
----------------
arsenm wrote:
> computeKnownBits?
I don't see how that would help with obtaining the source of the low/high 16 bits (the cvt_pkrtz node), or how it would end up making the code more generic or smaller, since it would still have to match "and(cvt_pkrtz(v, ), 0xffff)" and such.

I just realized that this function could also handle cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) directly (without any ands or shifts). Should I make it do so (and do something similar for SelectCvtRtzF16F32)?
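
For illustration only (this is not code from the patch), here is a minimal standalone sketch of the bit arithmetic behind the four and/shift patterns and the two zero-operand cases. It models the packed result as two raw 16-bit halves in a 32-bit word and does no float conversion; pack, lowViaAnd, lowViaSrl, highViaAnd and highViaShl are hypothetical names, and the 16-bit shift amount is an assumption.

```cpp
#include <cstdint>

// Hypothetical model: operand 0 lands in the low 16 bits of the packed
// word and operand 1 in the high 16 bits. Raw bit patterns only.
constexpr uint32_t pack(uint16_t lo, uint16_t hi) {
  return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

// Low half populated: the source is operand 0 for AND, operand 1 for SRL.
constexpr uint32_t lowViaAnd(uint32_t p) { return p & 0xffffu; }
constexpr uint32_t lowViaSrl(uint32_t p) { return p >> 16; }

// High half populated: the source is operand 1 for AND, operand 0 for SHL.
constexpr uint32_t highViaAnd(uint32_t p) { return p & 0xffff0000u; }
constexpr uint32_t highViaShl(uint32_t p) { return p << 16; }

// Each pattern depends on exactly one half of the packed value.
static_assert(lowViaAnd(pack(0x1234, 0xabcd)) == pack(0x1234, 0), "and/lo");
static_assert(lowViaSrl(pack(0x1234, 0xabcd)) == pack(0xabcd, 0), "srl/lo");
static_assert(highViaAnd(pack(0x1234, 0xabcd)) == pack(0, 0xabcd), "and/hi");
static_assert(highViaShl(pack(0x1234, 0xabcd)) == pack(0, 0x1234), "shl/hi");

// With a zero operand the mask is redundant, so the packed value already
// has the wanted shape without any and or shift.
static_assert(lowViaAnd(pack(0x1234, 0)) == pack(0x1234, 0), "pack(v, 0)");
static_assert(highViaAnd(pack(0, 0x1234)) == pack(0, 0x1234), "pack(0, v)");
```

The operand indices in the comments mirror shiftOperand and andOperand from the hunk above.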

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:1597

+let SubtargetPredicate = isGCN in {
+
----------------
arsenm wrote:
> Should use isVI, or maybe these should be distinguished by GCN3Encoding? Needs a comment for why these are separated 
Since all GCN versions support the VOP3a form, shouldn't it use isGCN (to combine the modifiers into the instruction on SI/CI)?

The VOP2 form is only supported on SI/CI, so isSICI is used. IIRC VOP2 ended up being used when no modifiers could be folded.
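
To make the intended split concrete, here is a rough standalone model of the selection described above. It only restates my understanding from this comment and is not how the TableGen patterns are actually written; Gen, Encoding, isSICI and pickEncoding are hypothetical names.

```cpp
// Hypothetical stand-ins for the subtarget generations and encodings.
enum class Gen { SI, CI, VI };
enum class Encoding { VOP2, VOP3a };

constexpr bool isSICI(Gen g) { return g == Gen::SI || g == Gen::CI; }

// Assumption from the comment: VOP3a is available on every GCN generation,
// VOP2 only on SI/CI, and VOP2 is picked only when no source modifiers
// were folded into the instruction.
constexpr Encoding pickEncoding(Gen g, bool hasFoldedModifiers) {
  return (!hasFoldedModifiers && isSICI(g)) ? Encoding::VOP2
                                            : Encoding::VOP3a;
}

static_assert(pickEncoding(Gen::SI, true) == Encoding::VOP3a, "SI, mods");
static_assert(pickEncoding(Gen::CI, false) == Encoding::VOP2, "CI, no mods");
static_assert(pickEncoding(Gen::VI, false) == Encoding::VOP3a, "VI");
```

Under this model, guarding the VOP3a patterns with isVI would lose the modifier folding on SI/CI.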

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55163/new/

https://reviews.llvm.org/D55163