[PATCH] D55163: AMDGPU: Add optimization patterns to combine fp32->fp16 conversions
Rhys Perry via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 17 13:24:48 PST 2018
pendingchaos added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2154-2159
+ unsigned shiftOpcode = hi ? ISD::SHL : ISD::SRL;
+ int shiftOperand = hi ? 0 : 1;
+ uint32_t andMask = hi ? 0xffff0000u : 0xffffu;
+ int andOperand = hi ? 1 : 0;
+
+ if (In.getOpcode() == ISD::AND) {
----------------
arsenm wrote:
> computeKnownBits?
I don't see how that would help with obtaining the source of the low/high 16 bits (the cvt_pkrtz node) and end up make the code more generic/smaller? Since it would still have to match for "and(cvt_pkrtz(v, ), 0xffff)" and such.
I just realized that this function could handle cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) (without any ands or shifts). Should I make it so (and do something similar for SelectCvtRtzF16F32)?
================
Comment at: lib/Target/AMDGPU/SIInstructions.td:1597
+let SubtargetPredicate = isGCN in {
+
----------------
arsenm wrote:
> Should use isVI, or maybe these should be distinguished by GCN3Encoding? Needs a comment for why these are separated
Since all GCN versions support the VOP3a form, shouldn't it use isGCN (to combine the modifiers into the instruction on SI/CI)?
The VOP2 form is only supported on SI/CI, so isSICI is used. IIRC VOP2 ended up being used when no modifiers could be folded.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D55163/new/
https://reviews.llvm.org/D55163
More information about the llvm-commits
mailing list