[PATCH] D118617: [AArch64][SVE] Remove false register dependency for unary FP convert operations

Peter Waller via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 2 04:49:38 PST 2022


peterwaller-arm added a comment.

Getting there, thanks for the improvements. I've picked up on a few more things, but I think this is close to ready.



================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:1684
 
-  // Floating-point -> signed integer
-  def : Pat<(nxv2f16 (AArch64scvtf_mt (nxv2i1 PPR:$Pg),
----------------
This comment appears to have been dropped.

However, I note that it appears to be misleading/wrong. To me, it reads 'convert floating-point to integer', but the conversion takes a signed integer as input and produces a floating-point result.
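
If the comment is restored, perhaps something along the lines of:

  // Signed integer -> floating-point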


================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:2596
+multiclass sve2_fp_convert_narrow<ValueType vtd, SDPatternOperator op,
+                                            ValueType vtp, ValueType vtint_extend,
+                                            dag imm, Instruction inst> {
----------------
An ask, arising from the name `vtint_extend`: please rename it to `vtint_wide`, and `imm` to `imm_mask_wide`.

This is a narrowing rather than an extension, and the purpose of this type (and the related immediate `imm`) is to provide a mask which selects only the desired bits from the input. Effectively (to take an example case), the patterns produce a convert of an nxv2i16 to an nxv2f16, but since there are only two elements, they must first be placed in an nxv2i64 register. The widening operation translates from nxv2i16 to nxv2i64: for signed input it's an sext, but for unsigned widening it masks out the higher bits with an `and` instruction. Since we start with an nxv2i16 in the first place, and the result is only ever used as an nxv2f16, the bits being masked are never observed, so the mask can be dropped.
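
For illustration, a minimal IR sketch of the unsigned case described above (the function name and exact types are my own, chosen to match the example):

  define <vscale x 2 x half> @ucvtf_h_nxv2i16(<vscale x 2 x i16> %x) {
    ; Type legalization widens the unpacked nxv2i16 input to nxv2i64; for
    ; the unsigned case that widening becomes an `and` with a 0xffff lane
    ; mask. Only the f16 part of each 64-bit result lane is observed, so
    ; the pattern can safely drop the mask.
    %res = uitofp <vscale x 2 x i16> %x to <vscale x 2 x half>
    ret <vscale x 2 x half> %res
  }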




================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:2631-2633
+  def _UNDEF : PredOneOpPassthruPseudo<NAME, !cast<ZPRRegOp>(i_zprtype)>;
   def : SVE_1_Op_Passthru_Pat<vt1, ir_op, vt2, vt3, !cast<Instruction>(NAME)>;
+  defm : SVE_1_Op_PassthruUndef_Pat<vt1, ir_op, vt2, vt3, !cast<Instruction>(NAME # _UNDEF)>;
----------------
(see later comment for rationale)


================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:2653-2655
+  def _UNDEF : PredOneOpPassthruPseudo<NAME, !cast<ZPRRegOp>(i_zprtype)>;
   def : SVE_1_Op_Passthru_Round_Pat<vt1, ir_op, vt2, vt3, !cast<Instruction>(NAME)>;
+  defm : SVE_1_Op_PassthruUndef_Round_Pat<vt1, ir_op, vt2, vt3, !cast<Instruction>(NAME # _UNDEF)>;
----------------
Nit. Suggestion to reorder these. Typically we see `multiclass { ...instructions...; ...patterns... }` or `multiclass { ...instruction1...; ...patterns1...; ...instruction2...; ...patterns2...; }`, but here instructions and patterns are interleaved, with the `_UNDEF` pseudo sitting between the base instruction and the pattern that belongs to it. This could be subtly confusing if someone expects the `SVE_1_Op_Passthru_Round_Pat` to correspond to the `_UNDEF` definition immediately above it. More line breaks would also help here.


================
Comment at: llvm/test/CodeGen/AArch64/sve-fcvt.ll:1382
+  %splat = shufflevector <vscale x 2 x i64> %ins, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
+  %and = and <vscale x 2 x i64> %b, %splat
+  %res = call <vscale x 2 x double> @llvm.aarch64.sve.ucvtf.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x i1> %pg, <vscale x 2 x i64> %and)
----------------
It should be possible to exercise the relevant pattern without an explicit `and` in the IR, by analogy with `ucvtf_stod_movprfx`. The patterns of interest (involving `and`/`sxtw`, etc.) exist to produce better code in the presence of unpacked types.
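
For example, something along these lines ought to do it (a sketch only; the name and types are assumptions on my part):

  define <vscale x 2 x double> @ucvtf_d_nxv2i32(<vscale x 2 x i32> %b) {
    ; A plain uitofp on the unpacked type is legalized into the
    ; and + ucvtf DAG that the pattern matches, with no `and` written
    ; explicitly in the IR.
    %res = uitofp <vscale x 2 x i32> %b to <vscale x 2 x double>
    ret <vscale x 2 x double> %res
  }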


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118617/new/

https://reviews.llvm.org/D118617


