[PATCH] D46023: [AArch64][SVE] Asm: Support for gather LD1/LDFF1 (scalar + vector) load instructions.

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 25 02:10:59 PDT 2018


sdesmalen added inline comments.


================
Comment at: lib/Target/AArch64/SVEInstrFormats.td:764
+
+//===----------------------------------------------------------------------===//
+// SVE Memory - 32-bit Gather and Unsized Contiguous Group
----------------
fhahn wrote:
> Thanks for restructuring this Sander, it is easier to get a completer picture now! I think there are still some repetitive classes. I think it would be possible to have a single class dealing with 32 and 64 bit gather & unsized operands, like the diff below. The parameters of `sve_mem_gld` would need documenting (and one or 2 might be implied by others).
> 
> 
> ```
> 
> +class sve_mem_gld<bits<4> opc, bit xs, bit scaled, bit ns, bits<7> type, RegisterOperand sz, string asm,
> +                         RegisterOperand zprext>
> +: I<(outs sz:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, zprext:$Zm),
> +  asm, "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +  "",
> +  []>, Sched<[]> {
> +  bits<3> Pg;
> +  bits<5> Rn;
> +  bits<5> Zm;
> +  bits<5> Zt;
> +  let Inst{31-25} = type;
> +  let Inst{24-23} = opc{3-2};
> +  let Inst{22}    = xs;
> +  let Inst{21}    = scaled;
> +  let Inst{20-16} = Zm;
> +  let Inst{15}    = ns;
> +  let Inst{14-13} = opc{1-0};
> +  let Inst{12-10} = Pg;
> +  let Inst{9-5}   = Rn;
> +  let Inst{4-0}   = Zt;
> +
> +  let mayLoad = 1;
> +  let Defs = !if(!eq(opc{0}, 1), [FFR], []);
> +  let Uses = !if(!eq(opc{0}, 1), [FFR], []);
> +}
> +
> +//===----------------------------------------------------------------------===//
> +// SVE Memory - 32-bit Gather and Unsized Contiguous Group
> +//===----------------------------------------------------------------------===//
> +multiclass sve_mem_32b_gld_sv_32_scaled<bits<4> opc, string asm,
> +                                        RegisterOperand sxtw_opnd,
> +                                        RegisterOperand uxtw_opnd> {
> +  def _UXTW_SCALED_REAL : sve_mem_gld<opc, 0, 1, 0, 0b1000010, Z_s, asm, uxtw_opnd>;
> +  def _SXTW_SCALED_REAL : sve_mem_gld<opc, 1, 1, 0, 0b1000010, Z_s, asm, sxtw_opnd>;
> +
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _UXTW_SCALED_REAL) ZPR32:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, uxtw_opnd:$Zm), 0>;
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _SXTW_SCALED_REAL) ZPR32:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, sxtw_opnd:$Zm), 0>;
> +}
> +
> +multiclass sve_mem_32b_gld_vs_32_unscaled<bits<4> opc, string asm,
> +                                          RegisterOperand sxtw_opnd,
> +                                          RegisterOperand uxtw_opnd> {
> +  def _UXTW_REAL : sve_mem_gld<opc, 0, 0, 0, 0b1000010, Z_s, asm, uxtw_opnd>;
> +  def _SXTW_REAL : sve_mem_gld<opc, 1, 0, 0, 0b1000010, Z_s, asm, sxtw_opnd>;
> +
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _UXTW_REAL) ZPR32:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, uxtw_opnd:$Zm), 0>;
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _SXTW_REAL) ZPR32:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, sxtw_opnd:$Zm), 0>;
> +}
> +
> +
> +//===----------------------------------------------------------------------===//
> +// SVE Memory - 64-bit Gather Group
> +//===----------------------------------------------------------------------===//
> +
> +multiclass sve_mem_64b_gld_sv_32_scaled<bits<4> opc, string asm,
> +                                        RegisterOperand sxtw_opnd,
> +                                        RegisterOperand uxtw_opnd> {
> +  def _UXTW_SCALED_REAL : sve_mem_gld<opc, 0, 1, 0, 0b1100010, Z_d, asm, uxtw_opnd>;
> +  def _SXTW_SCALED_REAL : sve_mem_gld<opc, 1, 1, 0, 0b1100010, Z_d, asm, sxtw_opnd>;
> +
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _UXTW_SCALED_REAL) ZPR64:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, uxtw_opnd:$Zm), 0>;
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _SXTW_SCALED_REAL) ZPR64:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, sxtw_opnd:$Zm), 0>;
> +}
> +
> +multiclass sve_mem_64b_gld_vs_32_unscaled<bits<4> opc, string asm,
> +                                          RegisterOperand sxtw_opnd,
> +                                          RegisterOperand uxtw_opnd> {
> +  def _UXTW_REAL : sve_mem_gld<opc, 0, 0, 0, 0b1100010, Z_d, asm, uxtw_opnd>;
> +  def _SXTW_REAL : sve_mem_gld<opc, 1, 0, 0, 0b1100010, Z_d, asm, sxtw_opnd>;
> +
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _UXTW_REAL) ZPR64:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, uxtw_opnd:$Zm), 0>;
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _SXTW_REAL) ZPR64:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, sxtw_opnd:$Zm), 0>;
> +}
> +
> +multiclass sve_mem_64b_gld_sv2_64_scaled<bits<4> opc, string asm,
> +                                         RegisterOperand zprext> {
> +  def _SCALED_REAL : sve_mem_gld<opc, 1, 1, 1, 0b1100010, Z_d, asm, zprext>;
> +
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _SCALED_REAL) ZPR64:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, zprext:$Zm), 0>;
> +}
> +
> +multiclass sve_mem_64b_gld_vs2_64_unscaled<bits<4> opc, string asm> {
> +  def _REAL : sve_mem_gld<opc, 1, 0,  1, 0b1100010, Z_d, asm, ZPR64ExtLSL8>;
> +
> +  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
> +                  (!cast<Instruction>(NAME # _REAL) ZPR64:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, ZPR64ExtLSL8:$Zm), 0>;
> +}
> ```
Thanks for pointing out there is some more simplification we can do; I'll keep this in mind for future SVE patches. We initially had more classes, but after the refactoring of operand classes over time we never bothered to simplify them, and I hadn't realised we can indeed merge some of these classes together now.

There is a judgment call as to how much parameterisation makes sense while keeping the file readable. Personally, I think it makes sense to draw the line at having separate classes for the different encoding groups as they are defined in the architecture specification. This should also make them easier to cross-reference with the spec.

I have updated the patch to combine the two 64-bit gather classes (lsl vs uxtw/sxtw), while still maintaining separate base instruction classes for the different encoding groups. Do you think that approach is sufficient?
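For illustration, a combined 64-bit gather multiclass along those lines could look roughly like the sketch below. The class name `sve_mem_64b_gld_sv` and its parameter list are hypothetical, not necessarily what the updated patch uses; the point is that one multiclass can instantiate both sign/zero-extended addressing modes while the per-encoding-group base class stays separate:

```
// Hypothetical sketch: one multiclass covering both the uxtw and sxtw
// scaled 64-bit gather variants, parameterised only on the operand
// classes. The base class sve_mem_64b_gld_sv (one per encoding group in
// the architecture spec) is assumed to take <opc, xs, scaled> bits.
multiclass sve_mem_64b_gld_sv_32_scaled<bits<4> opc, string asm,
                                        RegisterOperand sxtw_opnd,
                                        RegisterOperand uxtw_opnd> {
  def _UXTW_SCALED_REAL : sve_mem_64b_gld_sv<opc, 0, 1, asm, uxtw_opnd>;
  def _SXTW_SCALED_REAL : sve_mem_64b_gld_sv<opc, 1, 1, asm, sxtw_opnd>;

  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
                  (!cast<Instruction>(NAME # _UXTW_SCALED_REAL) ZPR64:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, uxtw_opnd:$Zm), 0>;
  def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Zm]",
                  (!cast<Instruction>(NAME # _SXTW_SCALED_REAL) ZPR64:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, sxtw_opnd:$Zm), 0>;
}
```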


https://reviews.llvm.org/D46023
