[PATCH] D71252: [AArch64][SVE] Add intrinsics for saturating scalar arithmetic

Thu Dec 19 14:30:54 PST 2019

efriedma accepted this revision.
efriedma added a comment.
This revision is now accepted and ready to land.

LGTM

================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:700
+  def : Pat<(i32 (op GPR32:$Rn, (sve_pred_enum:$pattern), (sve_incdec_imm:$imm4))),
+            (EXTRACT_SUBREG (!cast<Instruction>(NAME) (INSERT_SUBREG (IMPLICIT_DEF), $Rn, sub_32), sve_pred_enum:$pattern, sve_incdec_imm:$imm4), sub_32)>;
 }
----------------
andwar wrote:
> efriedma wrote:
> > andwar wrote:
> > > efriedma wrote:
> > > > SQDECB always returns a value in a 64-bit register; why are you treating the return value as 32 bits? Even if there's some reason to prefer that form at the IR level, it doesn't seem like a good idea in isel; if you need a sign-extended value, you'll be forced to emit a redundant sign extension.
> > > There's a 64-bit and a 32-bit variant of `SQDECB`. This pattern is for the 32-bit variant, which returns a 32-bit as well as a 64-bit result. Here we only care about the 32-bit result (because that's what the ACLE intrinsic returns). More specifically, this is meant to allow a 1:1 mapping between:
> > > * `int32_t svqdecb_n_s32(int32_t op, uint64_t imm_factor)` from ACLE
> > > * `declare i32 @llvm.aarch64.sve.sqdecb.n32(i32, i32, i32)` IR intrinsic
> > > * `sqdecb x0, w0, vl3, mul #4` SVE instruction
> > > 
> > > For the 64-bit variant there's a different intrinsic:
> > > * `int64_t svqdecb_n_s64(int64_t op, uint64_t imm_factor)` from ACLE
> > > * `declare i64 @llvm.aarch64.sve.sqdecb.n64(i64, i32, i32)` IR intrinsic
> > > * `sqdecb x0, vl4, mul #5` SVE instruction
> > > 
> > > Also, this multiclass is only used for the intrinsics.
> > Consider something like the following:
> > 
> > ```
> > long x(int z) { return svqdecb_n_s32(z, 1); }
> > ```
> > 
> > This function should lower to just a single sqdecb. The way this is written, you end up with an unnecessary sxtw.
> I will add extra patterns to cater for this scenario (please check the next patch). The other option would be to rewrite this pattern so that the return value is always `i64`, then add some new ISD nodes and truncate when the user requests `i32`. But the overall effect would be similar.
Okay, that works.
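
For reference, a rough scalar model of why the extra sxtw is redundant. This is only a sketch: the `model_*` and `check_*` names are hypothetical and not part of the patch, and `elems` stands in for whatever element count the predicate pattern selects. It assumes the 32-bit form of sqdecb saturates to the signed 32-bit range and writes the sign-extended result to the 64-bit destination register.

```
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical model of `sqdecb xN, wN, <pattern>, mul #factor`.
 * `elems` is an assumed stand-in for the pattern's element count.
 * Returns the full 64-bit register value after the instruction. */
static int64_t model_sqdecb_w(int32_t op, uint32_t elems, uint32_t factor) {
    int64_t wide = (int64_t)op - (int64_t)elems * (int64_t)factor;
    if (wide < INT32_MIN) wide = INT32_MIN; /* saturate at the 32-bit minimum */
    if (wide > INT32_MAX) wide = INT32_MAX; /* cannot trigger when decrementing; kept for symmetry */
    return wide;                            /* already sign-extended to 64 bits */
}

/* Hypothetical model of the i32 intrinsic result: the low 32 bits only. */
static int32_t model_svqdecb_n_s32(int32_t op, uint32_t elems, uint32_t factor) {
    return (int32_t)model_sqdecb_w(op, elems, factor);
}

/* Sign-extending the i32 result reproduces the register value exactly,
 * which is why a separate sxtw after the instruction adds nothing. */
static void check_sxtw_redundant(int32_t op, uint32_t elems, uint32_t factor) {
    int64_t extended = (int64_t)model_svqdecb_n_s32(op, elems, factor);
    assert(extended == model_sqdecb_w(op, elems, factor));
}
```

Under those assumptions, `long x(int z)` above only needs the instruction itself; the extra patterns just have to let isel see that.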

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71252/new/

https://reviews.llvm.org/D71252