[PATCH] D50004: [PowerPC] Emit xscpsgndp instead of xxlor when copying floating point scalar registers for P9

Thu Aug 9 11:00:09 PDT 2018

jsji added a comment.

In https://reviews.llvm.org/D50004#1192390, @nemanjai wrote:

> In https://reviews.llvm.org/D50004#1187625, @jsji wrote:
>
> <snip>
>
> > Not sure how much impact this will have, but maybe we need to consider still using xxlor for destructive instructions?
> > 
> > eg:
> >  In Power ISA 3.0 B,  2.1.5 Destructive Operation Operand Preservation
> >  "The set of instructions listed below, when immediately preceded by the xxlor XT,XC,XC instruction in a sequence similar to the above example, **will provide optimal performance.**"
>
> I don't think there are any conditions under which we will emit an `xxlor` that will be eligible for this. That may be a good candidate to peephole and/or fuse together.
> Example:
>
>   vector double test(double a, vector double b, vector double c, vector double *s) {
>     vector double n = (vector double)a;
>     *s = n + c * b;
>     return n;
>   }
>
>
> Is about as close as you can get, but will produce the following on Power9:
>
>   xxspltd vs0, vs1, 0
>   xxlor vs1, vs0, vs0
>   xvmaddadp vs1, vs35, vs34
>   xxlor vs34, vs0, vs0
>   stxv vs1, 0(r9)
>   
>
> Ultimately, the target of the copy will always be used as an input to the destructive operation. If we want to exploit this optimization in the HW, we'd have to forward the source of the copy (and eliminate the second copy in this case as well). But if we're consciously transforming the code to exploit this, the instruction we use for the COPY is immaterial (we can always transform it to `XXLOR` at the time).


FYI.

An ugly example to show that there are situations where this change will have an impact on destructive operations.

$ cat t1.c
double test(double n0, double n1, double n2, double n3, double n4, double n5, double n6, double n7, double n8, double n9,
            double n10, double n11, double n12, double n13, double n14, double n15, double n16, double n17, double n18, double n19,
            double n20, double n21, double n22, double n23, double n24, double n25, double n26, double n27, double n28, double n29,
            double n30, double n31, double n32, double n33,
            double c, double b, double *s) {

  *s = n0 + c * b;
  *s += n1 + *s * b;
  *s += n2 + *s * b;
  *s += n3 + *s * b;
  *s += n4 + *s * b;
  *s += n5 + *s * b;
  *s += n6 + *s * b;
  *s += n7 + *s * b;
  *s += n8 + *s * b;
  *s += n9 + *s * b;
  *s += n10 + *s * b;
  *s += n11 + *s * b;
  *s += n12 + *s * b;
  *s += n13 + *s * b;
  *s += n14 + *s * b;
  *s += n15 + *s * b;
  *s += n16 + *s * b;
  *s += n17 + *s * b;
  *s += n18 + *s * b;
  *s += n19 + *s * b;
  *s += n20 + *s * b;
  *s += n21 + *s * b;
  *s += n22 + *s * b;
  *s += n23 + *s * b;
  *s += n24 + *s * b;
  *s += n25 + *s * b;
  *s += n26 + *s * b;
  *s += n27 + *s * b;
  *s += n28 + *s * b;
  *s += n29 + *s * b;
  *s += n30 + *s * b;
  *s += n31 + *s * b;
  *s += n32 + *s * b;
  *s += n33 + *s * b;
  return n0;
}

$ clang -S -mcpu=pwr9 -O2 -ffast-math t1.c -mllvm -ppc-vsr-nums-as-vr -mllvm -ppc-asm-full-reg-names -mllvm -enable-post-misched=false

Diff of the assembly before and after the change:

$ diff -Naur before.s after.s

--- before.s    2018-08-09 13:52:24.785246846 -0400
+++ after.s     2018-08-09 13:59:38.815493708 -0400
@@ -9,7 +9,7 @@
 # %bb.0:                                # %entry
 	lfd f0, 304(r1)
 	lxsd v2, 312(r1)
-	xxlor v3, f1, f1
+	xscpsgndp v3, f1, f1
 	xsmaddadp v3, v2, f0
 	xsadddp f0, v3, f2
 	xsmaddadp f0, v3, v2
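
For readers who want the dependency without the 34 extra parameters, here is a reduced sketch of the first statement of test(). The name reduced_test is hypothetical and not part of the patch. It only shows why a copy is needed at the source level: n0 is both the addend of the multiply-add and the return value, so a destructive multiply-add that overwrites its addend register has to work on a copy. Whether the compiler emits that copy for this small function depends on register allocation; the full t1.c above is what produced the copy in the diff.

```c
/* Hypothetical reduced form of test() from t1.c; illustration only. */
double reduced_test(double n0, double c, double b, double *s) {
  /* Candidate for a destructive multiply-add, which overwrites the
     register holding the addend (n0) with n0 + c * b. */
  *s = n0 + c * b;

  /* n0 is still live here, so its original value must be preserved,
     for example by copying it to another register before the
     multiply-add. That copy is what the patch changes from xxlor to
     xscpsgndp. */
  return n0;
}
```

In the diff above, the copy into v3 feeds xsmaddadp directly, while f1 keeps n0 for the return value.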


https://reviews.llvm.org/D50004