[PATCH] D50004: [PowerPC] Emit xscpsgndp instead of xxlor when copying floating point scalar registers for P9

Fri Aug 10 06:39:51 PDT 2018

jsji added a comment.

In https://reviews.llvm.org/D50004#1195184, @nemanjai wrote:

> In https://reviews.llvm.org/D50004#1194086, @jsji wrote:
>
> <snip>
>
> This is exactly what I was referring to... The situation you describe is not analogous to the situation described in the ISA. According to the ISA, the sequence that will be optimized is:
>
>   xxlor XC, XT, XT
>   xxperm XT, XA, XB
>
>
> So in this case, the only way we would get the optimized behaviour would be if the "pre-patch" code sequence was:
>
>   xxlor v3, f1, f1
>   xsmaddadp f1, v2, f0
>
>
> And I'm fairly certain that without source forwarding of the copy, we can never produce such code (but of course, I could be wrong).

Can you please double-check which ISA you are referring to?

The description in PowerISA_public.v3.0B https://ibm.ent.box.com/s/1hzcwkwf8rbju5h9iyf44wm94amnlcrv is:

As an example, to preserve the XT source register in the xxperm instruction, the following sequence will optimize performance.

  xxlor XT,XC,XC /* Copy (XC) to XT
  xxperm XT,XA,XB /* Permute, overwriting XT

The set of instructions listed below, when immediately preceded by the xxlor XT,XC,XC instruction in a sequence similar to the above example, will provide optimal performance.

This should be exactly the same pattern as in my example: xxlor XT,XC,XC copies (XC) to XT. It is not xxlor XC,XT,XT as in your description.
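
To make the operand order concrete, here is a rough sketch of the check in Python. The helper names (parse, matches_isa_copy_pattern) are made up for illustration and are not part of the patch or of LLVM. It only looks at the shape of the two-instruction sequence, and it assumes the first operand of the following instruction is its target; it does not check that instruction against the ISA's list of eligible instructions.

```python
def parse(insn):
    """Split 'op a, b, c' into (op, [a, b, c]). Hypothetical helper."""
    op, _, rest = insn.strip().partition(" ")
    return op, [r.strip() for r in rest.split(",") if r.strip()]


def matches_isa_copy_pattern(first, second):
    """Return True if `first` is 'xxlor XT,XC,XC' (a copy of XC into XT)
    and `second` names the same XT as its first operand.

    Assumption: the first operand of `second` is the register it overwrites.
    """
    op1, args1 = parse(first)
    _, args2 = parse(second)
    if op1 != "xxlor" or len(args1) != 3 or not args2:
        return False
    target, src_a, src_b = args1
    # A register copy has two identical sources that differ from the target.
    if src_a != src_b or src_a == target:
        return False
    # The following instruction must overwrite the copy's target.
    return args2[0] == target
```

With this check, the sequence quoted from the ISA (xxlor XT,XC,XC then xxperm XT,XA,XB) matches, while the sequence with the operands swapped (xxlor XC,XT,XT then xxperm XT,XA,XB) does not.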

https://reviews.llvm.org/D50004