[PATCH] D51542: [X86] Remove wrong ReadAdvance from multiclass sse_fp_unop_s

Fri Aug 31 16:13:07 PDT 2018

andreadb added a comment.

In https://reviews.llvm.org/D51542#1221366, @andreadb wrote:

> Hi Sanjay,
>
> Thanks for the feedback.
>
> In https://reviews.llvm.org/D51542#1221074, @spatel wrote:
>
> > I think this requires an understanding of the intent of ReadAfterLd:
> >
> >   // Instructions with folded loads need to read the memory operand immediately,
> >   // but other register operands don't have to be read until the load is ready.
> >   // These operands are marked with ReadAfterLd.
> >
> >
> > ...that https://reviews.llvm.org/D51534 did not. That's because a broadcast only has one source operand, so ReadAfterLd doesn't even make sense on that instruction?
>
>
> Not only did it not make any sense, it was even harmful, because it decreased the use latency of the register used as the base address for the folded load by 'ReadAfterLd' cycles.
>
> > In this case, we have 2 source operands:
> > 
> > 1. The loaded value that we're doing the math on.
> > 2. The unchanging vector lanes of the second source (destination) register.
>
> I think you are getting confused. These are not instructions with 2 input operands.
> These are just SSE1/SSE2 unary operations (one def, and one use; see the tablegen definition below).

Sorry. You are right. The output register is only partially updated.
Forget about my other comment on the register renamer. If the processor doesn't keep track of XMM registers whose upper portions have been cleared to zero, then the instruction has to wait on the previous value of the destination register.
This is quite tricky to model for llvm-mca without adding extra domain knowledge.

That being said, the ReadAfterLd is still wrong here, because it ends up being applied to the register containing the base address of $src1.
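
To make the effect concrete, here is a rough sketch of how a ReadAdvance shortens the latency seen by a register use. It is only a toy model written in Python, not the actual scheduler or llvm-mca code; the helper name `effective_latency` and both cycle counts are made-up values for illustration.

```python
def effective_latency(write_latency: int, read_advance: int = 0) -> int:
    """Cycles a consumer waits on one source register.

    A ReadAdvance of N cycles means the operand is read N cycles after
    issue, so the producer's latency is reduced by N (never below zero).
    """
    return max(0, write_latency - read_advance)


# Hypothetical numbers, not taken from any real scheduling model.
PRODUCER_LATENCY = 3  # latency of the instruction that writes the register
READ_AFTER_LD = 5     # ReadAdvance cycles associated with ReadAfterLd

# The base address register is needed before the load can even start,
# so no ReadAdvance should apply to it.
base_reg_correct = effective_latency(PRODUCER_LATENCY)

# With ReadAfterLd wrongly attached to the address operand, the dependency
# on the producer of the base register appears to cost nothing.
base_reg_wrong = effective_latency(PRODUCER_LATENCY, READ_AFTER_LD)

print(base_reg_correct, base_reg_wrong)
```

With these made-up values, the model waits the full producer latency in the correct case, while in the wrong case the load appears able to start as soon as the instruction issues.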

https://reviews.llvm.org/D51542