[PATCH] D51542: [X86] Remove wrong ReadAdvance from multiclass sse_fp_unop_s
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 31 10:43:30 PDT 2018
spatel added a comment.
I think this requires an understanding of the intent of ReadAfterLd:
// Instructions with folded loads need to read the memory operand immediately,
// but other register operands don't have to be read until the load is ready.
// These operands are marked with ReadAfterLd.
...in a way that https://reviews.llvm.org/D51534 did not. That's because a broadcast has only one source operand, so ReadAfterLd doesn't even make sense on that instruction?
In this case, we have 2 source operands:
1. The loaded value that we're doing the math on.
2. The unchanging vector lanes of the second source (destination) register.
The patch has the intended effect of making the math op depend on the load operand, but it's not clear to me what is or should be happening in a case like this on Skylake:
Trunk:
[0,0] DeeeeeeeeeER. dppd $1, %xmm1, %xmm2 <--- long latency, but pipelined
[0,1] D=eE-------R. leaq 8(%rsp,%rdi,2), %rax
[0,2] D=eeeeeeeeeER rsqrtss (%rax), %xmm2 <--- wrong: this can't start executing before %rax is loaded
Apply this patch (remove ReadAfterLd):
[0,0] DeeeeeeeeeER . dppd $1, %xmm1, %xmm2
[0,1] D=eE-------R . leaq 8(%rsp,%rdi,2), %rax
[0,2] D==eeeeeeeeeER rsqrtss (%rax), %xmm2 <--- is this right? the calc can begin before xmm2 is known?
But with AVX the 2nd source is explicit, and ReadAfterLd has a different effect:
[0,0] DeeeeeeeeeER . vdppd $1, %xmm0, %xmm1, %xmm2
[0,1] D=eE-------R . leaq 8(%rsp,%rdi,2), %rax
[0,2] D====eeeeeeeeeER vrsqrtss (%rax), %xmm2, %xmm3 <--- execution delayed by vdppd?
No ReadAfterLd:
[0,0] DeeeeeeeeeER . . vdppd $1, %xmm0, %xmm1, %xmm2
[0,1] D=eE-------R . . leaq 8(%rsp,%rdi,2), %rax
[0,2] D=========eeeeeeeeeER vrsqrtss (%rax), %xmm2, %xmm3 <--- execution delayed until xmm2 is known
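One rough way to reason about the two AVX timelines is a toy model of when a load-folded instruction may issue. This is only a sketch, not llvm-mca's actual algorithm: the function name `issue_cycle` and its parameters are hypothetical, the 5-cycle load latency is an assumption, and the ready cycles are approximations read off the timelines above. The idea is that ReadAfterLd lets the register source arrive up to one load latency after issue, while the address register must always be ready at issue.

```python
def issue_cycle(addr_ready, reg_ready, load_latency, read_after_ld):
    """Earliest cycle a load-folded instruction can issue (toy model).

    addr_ready:    cycle at which the address register (e.g. %rax) is available
    reg_ready:     cycle at which the other register source (e.g. %xmm2) is available
    load_latency:  assumed latency of the folded load, in cycles
    read_after_ld: True if the register source is marked ReadAfterLd
    """
    # With ReadAfterLd, the register operand is not needed until the load
    # completes, so its ready cycle is effectively advanced by the load latency.
    advance = load_latency if read_after_ld else 0
    # The address operand is never advanced: the load cannot start without it.
    return max(addr_ready, reg_ready - advance)


# Approximate values from the AVX timelines: leaq result usable around
# cycle 3, vdppd result usable around cycle 10, assumed load latency of 5.
with_read_after_ld = issue_cycle(3, 10, 5, True)
without_read_after_ld = issue_cycle(3, 10, 5, False)
```

Under these assumptions the model gives an issue cycle of 5 with ReadAfterLd and 10 without it, which lines up with where the first `e` appears for vrsqrtss in the two AVX timelines.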
https://reviews.llvm.org/D51542