[PATCH] D36840: [DAG] convert vector select-of-constants to logic/math
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 17 13:29:42 PDT 2017
spatel added a comment.
>> For x86, blendv* is always a multi-uop / multi-cycle instruction according to Agner's docs
>
> Are you sure?
>
> Bulldozer, Piledriver, Ryzen, and Skylake seem to list PBLENDVB and BLENDVPS as 1 uop.
Oops...maybe it's not so clear anymore. Page references are to the doc dated May 2, 2017:
Bulldozer, Piledriver: 1 uop and 2 cycle latency for 128-bit; 2 uops and 2 cycles for 256-bit. (p. 48, 61)
Ryzen: 1 uop and 1 cycle for 128-bit; 2 uops and 1 cycle for 256-bit. (p. 88)
Skylake: 1 uop and 1 cycle for 128-bit (or just legacy encoded xmm?); 2 uops and 2 cycles for 256-bit (or any vex version?) (p. 239, 243)
So yes, it seems recent uarchs are putting more effort into making blendv a fast op. I think logic ops are still the better choice for default x86 (~SandyBridge). As I suggested in the description, we could re-form a select in the machine combiner or some other machine pass if that's the better choice for a particular uarch. WDYT?
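For readers following the blendv-vs-logic trade-off, here is a rough scalar model of the general idea, not the actual DAG combine in this patch. It assumes a 4 x i32 vector and a mask whose lanes are all-ones or all-zeros (as x86 vector compares produce); the names V4, selectRef, and selectLogic are made up for illustration. It only shows that a per-lane select with such a mask gives the same result as and/andn/or, which is the kind of sequence that would replace a blendv.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4 x i32 vector; not an LLVM type.
using V4 = std::array<std::uint32_t, 4>;

// Reference semantics: per-lane select (what a blendv-style op computes
// when every mask lane is all-ones or all-zeros).
V4 selectRef(const V4 &mask, const V4 &c1, const V4 &c2) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = mask[i] ? c1[i] : c2[i];
  return r;
}

// Same result using only bitwise logic: (mask & c1) | (~mask & c2).
// Assumption: each mask lane is exactly 0 or 0xFFFFFFFF.
V4 selectLogic(const V4 &mask, const V4 &c1, const V4 &c2) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    assert(mask[i] == 0u || mask[i] == 0xFFFFFFFFu);
    r[i] = (mask[i] & c1[i]) | (~mask[i] & c2[i]);
  }
  return r;
}
```

When c1 and c2 are constants, parts of this can fold away further (for example, with c2 all-zeros the whole thing is a single and), which is presumably where the logic form has the most room to win over a multi-uop blendv.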
https://reviews.llvm.org/D36840