[PATCH] D34336: [x86] transform vector inc/dec to use -1 constant (PR33483)

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 19 08:00:33 PDT 2017


spatel added a comment.

In https://reviews.llvm.org/D34336#783833, @RKSimon wrote:

> In https://reviews.llvm.org/D34336#783558, @craig.topper wrote:
>
> > vpternlog does not have any idiom recognition.
> >
> > For pcmpeq I think intel only avoids the dependency but still executes it. What does AMD do?
>
>
> Confirmed with Agner's docs - Jaguar/Bulldozer/Ryzen all avoid input register dependencies for PCMPEQ/PCMPGT/PSUB/XOR/ANDN simd instructions when the two inputs are the same. They still create+execute uops (unlike move elimination) - but as integer ops can often go down most simd pipes and are low latency they are very unlikely to cause a performance regression.


Ah, right - I was confusing move elimination with input reg elimination. In any case, I think there are a few reasons to favor this form in the DAG:

1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but...
2. That seems unlikely to affect any out-of-order (OOO) implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar (I'll attach a test program next), but...
3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16-64 bytes of constant data in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too.
4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can. If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass).
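
For reference, the arithmetic identity behind the transform can be sketched in portable C. This is only a scalar model under stated assumptions, not the patch itself: the vec4 type and the helpers vec_cmpeq, vec_add, vec_sub, vec_inc, vec_dec and self_check are hypothetical names made up for illustration, and four 32-bit lanes stand in for one 128-bit register. Comparing a register with itself yields all-ones (-1) in every lane, so x + 1 can be computed as x - (-1) and x - 1 as x + (-1) without loading a constant from memory.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* assumption: models one 128-bit register of 32-bit lanes */

typedef struct { uint32_t lane[LANES]; } vec4; /* hypothetical type */

/* Models a lane-wise compare-equal: all-ones where equal, zero otherwise. */
static vec4 vec_cmpeq(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < LANES; ++i)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0u;
    return r;
}

/* Lane-wise wrapping add and subtract (unsigned arithmetic wraps in C). */
static vec4 vec_add(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < LANES; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

static vec4 vec_sub(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < LANES; ++i)
        r.lane[i] = a.lane[i] - b.lane[i];
    return r;
}

/* x + 1 rewritten as x - (-1); the -1 splat comes from cmpeq(x, x). */
static vec4 vec_inc(vec4 x) {
    vec4 ones = vec_cmpeq(x, x);
    return vec_sub(x, ones);
}

/* x - 1 rewritten as x + (-1), using the same all-ones value. */
static vec4 vec_dec(vec4 x) {
    vec4 ones = vec_cmpeq(x, x);
    return vec_add(x, ones);
}

/* Checks both rewrites against plain +1 / -1, including wraparound lanes. */
static void self_check(void) {
    vec4 x = {{0u, 1u, 41u, UINT32_MAX}};
    vec4 inc = vec_inc(x);
    vec4 dec = vec_dec(x);
    for (int i = 0; i < LANES; ++i) {
        assert(inc.lane[i] == (uint32_t)(x.lane[i] + 1u));
        assert(dec.lane[i] == (uint32_t)(x.lane[i] - 1u));
    }
}
```

The model only shows that the results match in every lane, including the wraparound cases. It says nothing about latency, code size, or register pressure, which are the actual trade-offs discussed above.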



> IMO we are much better off with this approach than the constant load, although the higher register pressure is a minor concern.




https://reviews.llvm.org/D34336




