[PATCH] D37418: [X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in the upper 32-bits of a 64-bit operation.

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Sep 3 09:07:41 PDT 2017


spatel added a comment.

In https://reviews.llvm.org/D37418#859821, @zvi wrote:

> The code change LGTM, but I'm not sure about the profitability. On recent Intel processors (not sure about AMD), 'or' can execute at twice the throughput of 'bt'. Assuming the extra constant move being saved is hoisted out of a hot loop, are we gaining or losing?
> For -Os this should be fine.


I suspect AMD CPUs would want the same output as Intel big cores on these. For example, Agner's instruction tables show 2 uops for btc on Ryzen, so it has higher latency and lower throughput than the plain logic ops. Is the extra uop caused by the partial flags update of these instructions?

This is another example where we can't determine profitability in isel. So we should choose whatever we think helps the most common case there. Then we can add an MI transform to alter that choice if we can find profit using uarch machine models and machine trace metrics. If the btc variant always reduces register pressure, could that be a good reason to choose it in isel?
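
For reference, here is a minimal sketch of the kind of source pattern being discussed. The function names and the choice of bit 40 are hypothetical and not taken from the patch. The mask in each case does not fit in a sign-extended 32-bit immediate, so the plain logic op needs the constant materialized in a register first, while bts/btr/btc take the bit index as an 8-bit immediate. Which form a given compiler actually emits depends on its version and flags.

```c
#include <stdint.h>

/* Hypothetical example bit: any index in [32, 63] sits in the upper half of
   a 64-bit value, so the single-bit mask cannot be encoded as an imm32. */
#define HIGH_BIT 40

/* or with a single-bit mask: candidate for bts */
uint64_t set_high_bit(uint64_t x)    { return x |  (UINT64_C(1) << HIGH_BIT); }

/* and with an all-ones-but-one-bit mask: candidate for btr */
uint64_t clear_high_bit(uint64_t x)  { return x & ~(UINT64_C(1) << HIGH_BIT); }

/* xor with a single-bit mask: candidate for btc */
uint64_t toggle_high_bit(uint64_t x) { return x ^  (UINT64_C(1) << HIGH_BIT); }
```

Inside a hot loop the constant would normally be hoisted, which is the case zvi asks about: the bt-family form saves a register, and the logic-op form may have better throughput.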


https://reviews.llvm.org/D37418




