[PATCH] D140087: [X86] Replace (31/63 -/^ X) with (NOT X) and ignore (32/64 ^ X) when computing shift count
Noah Goldstein via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 20 12:36:59 PST 2022
goldstein.w.n added a comment.
In D140087#4008802 <https://reviews.llvm.org/D140087#4008802>, @goldstein.w.n wrote:
> In D140087#4008747 <https://reviews.llvm.org/D140087#4008747>, @craig.topper wrote:
>
>>>> 3,782,331 port0
>>>> 3,207,023 port1
>>>> 1,001,220 port23
>>>> 3,216,022 port5
>>>> 4,940,975 port6
>>>> 11,575,101 port49
>>
>> I'm having trouble accounting for these numbers. As far as I know
>> shlx is 1 uop
>> mov is 1 uop
>> shr is 1 uop
>> and with load+store is 4 uops
>> dec is 1 uop
>> bnz is 1uop and could possibly be macrofused with the dec.
>>
>> so that's 9 or maybe 8 with macrofusion uops per iteration. what am I missing?
>>
>> You're also missing counts on port 7 and 8 which is where the store AGU uops should go. The port 4 and 9 would be the store data uops.
>
> See my other comment, but I think the benchmark was misleading and the numbers
> where dramatically skewed by uop replay (maybe something else, but thats the only
> thing I can think of at the moment).
>
> Think you and @lebedev.ri are right and unless there is intense register pressure
> or `-Os` it doesn't win out.
If its all the same, would still like guidance about how to implement it. Think it may
be useful at the very least for `AtomicExpansionKind::BitTestIntrinsic`.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140087/new/
https://reviews.llvm.org/D140087
More information about the llvm-commits
mailing list