[PATCH] D140087: [X86] Replace (31/63 -/^ X) with (NOT X) and ignore (32/64 ^ X) when computing shift count

Noah Goldstein via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 20 11:05:45 PST 2022


goldstein.w.n added a comment.

In D140087#4008536 <https://reviews.llvm.org/D140087#4008536>, @craig.topper wrote:

> In D140087#4008447 <https://reviews.llvm.org/D140087#4008447>, @goldstein.w.n wrote:
>
>> @pengfei Somewhat unrelated, so if this is not the right place to ask, can you let me know where is?
>>
>> I was looking to add a peephole to change something like:
>>
>>   ptr[x / 32] |= (1 << (x % 32))
>>
>> Currently codegen is something like:
>>
>>   mov    $0x1,%gpr1
>>   shlx   %cnt,%gpr1,%mask
>>   shr    $0x5,%cnt
>>   or     %mask,(%ptr,%cnt,4)
>>
>> And it could be as simple as:
>>
>>   bts %cnt, (%ptr)
>>
>> (Other patterns with `bt{s|r|c}` could also be improved.)
>>
>> I saw `one_bit_patterns` in `X86InstrCompiler` but don't see a way to extend
>> the peephole such that `addr` is a function of the inputs and not just one of the inputs.
>>
>> Any chance you could direct me to where I should look to add this type of
>> peephole?
>
> `bts %cnt, (%ptr)` is a 10 or 11 uop instruction. It might not be better than current code.

I think that translates to worse throughput, so it would be worse in a tight loop
if and only if there is no loop-carried dependency (latency is better, so with a
carried dependency it is still preferable). Outside of that one case, I have to
imagine it's a win:

1. Better latency.
2. Less register pressure.
3. Smaller code size.
4. Fewer backend resources (unless this is some bizarre program that's retirement bound).
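For reference, a minimal C sketch of the source-level patterns in question, one per `bt{s|r|c}` variant. The function names are hypothetical and only for illustration; `1u` is used instead of `1` so the shift stays well defined when `x % 32 == 31`. Any lowering to the memory form of these instructions would also have to account for how the instruction interprets the bit offset, which this sketch does not model.

```c
#include <stdint.h>

/* Hypothetical helper names; the bitmap is an array of 32-bit words. */

/* Candidate for bts: set bit x. */
static inline void bitmap_set(uint32_t *ptr, uint32_t x) {
    ptr[x / 32] |= (1u << (x % 32));
}

/* Candidate for btr: clear bit x. */
static inline void bitmap_clear(uint32_t *ptr, uint32_t x) {
    ptr[x / 32] &= ~(1u << (x % 32));
}

/* Candidate for btc: toggle bit x. */
static inline void bitmap_toggle(uint32_t *ptr, uint32_t x) {
    ptr[x / 32] ^= (1u << (x % 32));
}
```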

On ICX, a loop using the `shlx` method with the `movl $1, %gpr` hoisted, 1,000,000 iterations
(with a `decl; jne` for the loop implementation):

   3,782,331  port0
   3,207,023  port1
   1,001,220  port23
   3,216,022  port5
   4,940,975  port6
  11,575,101  port49

Same loop using `btr`:

   2,055,213  port0
   1,298,859  port1
   1,000,372  port23
   1,505,077  port5
   3,261,176  port6
   1,088,049  port49

Feels like a win.
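
To put a single number on it, the per-port counts above can be summed and divided by the iteration count. This is only a rough sketch: it covers just the ports listed, the totals include the `decl; jne` loop overhead, and the array and function names are made up for illustration. The sums come out to 27,722,672 for the `shlx` loop and 10,208,746 for the `btr` loop, i.e. roughly 27.7 versus 10.2 per iteration.

```c
#include <stddef.h>
#include <stdint.h>

#define ITERATIONS 1000000u

/* Counts copied from the two listings above, in the order
   port0, port1, port23, port5, port6, port49. */
static const uint64_t shlx_ports[6] = {
    3782331u, 3207023u, 1001220u, 3216022u, 4940975u, 11575101u
};
static const uint64_t btr_ports[6] = {
    2055213u, 1298859u, 1000372u, 1505077u, 3261176u, 1088049u
};

/* Sum all per-port counts for one run. */
static uint64_t sum_counts(const uint64_t *counts, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += counts[i];
    return total;
}

/* Average count per loop iteration across the listed ports. */
static double per_iteration(const uint64_t *counts, size_t n) {
    return (double)sum_counts(counts, n) / (double)ITERATIONS;
}
```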


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140087/new/

https://reviews.llvm.org/D140087


