[PATCH] D49243: [X86] Improved sched models for X86 BT*rr instructions

Clement Courbet via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 18 08:22:58 PDT 2018


courbet added inline comments.


================
Comment at: lib/Target/X86/X86SchedHaswell.td:632
 }
 def : InstRW<[HWWriteBTRSCmr], (instregex "BT(R|S|C)(16|32|64)mr")>;
 
----------------
craig.topper wrote:
> RKSimon wrote:
> > @craig.topper @courbet @gchatelet  These look completely wrong (and BTmr above) - and Broadwell appears to be missing them as well - any suggestions for the bit tests memory cases?
> Skylake doesn't even have an InstRW for them.
> 
> They're also missing from the copy of the database used by IACA that I have. I believe that's where Gadi got most of the info from. I wonder what IACA does if you feed it those instructions.
I can't give latencies, because latency measurements do not support memory operands yet.

For uops, I have working support in this patch: 
https://reviews.llvm.org/D48935

On Haswell, this gives:

```
---
mode:            uops
key:             
  instructions:    
    - 'BTC64mr RDI i_0x1x  i_0x0x  R9'
    - 'BTC64mr RDI i_0x1x  i_0x64x  RBX'
    - 'BTC64mr RDI i_0x1x  i_0x128x  RSI'
    - 'BTC64mr RDI i_0x1x  i_0x192x  RCX'
    - 'BTC64mr RDI i_0x1x  i_0x256x  R8'
    - 'BTC64mr RDI i_0x1x  i_0x320x  RDX'
  config:          ''
cpu_name:        haswell
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:    
  - { key: '3', value: 1.3771, debug_string: HWPort0 }
  - { key: '4', value: 1.8848, debug_string: HWPort1 }
  - { key: '5', value: 1.3687, debug_string: HWPort2 }
  - { key: '6', value: 0.728, debug_string: HWPort3 }
  - { key: '7', value: 1.0025, debug_string: HWPort4 }
  - { key: '8', value: 1.6272, debug_string: HWPort5 }
  - { key: '9', value: 2.1307, debug_string: HWPort6 }
  - { key: '10', value: 0.0002, debug_string: HWPort7 }
error:           ''
info:            instruction is parallel, repeating a random one.
assembled_snippet: 5349C7C10100000048C7C30100000048C7C60100000048C7C10100000049C7C00100000048C7C2010000004C0FBB0F480FBB5F40480FBBB780000000480FBB8FC00000004C0FBB8700010000480FBB97400100004C0FBB0F480FBB5F40480FBBB780000000480FBB8FC00000004C0FBB8700010000480FBB97400100004C0FBB0F480FBB5F40480FBBB780000000480FBB8FC00000004C0FBB8700010000480FBB97400100004C0FBB0F480FBB5F40480FBBB780000000480FBB8FC00000005BC3
...
```

Other instructions are similar.

This is a bit noisy, unfortunately, but it looks like 2*P23 (or maybe P23 + P237, with P7 being unused for some reason?) + 7*P0156 + P4.
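
To make that estimate easier to check, here is a minimal sketch that sums the per-port values from the Haswell measurements above by port group. The grouping (P0156, P23, P4, P7) and the helper name `group_totals` are assumptions chosen for illustration. They are not part of the benchmark output or of the scheduling model.

```python
# Per-port uop pressure, copied from the Haswell measurements above.
port_pressure = {
    "HWPort0": 1.3771,
    "HWPort1": 1.8848,
    "HWPort2": 1.3687,
    "HWPort3": 0.728,
    "HWPort4": 1.0025,
    "HWPort5": 1.6272,
    "HWPort6": 2.1307,
    "HWPort7": 0.0002,
}

# Assumed port groups for the estimate (illustrative grouping, not tool output).
groups = {
    "P0156": ["HWPort0", "HWPort1", "HWPort5", "HWPort6"],
    "P23": ["HWPort2", "HWPort3"],
    "P4": ["HWPort4"],
    "P7": ["HWPort7"],
}


def group_totals(pressure, port_groups):
    """Sum the measured pressure over the ports in each group."""
    return {
        name: sum(pressure[port] for port in ports)
        for name, ports in port_groups.items()
    }


totals = group_totals(port_pressure, groups)
for name, total in totals.items():
    # Rounding to the nearest integer gives the uop count per group.
    print(f"{name}: {total:.4f} (rounds to {round(total)})")
```

With these values the sums come to roughly 7.02 for P0156, 2.10 for P23, 1.00 for P4 and 0.00 for P7, which is where the 7*P0156 + 2*P23 + P4 reading comes from. The sums alone cannot distinguish 2*P23 from P23 + P237, since P7 is essentially unused either way.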




https://reviews.llvm.org/D49243




