<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/153983>153983</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86][BreakFalseDeps] Some x86 cpus only have popcnt false dep for popcnt r16, r/m16
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          sharkautarch
      </td>
    </tr>
</table>

<pre>
[uops.info](https://uops.info/table.html) seems to show that on some CPUs, `popcnt` only has a false dependency on the destination register for the 16-bit register/memory form. (Note that uops.info lists popcnt under the sse category, not under base, other, or bmi.) Clicking the latency entry for `popcnt r/m32` and `popcnt r/m64` shows 0 latency for operand 1 -> operand 1, where operand 1 is the destination register.

uops.info shows this behavior for these cpus:
- Cannon Lake
- Ice Lake
- Tiger Lake
- Rocket Lake
- Alder Lake-P
- Alder Lake-E
- Goldmont Plus
- Tremont
- Zen+ and Zen 2 through Zen 4 (though I don't think any of the Zen CPU tunings in X86.td currently have `TuningPOPCNTFalseDeps`, so I'm not sure how relevant that is)
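The llvm-exegesis measurements below differ only in operand width (16, 32 or 64 bits) and in whether the destination register is zeroed first. To make reproducing them easier, here is a small POSIX sh sketch that generates the same snippets. `popcnt_snippet` and `run_all` are hypothetical helper names. The llvm-exegesis flags are exactly the ones used in the commands below.

```sh
# Hypothetical helper: print an llvm-exegesis snippet that sets RAX and RCX
# to 2 and times `popcnt src, dst` at the given operand width (16, 32 or 64),
# optionally preceded by `xor %ecx, %ecx` when the second argument is "xor".
popcnt_snippet() {
  local width=$1 zero_dst=${2:-no} src dst
  case $width in
    16) src=%ax;  dst=%cx ;;
    32) src=%eax; dst=%ecx ;;
    64) src=%rax; dst=%rcx ;;
    *)  echo "unsupported width: $width" >&2; return 1 ;;
  esac
  printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2'
  # xor of a register with itself is a zeroing idiom that does not read the
  # old ECX value, and the 32-bit write clears all of RCX.
  if [ "$zero_dst" = xor ]; then
    printf '%s\n' 'xor %ecx, %ecx'
  fi
  printf 'popcnt %s, %s\n' "$src" "$dst"
}

# Hypothetical driver: run every width with and without the zeroing xor,
# using the same llvm-exegesis flags as the commands in this issue.
run_all() {
  local w z
  for w in 64 32 16; do
    for z in no xor; do
      echo "== ${w}-bit, zero_dst=${z}"
      popcnt_snippet "$w" "$z" |
        llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-
    done
  done
}
```

If you source this and call `run_all` with `llvm-exegesis` on `PATH`, it should run the six P-core cases one after another.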

I double-checked this on my Alder Lake CPU with llvm-exegesis:

### 64-bit: 1 cycle
`printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'popcnt %rax, %rcx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
---
mode:            latency
key:
  instructions:
    - 'POPCNT64rr RCX RAX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 1.0212, per_snippet_value: 1.0212, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B90200000000000000F3480FB8C8F3480FB8C8F3480FB8C8F3480FB8C8C3
...
```
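The `assembled_snippet` field shows the machine code llvm-exegesis assembled for this measurement. It reads as follows:

- `48B8` and `48B9`, each followed by an 8-byte immediate, are the `movabs $2, %rax` and `movabs $2, %rcx` register setup.
- `F3480FB8C8` is `popcnt %rax, %rcx`. The ModRM byte `C8` selects RCX as the destination and RAX as the source.
- The final `C3` is `ret`.

The sh sketch below strips the setup and the `ret`, then counts how many copies of an instruction encoding make up the rest of the printed snippet. `count_repeats` is a hypothetical name.

```sh
# Hypothetical helper: take an llvm-exegesis assembled_snippet (hex string),
# drop the register setup (48B8 imm64 = movabs $imm, %rax and
# 48B9 imm64 = movabs $imm, %rcx; 10 bytes = 20 hex digits each) and the
# trailing C3 (ret), then print how many copies of the instruction encoding
# given as the second argument make up the rest. Fails if other bytes remain.
count_repeats() {
  local hex=$1 insn=$2 body rest
  body=$(printf '%s\n' "$hex" | cut -c41-)   # skip 40 hex digits of setup
  body=${body%C3}                            # drop the final ret
  rest=$(printf '%s\n' "$body" | sed "s/$insn//g")
  if [ -n "$rest" ]; then
    echo "unexpected bytes left over: $rest" >&2
    return 1
  fi
  echo $(( ${#body} / ${#insn} ))
}
```

For the 64-bit snippet above, `count_repeats <assembled_snippet> F3480FB8C8` prints `4`.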
### 64-bit: still 1 cycle when zeroing the destination register RCX first
`printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'xor %ecx, %ecx' 'popcnt %rax, %rcx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
---
mode:            latency
key:
  instructions:
    - 'XOR32rr ECX ECX ECX'
    - 'POPCNT64rr RCX RAX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 0.5202, per_snippet_value: 1.0404, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B9020000000000000031C9F3480FB8C831C9F3480FB8C831C9F3480FB8C831C9F3480FB8C8C3
...
```
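In the runs with the zeroing `xor`, each snippet contains two instructions. The reported numbers fit `value` being given per instruction and `per_snippet_value` per snippet: 0.5202 × 2 = 1.0404. The comparison with the plain `popcnt` runs should therefore use `per_snippet_value`, which stays at about 1 cycle. Here is a minimal awk check of that arithmetic (`per_snippet` is a hypothetical helper name):

```sh
# Hypothetical helper: multiply llvm-exegesis' per-instruction `value` by the
# number of instructions in the snippet, printed with 4 decimal places.
per_snippet() {
  awk -v v="$1" -v n="$2" 'BEGIN { printf "%.4f\n", v * n }'
}
```

For example, `per_snippet 0.5202 2` prints `1.0404`, matching the `per_snippet_value` above.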

---

### 32-bit: 1 cycle
`printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'popcnt %eax, %ecx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
---
mode:            latency
key:
  instructions:
    - 'POPCNT32rr ECX EAX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 1.0208, per_snippet_value: 1.0208, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B90200000000000000F30FB8C8F30FB8C8F30FB8C8F30FB8C8C3
...
```
### 32-bit: still 1 cycle when zeroing the destination register RCX first
`printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'xor %ecx, %ecx' 'popcnt %eax, %ecx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
---
mode:            latency
key:
  instructions:
    - 'XOR32rr ECX ECX ECX'
    - 'POPCNT32rr ECX EAX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 0.5227, per_snippet_value: 1.0454, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B9020000000000000031C9F30FB8C831C9F30FB8C831C9F30FB8C831C9F30FB8C8C3
...
```

---

### 16-bit: **3 cycles due to the false dependency**
`printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'popcnt %ax, %cx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
---
mode:            latency
key:
  instructions:
    - 'POPCNT16rr CX AX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 3.017, per_snippet_value: 3.017, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B9020000000000000066F30FB8C866F30FB8C866F30FB8C866F30FB8C8C3
...
```
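The only encoding difference between this 16-bit run and the 32-bit one is the `66` operand-size prefix (`66F30FB8C8` versus `F30FB8C8`). The 64-bit form instead carries a REX.W prefix (`F3480FB8C8`).

Here is a small sh sketch that classifies the popcnt encodings appearing in these snippets. `popcnt_width` is a hypothetical name, and it only recognizes the exact prefix sequences seen here. For example, REX prefixes other than `48`, as needed for R8-R15, are not handled.

```sh
# Hypothetical helper: report the operand size of a POPCNT encoding
# (F3 [REX] 0F B8 /r) as printed in these assembled snippets.
# 66 = operand-size prefix (16-bit), 48 = REX.W (64-bit), neither = 32-bit.
popcnt_width() {
  case $1 in
    66F30FB8*) echo 16 ;;
    F3480FB8*) echo 64 ;;
    F30FB8*)   echo 32 ;;
    *)         echo unknown; return 1 ;;
  esac
}
```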
### 16-bit: back to 1 cycle when zeroing the destination register RCX first
`printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'xor %ecx, %ecx' 'popcnt %ax, %cx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
---
mode:            latency
key:
  instructions:
    - 'XOR32rr ECX ECX ECX'
    - 'POPCNT16rr CX AX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 0.5194, per_snippet_value: 1.0388, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B9020000000000000031C966F30FB8C831C966F30FB8C831C966F30FB8C831C966F30FB8C8C3
...
```
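Across the P-core runs, the 32- and 64-bit forms give a `per_snippet_value` of about 1 cycle with or without the zeroing `xor`. The 16-bit form gives about 3 cycles unless `%ecx` is zeroed first.

Below is a sh sketch for extracting that number from llvm-exegesis' YAML output so runs can be compared in a script. `per_snippet_latency` and `looks_like_false_dep` are hypothetical names. The 1.5-cycle threshold is an arbitrary cutoff between the ~1 and ~3 cycle results above, not a value from LLVM or uops.info.

```sh
# Hypothetical helper: read llvm-exegesis YAML on stdin and print the
# per_snippet_value of the first latency measurement.
per_snippet_latency() {
  sed -n 's/.*key: latency,.*per_snippet_value: \([0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical check: succeed (exit 0) when the per-snippet latency exceeds
# an assumed 1.5-cycle threshold, i.e. looks like a serialized dependency.
looks_like_false_dep() {
  awk -v l="$1" 'BEGIN { exit !(l > 1.5) }'
}
```

For example, piping the 16-bit run's output through `per_snippet_latency` yields `3.017`, which `looks_like_false_dep` flags. The zeroed variant's `1.0388` does not get flagged.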

---

## I also checked that the Alder Lake E-cores behave the same way as the P-cores, using systemd-run to rerun only the 32-bit and 16-bit popcnt tests on my E-cores (CPUs 12-19):
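There is one caveat on the commands in this section. The `|` is interpreted by the invoking shell, so only `printf` runs inside the transient systemd unit, and `llvm-exegesis` itself is not covered by `AllowedCPUs=12-19`. These runs are therefore not guaranteed to have executed on the E-cores.

Below is a sketch of one way to keep the whole pipeline on the E-cores, by handing it to `sh -c`. `allowed_cpus` and `run_16bit_on_ecores` are hypothetical helper names. CPUs 12-19 are the E-cores of the machine used here.

```sh
# Print the CPU list the calling process may run on (Linux-only: reads the
# Cpus_allowed_list field of /proc/self/status).
allowed_cpus() {
  awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status
}

# Hypothetical wrapper: run both printf AND llvm-exegesis inside the
# systemd-run unit, printing the unit's allowed CPUs to stderr first.
run_16bit_on_ecores() {
  sudo systemd-run -p AllowedCPUs=12-19 --pty -- sh -c "
    grep Cpus_allowed_list /proc/self/status >&2
    printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'popcnt %ax, %cx' |
      llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-"
}
```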
### 32-bit: 1 cycle
`sudo systemd-run -p AllowedCPUs=12-19 --send-sighup --pty -- printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'popcnt %eax, %ecx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
Running as unit: run-p694886-i694887.service
Press ^] three times within 1s to disconnect TTY.
---
mode:            latency
key:
  instructions:
    - 'POPCNT32rr ECX EAX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 1.0214, per_snippet_value: 1.0214, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B90200000000000000F30FB8C8F30FB8C8F30FB8C8F30FB8C8C3
...
```
### 32-bit: still 1 cycle when zeroing the destination register RCX first
`sudo systemd-run -p AllowedCPUs=12-19 --send-sighup --pty -- printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'xor %ecx, %ecx' 'popcnt %eax, %ecx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
Running as unit: run-p695328-i695329.service; invocation ID: 94e816d26d6c4a33a8a5efa0899be086
Press ^] three times within 1s to disconnect TTY.
---
mode:            latency
key:
  instructions:
    - 'XOR32rr ECX ECX ECX'
    - 'POPCNT32rr ECX EAX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 0.5205, per_snippet_value: 1.041, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B9020000000000000031C9F30FB8C831C9F30FB8C831C9F30FB8C831C9F30FB8C8C3
...
```

---

### 16-bit: **3 cycles due to the false dependency**
`sudo systemd-run -p AllowedCPUs=12-19 --send-sighup --pty -- printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'popcnt %ax, %cx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
Running as unit: run-p695858-i695859.service
Press ^] three times within 1s to disconnect TTY.
---
mode:            latency
key:
  instructions:
    - 'POPCNT16rr CX AX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 3.0179, per_snippet_value: 3.0179, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B9020000000000000066F30FB8C866F30FB8C866F30FB8C866F30FB8C8C3
...
```
### 16-bit: back to 1 cycle when zeroing the destination register RCX first
`sudo systemd-run -p AllowedCPUs=12-19 --send-sighup --pty -- printf '%s\n' '#LLVM-EXEGESIS-DEFREG RAX 2' '#LLVM-EXEGESIS-DEFREG RCX 2' 'xor %ecx, %ecx' 'popcnt %ax, %cx' | llvm-exegesis --mode=latency --repetition-mode=duplicate --snippets-file=-`
```
Running as unit: run-p696642-i696643.service; invocation ID: 9bf18c1f47a645fcbc340206cb3657f3
Press ^] three times within 1s to disconnect TTY.
---
mode:            latency
key:
  instructions:
    - 'XOR32rr ECX ECX ECX'
    - 'POPCNT16rr CX AX'
  config:          ''
  register_initial_values:
    - 'RAX=0x2'
    - 'RCX=0x2'
cpu_name:        alderlake
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 0.521, per_snippet_value: 1.042, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 48B8020000000000000048B9020000000000000031C966F30FB8C831C966F30FB8C831C966F30FB8C831C966F30FB8C8C3
...
```
</pre>