<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/57381>57381</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            Missed optimization for `-(foo < 0)` in Clang when `foo` was previously sign-extended and in LLVM for all cases.

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          dead-claudia

      </td>

    </tr>

</table>

<pre>

C source: https://godbolt.org/z/aMTaGdTof (x86-64), https://godbolt.org/z/PM4hbPd3d (32-bit ARM), https://godbolt.org/z/dbaaYaG8n (AArch64)

LLVM output from above (with metadata stripped, generated from the C source): https://godbolt.org/z/6zG5dW4o1 (x86-64)

It's always safe to convert `negate(if (foo < 0) then 1 else 0)` to `signed_shift_right(foo, (bit_width of foo) - 1)` for any two's complement signed integer `foo`, regardless of width. Clang already performs this rewrite when the operand has not been sign-extended first: https://godbolt.org/z/ad3sxchKd (x86-64), https://godbolt.org/z/qTja8364h (32-bit ARM), https://godbolt.org/z/f74c8d1Y9 (AArch64). LLVM doesn't perform it at all, per https://godbolt.org/z/97zh4TEav (hand-written, targeting x86-64), and Clang consistently misses it when the operand was previously sign-extended.

(It's odd this is implemented in Clang and not LLVM, as it's a trivial peephole optimization that could feasibly be done even in an SSA model.)
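
As a quick sanity check of the identity above, the following minimal sketch compares the two forms for every 16-bit input after sign extension to `int32_t`. The function names are made up for illustration, and it assumes that `>>` on a negative signed value is an arithmetic shift, which C leaves implementation-defined but Clang and GCC both provide.

```c
#include <assert.h>
#include <stdint.h>

/* Mask as written in the report: 0 when x >= 0, -1 (all ones) when x < 0. */
int32_t mask_by_compare(int32_t x) {
    return -(int32_t)(x < 0);
}

/* Mask via shift by (bit width - 1). Assumes an arithmetic right shift. */
int32_t mask_by_shift(int32_t x) {
    return x >> 31;
}

/* Counts the int16_t values for which the two forms disagree once the
   value has been sign-extended to 32 bits. */
long count_mismatches_int16(void) {
    long mismatches = 0;
    for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
        int32_t widened = (int16_t)v; /* sign extension, as for a short operand */
        if (mask_by_compare(widened) != mask_by_shift(widened)) {
            ++mismatches;
        }
    }
    return mismatches;
}

/* Spot checks at the boundaries of the 32-bit range. */
void check_edge_cases(void) {
    assert(mask_by_compare(INT32_MIN) == mask_by_shift(INT32_MIN));
    assert(mask_by_compare(INT32_MAX) == mask_by_shift(INT32_MAX));
    assert(mask_by_compare(0) == mask_by_shift(0));
}
```

Calling `count_mismatches_int16()` should report no disagreements on a target with arithmetic right shift.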

Here's the C source from the above link for quick reference:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SOME_CONSTANT 1000

struct foo_t {
    short baz;
};

int sub_load_mask_1(struct foo_t *foo) {
    int baz = foo->baz;
    return (SOME_CONSTANT - baz) & -(baz < 0);
}

int sub_load_mask_2(struct foo_t *foo) {
    int baz = foo->baz;
    return (SOME_CONSTANT - baz) & (baz >> 31);
}

int sub_imm_mask_1(short baz) {
    return (SOME_CONSTANT - baz) & -(baz < 0);
}

int sub_imm_mask_2(short baz) {
    return (SOME_CONSTANT - baz) & (baz >> 31);
}

int sub_reg_mask_1(int k, short baz) {
    return (k - baz) & -(baz < 0);
}

int sub_reg_mask_2(int k, short baz) {
    return (k - baz) & ((int)baz >> 31);
}

int sub_simple_mask_1(int k, short baz) {
    return k & -(baz < 0);
}

int sub_simple_mask_2(int k, short baz) {
    return k & ((int)baz >> 31);
}

int simple_mask_1(int k, short baz) {
    return -(baz < 0);
}

int simple_mask_2(int k, short baz) {
    return ((int)baz >> 31);
}
```

All the transforms seem to be valid according to Alive2:

- `sub_load_mask_*`: https://alive2.llvm.org/ce/z/isU8PX

- `sub_imm_mask_*`: https://alive2.llvm.org/ce/z/FjWinx

- `sub_reg_mask_*`: https://alive2.llvm.org/ce/z/uSgSMV

- `sub_simple_mask_*`: https://alive2.llvm.org/ce/z/XFxVeb

- `simple_mask_*`: https://alive2.llvm.org/ce/z/jzv4Ud
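
What each `*_mask_1`/`*_mask_2` pair computes is a branch-free select: the subtraction result when `baz` is negative, and zero otherwise. The sketch below spells that out and brute-forces the `sub_imm_mask_*` pair over every `short` value. It only illustrates the semantics and is not a substitute for the Alive2 proofs. The names `select_reference`, `select_compare_mask`, `select_shift_mask` and `count_select_mismatches` are invented here, and the code assumes a 32-bit `int` and an arithmetic `>>` on signed values.

```c
#include <limits.h>

#define SOME_CONSTANT 1000

/* Reference semantics written with an explicit branch. */
int select_reference(short baz) {
    return baz < 0 ? SOME_CONSTANT - baz : 0;
}

/* Same expression as sub_imm_mask_1 in the report. */
int select_compare_mask(short baz) {
    return (SOME_CONSTANT - baz) & -(baz < 0);
}

/* Same expression as sub_imm_mask_2 in the report. */
int select_shift_mask(short baz) {
    return (SOME_CONSTANT - baz) & (baz >> 31);
}

/* Counts disagreements with the reference over all short values. */
int count_select_mismatches(void) {
    int mismatches = 0;
    for (int v = SHRT_MIN; v <= SHRT_MAX; ++v) {
        short baz = (short)v;
        int expected = select_reference(baz);
        if (select_compare_mask(baz) != expected) ++mismatches;
        if (select_shift_mask(baz) != expected) ++mismatches;
    }
    return mismatches;
}
```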

ARM, especially its 32-bit variant, would benefit from this noticeably more than x86-64, as the links above detail, because its barrel shifter can fold the shift into the instruction that consumes the mask.

- ARM 32-bit:

  - `sub_load_mask_*`: 7 instructions to 3 instructions (`ldrsh` followed by the `sub_imm_mask_*` equivalent)

  - `sub_imm_mask_*`: 7 instructions to just `rsb rT, rA, #imm` + `and rD, rT, rA, asr #31`

  - `sub_reg_mask_*`: 7 instructions to just `sub rT, rA, rB` + `and rD, rT, rB, asr #31`

  - `sub_simple_mask_*`: 6 instructions to just `and rD, rA, rB, asr #15`

  - `simple_mask_*`: 5 instructions to just `asr rD, rA, #15`

- AArch64:

  - `sub_load_mask_*`: 7 instructions to 4 instructions

  - `sub_imm_mask_*`: 5 instructions to 4 instructions

  - `sub_reg_mask_*`: 4 instructions to 3 instructions

  - `sub_simple_mask_*`: 3 instructions to just `and rT, rA, #0x8000` + `neg rT, rA asr #15`

  - `simple_mask_*`: 2 instructions to just `sbfx rD, rA, #15, #1`
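
Several of the sequences above shift by 15 rather than 31, or extract a single bit with `sbfx`. That works because a 16-bit value that has already been sign-extended to 32 bits has bits 15 through 31 all equal. A small sketch checking this follows; the helper names are hypothetical, `int16_t` stands in for `short`, and `>>` on signed values is assumed to be arithmetic.

```c
#include <stdint.h>

/* Models a shift by 31 (asr #31) on the sign-extended value. */
int32_t mask_asr31(int16_t baz) {
    return (int32_t)baz >> 31;
}

/* Models a shift by 15 (asr #15) on the sign-extended value. */
int32_t mask_asr15(int16_t baz) {
    return (int32_t)baz >> 15;
}

/* Models sbfx rD, rA, #15, #1: take bit 15 and sign-extend that one bit. */
int32_t mask_sbfx_bit15(int16_t baz) {
    uint32_t bit = ((uint32_t)(int32_t)baz >> 15) & 1u;
    return -(int32_t)bit;
}

/* Counts int16_t inputs for which the three forms are not all equal. */
int count_shift_mismatches(void) {
    int mismatches = 0;
    for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
        int16_t baz = (int16_t)v;
        int32_t expected = mask_asr31(baz);
        if (mask_asr15(baz) != expected || mask_sbfx_bit15(baz) != expected) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Note that the equivalence relies on the operand already being sign-extended; for an arbitrary 32-bit value, shifting by 15 and by 31 give different results.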

</pre>
