<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/87398>87398</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [WebAssembly] Inlining of i32.load8_u can produce unnecessary 'i32.const 255; i32.and'
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          kg
      </td>
    </tr>
</table>

<pre>
    Convenience godbolt link: https://gcc.godbolt.org/z/dohjceEnz

I've been writing a scalar fallback for a SIMD algorithm, and I noticed that the trunk version of wasm clang on godbolt generates unnecessarily complex code for the fallback, but only when the fallback function is inlined. Specifically, if the function isn't inlined, it generates raw `i32.load8_u` operations, but once it's inlined, every load turns into `i32.load8_u; i32.const 255; i32.and`. Ideally, whatever translates the wasm into native code for the target will optimize these out, but it seems like they shouldn't be there in the first place.
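
To spell out why the mask looks redundant: `i32.load8_u` zero-extends the loaded byte, so the upper 24 bits of the result are already zero and an `i32.and` with 255 cannot change it. Here is a minimal sketch of that reasoning in C. The helper names (`load8_u`, `load8_u_masked`, `check_mask_is_redundant`) are hypothetical and only model the wasm instructions; they are not part of the repro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of wasm i32.load8_u: load one byte, zero-extend to 32 bits. */
uint32_t load8_u(const uint8_t *p)
{
    return (uint32_t)*p;
}

/* The same load followed by `i32.const 255; i32.and`, as seen in the inlined output. */
uint32_t load8_u_masked(const uint8_t *p)
{
    return load8_u(p) & 255u;
}

/* Exhaustively check that the mask never changes the loaded value. */
void check_mask_is_redundant(void)
{
    for (uint32_t v = 0; v < 256; v++) {
        uint8_t byte = (uint8_t)v;
        assert(load8_u(&byte) == load8_u_masked(&byte));
    }
}
```

Calling `check_mask_is_redundant()` covers all 256 byte values, so under this model the two sequences are interchangeable.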

Repro code (C, only compile switch I used is `-O3`):
```c
#include <stdint.h>

#define UNROLLED 1
#define FORCE_INLINING 1

#define DN_FORCEINLINE(RET_TYPE) inline RET_TYPE __attribute__((always_inline))

typedef uint8_t dn_u8x16 __attribute__ ((vector_size (16), aligned(16)));
typedef union {
    dn_u8x16 vec;
    uint8_t values[16];
} dn_simdhash_suffixes;

static DN_FORCEINLINE(int)
ctz (uint32_t value)
{
    // __builtin_ctz is undefined for 0
    if (value == 0)
        return 32;
    return __builtin_ctz(value);
}

#if FORCE_INLINING
static DN_FORCEINLINE(int)
#else
int
#endif
find_first_matching_suffix_scalar (uint8_t needle[16], uint8_t haystack[16], uint32_t count)
{
#define FIND_IN_LANE(lane) \
    if (needle[lane] == haystack[lane]) \
        return lane;

#define FIND_IN_BLOCK(base) \
    FIND_IN_LANE((base + 0)); \
    FIND_IN_LANE((base + 1)); \
    FIND_IN_LANE((base + 2)); \
    FIND_IN_LANE((base + 3));

#if UNROLLED
    FIND_IN_BLOCK(0);
    FIND_IN_BLOCK(4);
    FIND_IN_BLOCK(8);
    FIND_IN_LANE(12);
    FIND_IN_LANE(13);
#else
    for (uint32_t i = 0; i < count; i++)
        if (needle[i] == haystack[i])
            return i;
#endif

    return 32;
}

static DN_FORCEINLINE(int)
find_first_matching_suffix (dn_simdhash_suffixes needle, dn_simdhash_suffixes haystack, uint32_t count)
{
    return find_first_matching_suffix_scalar(needle.values, haystack.values, count);
}

typedef struct bucket_t {
    dn_simdhash_suffixes suffixes;
    uint32_t keys[12];
}
__attribute__((__aligned__(16))) bucket_t;

int
scan_bucket (bucket_t *bucket, uint32_t needle, dn_simdhash_suffixes search_vector)
{
    uint32_t count = bucket->suffixes.values[14],
        index = find_first_matching_suffix (search_vector, bucket->suffixes, count);
    uint32_t *key = &bucket->keys[index];

    for (; index < count; index++, key++) {
        if (needle == *key)
            return index;
    }

    return -1;
}
```

If `UNROLLED` and `FORCE_INLINING` are both `1`, you get generated code like this:
```wasm
        i32.load8_u     2
        i32.const       255
        i32.and
        local.get       0
        i32.load8_u     2
        i32.const       255
        i32.and
        i32.eq
        br_if           0 # 0: down to label0
```

But if you set `FORCE_INLINING` to `0` and check `find_first_matching_suffix_scalar`:
```wasm
        i32.load8_u     2
        local.get       1
        i32.load8_u     2
        i32.eq
        br_if           0 # 0: down to label0
```

The loop version (set `UNROLLED` to `0`) seems immune to this even when inlined, which is interesting.

Sorry for the bloat in the repro; it was already tricky to reduce it this far. Also, sorry if the very latest clang already has a fix for this. I didn't see any open or closed issues here on GitHub that looked similar.
</pre>