<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/143456>143456</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AVX2] SAD pattern detection is too strict
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          adworacz
      </td>
    </tr>
</table>

<pre>
Reference code: [Zig Godbolt](https://github.com/llvm/llvm-project/commit/6f879d9eb1a111a0c99f2a69e4ad30b220f4926a)

LLVM is missing some opportunities to lower sum of absolute differences (SAD) calculations to the dedicated SAD instruction. It looks like [prior support for this was overly restrictive](https://github.com/llvm/llvm-project/commit/6f879d9eb1a111a0c99f2a69e4ad30b220f4926a) in the patterns it detects.

Essentially, the absolute difference is being computed explicitly with separate instructions and then summed, when the whole calculation should just be handled by the dedicated SAD instruction (`vpsadbw`).

Here's the code inline:

```zig
const block_width = 8;
const T = u8;
const VT = @Vector(block_width, T);

export fn sad(noalias srcp: [*]const u8, noalias refp: [*]const u8, height: usize, stride: usize) u32 {
    const src = srcp[0..height * stride];
    const ref = refp[0..height * stride];
    
    var sum: u32 = 0;

    const s: VT = src[0*stride..][0..block_width].*;
    const r: VT = ref[0*stride..][0..block_width].*;

    // Should produce a single vpsadbw, but doesn't.
    //const absdiff = @max(s, r) - @min(s, r);
    //sum += @reduce(.Add, absdiff);

    // Should produce a single vpsadbw, but doesn't.
    //const VTI = @Vector(block_width, i16);
    //sum += @reduce(.Add, @abs(@as(VTI, s) - @as(VTI, r)));

    // Does produce a single vpsadbw.
    const VTI = @Vector(block_width, i32);
    sum += @reduce(.Add, @abs(@as(VTI, s) - @as(VTI, r)));

    return sum;
}
```

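The variants that don't work compute the absolute difference separately before summing. The output below appears to correspond to the `@max`/`@min` form: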

```asm
sad:
        push    rbp
        mov     rbp, rsp
        vmovq   xmm0, qword ptr [rdi]
        vmovq   xmm1, qword ptr [rsi]
        vpminub xmm2, xmm0, xmm1
        vpmaxub xmm0, xmm0, xmm1
        vpxor   xmm1, xmm1, xmm1
        vpsubb  xmm0, xmm0, xmm2
        vpsadbw xmm0, xmm0, xmm1
        vpextrb eax, xmm0, 0
        pop     rbp
        ret
```

But it should be:

```asm
sad:
        push    rbp
        mov     rbp, rsp
        vmovq   xmm0, qword ptr [rdi]
        vmovq   xmm1, qword ptr [rsi]
        vpsadbw xmm0, xmm0, xmm1
        vmovd   eax, xmm0
        pop     rbp
        ret
```
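
For reference, here is a minimal sketch (not part of the original reproduction) that checks the three formulations against a scalar loop, to show they describe the same per-element absolute difference. The function names (`sadScalar`, `sadMaxMin`, `sadAbs16`, `sadAbs32`) and the test inputs are made up for illustration. One assumption to note: the `@max`/`@min` form here widens to `u32` before reducing so the sum cannot overflow `u8`, which the commented-out version above does not do.

```zig
const std = @import("std");

const block_width = 8;
const VT = @Vector(block_width, u8);

// Scalar reference: accumulate |a - b| per byte into a u32.
fn sadScalar(s: [block_width]u8, r: [block_width]u8) u32 {
    var sum: u32 = 0;
    for (s, r) |a, b| {
        sum += if (a > b) a - b else b - a;
    }
    return sum;
}

// Form 1: max - min in u8, widened before the reduction (assumption, see above).
fn sadMaxMin(s: VT, r: VT) u32 {
    const absdiff = @max(s, r) - @min(s, r);
    return @reduce(.Add, @as(@Vector(block_width, u32), absdiff));
}

// Form 2: widen to i16, subtract, take @abs, reduce.
fn sadAbs16(s: VT, r: VT) u32 {
    const VTI = @Vector(block_width, i16);
    return @reduce(.Add, @abs(@as(VTI, s) - @as(VTI, r)));
}

// Form 3: same as form 2 with i32 lanes (the form reported as working).
fn sadAbs32(s: VT, r: VT) u32 {
    const VTI = @Vector(block_width, i32);
    return @reduce(.Add, @abs(@as(VTI, s) - @as(VTI, r)));
}

test "all formulations agree with the scalar reference" {
    const s = [block_width]u8{ 0, 255, 10, 200, 7, 7, 128, 1 };
    const r = [block_width]u8{ 255, 0, 20, 100, 7, 8, 127, 3 };
    const expected = sadScalar(s, r);
    try std.testing.expectEqual(expected, sadMaxMin(s, r));
    try std.testing.expectEqual(expected, sadAbs16(s, r));
    try std.testing.expectEqual(expected, sadAbs32(s, r));
}
```

Run with `zig test` on a file containing the block. Since all three forms agree on every input, each one is a candidate for the same single-`vpsadbw` lowering.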
</pre>