<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/143456>143456</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AVX2] SAD pattern detection is too strict
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
adworacz
</td>
</tr>
</table>
<pre>
Reference code: [Zig Godbolt](https://github.com/llvm/llvm-project/commit/6f879d9eb1a111a0c99f2a69e4ad30b220f4926a)
Some opportunities for producing optimized sum of absolute differences (SAD) calculations are being missed. It looks like [prior support for this was overly restrictive](https://github.com/llvm/llvm-project/commit/6f879d9eb1a111a0c99f2a69e4ad30b220f4926a).
Essentially, the absolute difference is computed explicitly (`vpminub`, `vpmaxub`, `vpsubb`) and only then summed by `vpsadbw` against a zeroed register, when a single `vpsadbw` on the two inputs should handle both steps.
Here's the code inline:
```zig
const block_width = 8;
const T = u8;
const VT = @Vector(block_width, T);
export fn sad(noalias srcp: [*]const u8, noalias refp: [*]const u8, height: usize, stride: usize) u32 {
    const src = srcp[0..height * stride];
    const ref = refp[0..height * stride];
    var sum: u32 = 0;

    const s: VT = src[0*stride..][0..block_width].*;
    const r: VT = ref[0*stride..][0..block_width].*;

    // Should work, but doesn't.
    //const absdiff = @max(s,r) - @min(s,r);
    //sum += @reduce(.Add, absdiff);

    // Should work, but doesn't.
    //const VTI = @Vector(block_width, i16);
    //sum += @reduce(.Add, @abs(@as(VTI, s) - @as(VTI, r)));

    // Does work.
    const VTI = @Vector(block_width, i32);
    sum += @reduce(.Add, @abs(@as(VTI, s) - @as(VTI, r)));

    return sum;
}
```
The `@max`/`@min` variant produces:
```asm
sad:
        push    rbp
        mov     rbp, rsp
        vmovq   xmm0, qword ptr [rdi]
        vmovq   xmm1, qword ptr [rsi]
        vpminub xmm2, xmm0, xmm1
        vpmaxub xmm0, xmm0, xmm1
        vpxor   xmm1, xmm1, xmm1
        vpsubb  xmm0, xmm0, xmm2
        vpsadbw xmm0, xmm0, xmm1
        vpextrb eax, xmm0, 0
        pop     rbp
        ret
```
But it should be:
```asm
sad:
        push    rbp
        mov     rbp, rsp
        vmovq   xmm0, qword ptr [rdi]
        vmovq   xmm1, qword ptr [rsi]
        vpsadbw xmm0, xmm0, xmm1
        vmovd   eax, xmm0
        pop     rbp
        ret
```
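
For reference, here is a rough scalar sketch in C of the three formulations from the Zig snippet, to illustrate why they should all be candidates for the same `vpsadbw` lowering. This is not the code from the Godbolt link. The function names (`sad_minmax`, `sad_abs16`, `sad_abs32`, `sad_checked`) are made up for illustration, and the sketch assumes the per-byte differences are accumulated in 32 bits, which may differ from how the commented-out `u8` `@reduce` variant handles overflow.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_WIDTH = 8 };

/* Variant 1: max(s, r) - min(s, r), done entirely in 8-bit lanes. */
static uint32_t sad_minmax(const uint8_t *s, const uint8_t *r) {
    uint32_t sum = 0; /* assumption: widened accumulator */
    for (int i = 0; i < BLOCK_WIDTH; i++) {
        uint8_t hi = s[i] > r[i] ? s[i] : r[i];
        uint8_t lo = s[i] > r[i] ? r[i] : s[i];
        sum += (uint8_t)(hi - lo);
    }
    return sum;
}

/* Variant 2: widen to 16-bit signed lanes, subtract, take abs. */
static uint32_t sad_abs16(const uint8_t *s, const uint8_t *r) {
    uint32_t sum = 0;
    for (int i = 0; i < BLOCK_WIDTH; i++) {
        int16_t d = (int16_t)(s[i] - r[i]); /* range -255..255 fits */
        sum += (uint32_t)(d < 0 ? -d : d);
    }
    return sum;
}

/* Variant 3: widen to 32-bit signed lanes, subtract, take abs. */
static uint32_t sad_abs32(const uint8_t *s, const uint8_t *r) {
    uint32_t sum = 0;
    for (int i = 0; i < BLOCK_WIDTH; i++) {
        int32_t d = (int32_t)s[i] - (int32_t)r[i];
        sum += (uint32_t)(d < 0 ? -d : d);
    }
    return sum;
}

/* Hypothetical helper: checks that all three variants agree on one block. */
static uint32_t sad_checked(const uint8_t *s, const uint8_t *r) {
    uint32_t a = sad_minmax(s, r);
    assert(a == sad_abs16(s, r));
    assert(a == sad_abs32(s, r));
    return a;
}
```

Since every per-byte difference fits in 8 bits, the lane width used for the intermediate subtraction does not change the result, so in principle the pattern matcher should not need the `i32` form specifically.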
</pre>