[clang] [Clang] Allow VDBPSADBW intrinsics in constexpr (PR #188887)

Pierluigi Lenoci via cfe-commits cfe-commits at lists.llvm.org
Sun Mar 29 10:17:39 PDT 2026


pierluigilenoci wrote:

@RKSimon Thank you for testing on actual hardware. You're right: the test values are wrong, because my implementation of the VDBPSADBW algorithm is incorrect.

After reviewing the GCC reference implementation (`gcc/testsuite/gcc.target/i386/avx512bw-vdbpsadbw-2.c`), I can see the algorithm has two distinct phases:

1. **Shuffle phase**: Uses all four 2-bit fields of imm8 to shuffle src2 into a temp buffer (my code only used bits[1:0] and bits[3:2])
2. **SAD phase**: Uses a sliding/overlapping comparison pattern, not simple aligned block-vs-block SAD
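
For illustration, the two phases can be sketched for a single 128-bit lane as below. This is a rough model rather than the code in this PR: the names `Bytes16`, `Words8`, `absDiff` and `dbpsadbwLane` are hypothetical, and the byte offsets in the SAD phase follow my reading of Intel's published pseudocode for the instruction, so they should still be checked against hardware or the GCC test before being trusted.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical container types for one 128-bit lane (not from the PR).
struct Bytes16 { std::uint8_t v[16]; };
struct Words8 { std::uint16_t v[8]; };

constexpr unsigned absDiff(std::uint8_t a, std::uint8_t b) {
  return static_cast<unsigned>(a > b ? a - b : b - a);
}

// Models one 128-bit lane of VDBPSADBW (assumed semantics, see lead-in).
constexpr Words8 dbpsadbwLane(const Bytes16 &src1, const Bytes16 &src2,
                              unsigned imm8) {
  // Phase 1 (shuffle): each of the four 2-bit fields of imm8 selects one
  // 32-bit dword of src2 and copies it into the matching dword of tmp.
  Bytes16 tmp{};
  for (std::size_t j = 0; j < 4; ++j) {
    const std::size_t sel = (imm8 >> (2 * j)) & 3u;
    for (std::size_t k = 0; k < 4; ++k)
      tmp.v[4 * j + k] = src2.v[4 * sel + k];
  }

  // Phase 2 (SAD): within each 64-bit half, result word w compares four
  // bytes of src1 (offset 0 for w = 0,1 and offset 4 for w = 2,3) against
  // a four-byte window of tmp that slides forward by one byte per word.
  Words8 dst{};
  for (std::size_t half = 0; half < 2; ++half) {
    const std::size_t base = 8 * half;
    for (std::size_t w = 0; w < 4; ++w) {
      const std::size_t s1 = base + 4 * (w / 2); // src1 window start
      const std::size_t t = base + w;            // tmp window start
      unsigned sum = 0;
      for (std::size_t k = 0; k < 4; ++k)
        sum += absDiff(src1.v[s1 + k], tmp.v[t + k]);
      dst.v[4 * half + w] = static_cast<std::uint16_t>(sum);
    }
  }
  return dst;
}

// Compile-time checks of the model itself (not hardware-verified values).
constexpr Bytes16 kZero{};
constexpr Bytes16 kRamp{{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}};
// imm8 = 0xE4 is the identity shuffle, so word 1 sums tmp bytes 1..4.
static_assert(dbpsadbwLane(kZero, kRamp, 0xE4).v[1] == 10, "sliding window");
// imm8 = 0 broadcasts dword 0, so every window sums the bytes {0,1,2,3}.
static_assert(dbpsadbwLane(kZero, kRamp, 0x00).v[3] == 6, "broadcast");
```

Writing the model as a `constexpr` function keeps it close to the goal of the PR, since expected test values can then be cross-checked at compile time. The 256-bit and 512-bit forms would apply the same routine to each 128-bit lane independently.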

I'll rework the implementation to match the correct algorithm and update all test values. Sorry for the incorrect numbers — I should have verified against hardware or the reference implementation before pushing.

I'll also make sure @tbaederr's suggestions are incorporated (I believe they are already applied in the latest push).

Will update the PR shortly.

https://github.com/llvm/llvm-project/pull/188887

