[clang] [Clang] Allow VDBPSADBW intrinsics in constexpr (PR #188887)
Pierluigi Lenoci via cfe-commits
cfe-commits at lists.llvm.org
Sun Mar 29 10:17:39 PDT 2026
pierluigilenoci wrote:
@RKSimon Thank you for testing on actual hardware. You're right, the test values are wrong because my VDBPSADBW implementation is incorrect.
After reviewing the GCC reference implementation (`gcc/testsuite/gcc.target/i386/avx512bw-vdbpsadbw-2.c`), I can see the algorithm has two distinct phases:
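1. **Shuffle phase**: Uses all four 2-bit fields of imm8 to select dwords from each 128-bit lane of src2 into a temp buffer (my code only used bits[1:0] and bits[3:2])
2. **SAD phase**: Uses a sliding/overlapping comparison pattern rather than a simple aligned block-vs-block SAD. Within each 64-bit chunk, the four 16-bit results compare four src1 bytes against four-byte windows of the temp buffer starting at byte offsets 0, 1, 2 and 3.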
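For reference, here is a minimal C++ sketch of the two phases for a single 128-bit lane, based on my reading of the Intel SDM pseudocode for VDBPSADBW. `dbpsadbw_lane` is a hypothetical helper name; this is not the Clang constant-evaluator code or the GCC test code, just a scalar model to check expected values against. For 256- and 512-bit vectors the same imm8 is applied independently to each 128-bit lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of VDBPSADBW for ONE 128-bit lane
// (assumption: transcribed from the Intel SDM pseudocode).
std::array<std::uint16_t, 8> dbpsadbw_lane(const std::array<std::uint8_t, 16>& src1,
                                           const std::array<std::uint8_t, 16>& src2,
                                           std::uint8_t imm8) {
  // Phase 1: shuffle. 2-bit field j of imm8 picks which dword of src2
  // is copied into dword j of tmp (all four fields are used).
  std::array<std::uint8_t, 16> tmp{};
  for (int j = 0; j < 4; ++j) {
    int sel = (imm8 >> (2 * j)) & 3;
    for (int k = 0; k < 4; ++k)
      tmp[4 * j + k] = src2[4 * sel + k];
  }

  // Phase 2: SAD with overlapping windows. In 64-bit chunk q, result word j
  // compares src1 bytes [4*(j/2), 4*(j/2)+3] (low half for j = 0,1,
  // high half for j = 2,3) against tmp bytes [j, j+3] of the same chunk.
  std::array<std::uint16_t, 8> dst{};
  for (int q = 0; q < 2; ++q) {
    for (int j = 0; j < 4; ++j) {
      unsigned sum = 0;
      for (int k = 0; k < 4; ++k) {
        int a = src1[8 * q + 4 * (j / 2) + k];
        int b = tmp[8 * q + j + k];
        sum += static_cast<unsigned>(a > b ? a - b : b - a);  // |a - b|
      }
      dst[4 * q + j] = static_cast<std::uint16_t>(sum);
    }
  }
  return dst;
}
```

As a sanity check: with src1 = src2 = bytes 0..15 and imm8 = 0xE4 (the identity shuffle, fields 0, 1, 2, 3), each 64-bit chunk yields the words {0, 4, 8, 4}, which shows the offset windows at work; an aligned block-vs-block SAD would instead give 0 everywhere.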
I'll rework the implementation to match the correct algorithm and update all test values. Sorry for the incorrect numbers; I should have verified them against hardware or the reference implementation before pushing.
I'll also make sure @tbaederr's suggestions are incorporated (I believe they are already applied in the latest push).
Will update the PR shortly.
https://github.com/llvm/llvm-project/pull/188887