r334208 - [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.

Martin Storsjö via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 8 00:59:02 PDT 2018


On Thu, 7 Jun 2018, Craig Topper via cfe-commits wrote:

> Author: ctopper
> Date: Thu Jun  7 10:28:03 2018
> New Revision: 334208
>
> URL: http://llvm.org/viewvc/llvm-project?rev=334208&view=rev
> Log:
> [X86] Add back builtins for  _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
>
> We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

FWIW, this change broke building libaom: 
https://bugs.chromium.org/p/aomedia/issues/detail?id=1945

In libaom, there's a macro construct like this:

  #define v256_shr_n_byte(a, n)                                                 \
    ((n) < 16                                                                   \
         ? _mm256_alignr_epi8(                                                  \
               _mm256_permute2x128_si256(a, a, _MM_SHUFFLE(2, 0, 0, 1)), a, n)  \
         : ((n) > 16                                                            \
                ? _mm256_srli_si256(                                            \
                      _mm256_permute2x128_si256(a, a, _MM_SHUFFLE(2, 0, 0, 1)), \
                      (n)-16)                                                   \
                : _mm256_permute2x128_si256(a, a, _MM_SHUFFLE(2, 0, 0, 1))))

Since this commit, compilation fails because the _mm256_srli_si256 call 
gets an out-of-range immediate ((n)-16 is negative whenever n < 16), even 
though the top-level ternary operator would never select that branch. Not 
sure if there's anything to do from the clang point of view here; I guess 
it's a tradeoff between stricter parameter checks for the intrinsics and 
the convenience of piling them up in a macro like this in libaom.
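
For illustration, here is a minimal sketch of one way such a macro could be 
made to pass the stricter check: clamp the immediate to 0 in the branch that 
is never taken, so every shift count is a valid constant regardless of n. 
This is not the libaom code and not necessarily how libaom will fix it. It 
uses a 128-bit SSE2 analogue so it builds on x86 without extra flags, and 
CLAMP_IMM, v128_pair_shr_n_byte, load_pair and the byte_after_* helpers are 
hypothetical names made up for this example.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_srli_si128, _mm_slli_si128, _mm_or_si128 */
#include <stdint.h>

/* Hypothetical helper: yields imm when cond holds and 0 otherwise, so the
   immediate stays within [0, 255] even in the branch that is never taken. */
#define CLAMP_IMM(cond, imm) ((cond) ? (imm) : 0)

/* Hypothetical 128-bit analogue of the libaom macro: shift the 32-byte value
   hi:lo right by n bytes (n a constant in 0..32) and keep the low 16 bytes.
   Only the two subtractions can go negative, so only those are clamped. */
#define v128_pair_shr_n_byte(lo, hi, n)                                     \
  ((n) < 16                                                                 \
       ? _mm_or_si128(_mm_srli_si128(lo, (n)),                              \
                      _mm_slli_si128(hi, CLAMP_IMM((n) < 16, 16 - (n))))    \
       : _mm_srli_si128(hi, CLAMP_IMM((n) >= 16, (n) - 16)))

/* Fill lo with bytes 0..15 and hi with bytes 16..31. */
static void load_pair(__m128i *lo, __m128i *hi) {
  uint8_t src[32];
  for (int i = 0; i < 32; i++) src[i] = (uint8_t)i;
  *lo = _mm_loadu_si128((const __m128i *)src);
  *hi = _mm_loadu_si128((const __m128i *)(src + 16));
}

/* Byte idx of the result for n = 4 (takes the (n) < 16 branch). */
static uint8_t byte_after_shr4(int idx) {
  __m128i lo, hi;
  uint8_t dst[16];
  assert(idx >= 0 && idx < 16);
  load_pair(&lo, &hi);
  _mm_storeu_si128((__m128i *)dst, v128_pair_shr_n_byte(lo, hi, 4));
  return dst[idx];
}

/* Byte idx of the result for n = 20 (takes the (n) >= 16 branch). */
static uint8_t byte_after_shr20(int idx) {
  __m128i lo, hi;
  uint8_t dst[16];
  assert(idx >= 0 && idx < 16);
  load_pair(&lo, &hi);
  _mm_storeu_si128((__m128i *)dst, v128_pair_shr_n_byte(lo, hi, 20));
  return dst[idx];
}
```

The clamp only changes the value seen by the dead branch, so the selected 
branch behaves as before; the cost is a slightly noisier macro.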

// Martin

