[PATCH] [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.

Andrea Di Biagio andrea.dibiagio at gmail.com
Wed Dec 10 04:06:47 PST 2014


Hi David and Quentin,

thanks for the reviews.
For some reasons I am currently unable to log on Phabricator. Please
find my answer to your questions below.

On Wed, Dec 10, 2014 at 2:08 AM, Quentin Colombet <qcolombet at apple.com> wrote:
> ================
> Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:784
> @@ +783,3 @@
> +        // results are undefined.
> +        if ((Index + Length) > 64)
> +          return ReplaceInstUsesWith(CI, UndefValue::get(II->getType()));
> ----------------
> majnemer wrote:
>> Is there anything that prevents `Index + Length` from wrapping around? If not, what happens?
> I think there is nothing that would prevent that, but in that case we fall into this condition:
> "If the sum of the bit index + length field is greater than 64, the results are undefined.”
>
> So I’d say nothing to be done here but maybe I misunderstood something :).
>
Hi David,

My understanding is that x86 builtin '__builtin_ia32_insertqi' forces
'Index' and 'Length' to be 8-bit quantities.
In Clang we have (BuiltinsX86.def)  BUILTIN(__builtin_ia32_insertqi,
"V2LLiV2LLiV2LLiIcIc");
Clang definition of intrinsic _mm_inserti_si64 (in ammintrin.h)
explicitly casts 'len' and 'idx' to 'char' before calling
__builtin_ia32_insertqi.
GCC also forces __builtin_ia32_insertqi to use 8-bit immediate values
for 'len' and 'idx'.

If for example I write:

  __m128i foo(__m128i A, __m128i B) {
    return __builtin_ia32_insertqi((__v2di) A, (__v2di) B, 255, 255);
  }

with this patch, the call to builtin __builtin_ia32_insertqi gets
folded to Undef. That's because of the rule that Quentin mentioned in
his last post.

If I instead write:

  __m128i foo(__m128i A, __m128i B) {
    return __builtin_ia32_insertqi((__v2di) A, (__v2di) B, 255, 256);
  }

The 256 number would be truncated to 0 (Clang would also raise a
warning for this "int to char" conversion).
gcc would raise instead an error: "the last argument must be an 8-bit
immediate".

Back to my code: it is true that I don't check if `Index + Length` may
wrap. However, CIStart and CIWidth are expected to be 8-bit
quantities. So, unless I forgot something important, the wrapping
shouldn't ever occur. Both Index and Length are obtained from
zero-extending CIStart and Length (two 8-bit quantities) to
'unsigned'.

If you want, I can add an assert to check that both Index and Length
are never bigger than 255.

@Quentin
I will add a "Per AMD64 manual" in the comment.

Thanks!
Andrea

> http://reviews.llvm.org/D6583
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




More information about the llvm-commits mailing list