[cfe-dev] Unexpected x86/AVX cmpXY builtin codegen

Fri Mar 25 23:39:06 PDT 2011

I am investigating the unexpected codegen for the x86/AVX cmpXY builtins in clang.
The underlying problem is how clang handles integer constant expressions (ICEs).

<<Problem definition>>
The AVX cmpXY instructions, exposed in C as intrinsics such as
_mm256_cmp_ps(), accept only an immediate (a constant integer value)
as the third argument.

(code example)
#define _CMP_GE_OS 0x0d
_mm256_cmp_ps(a, b, _CMP_GE_OS)

However, clang's avxintrin.h defines it as a static inline function:

static __inline __m256 __attribute__((__always_inline__, __nodebug__))
_mm256_cmp_ps(__m256 a, __m256 b, const int c)
{
  return (__m256)__builtin_ia32_cmpps256((__v8sf)a, (__v8sf)b, c);
}

clang's constant integer folder and ICE (Integer Constant Expression)
checker cannot detect a constant integer expression across a function
boundary, so in this case clang emits an ordinary scalar expression for
the third argument instead of a constant (immediate) expression:

  ...
  %tmp2.i = load i32* %c.addr.i, align 4
  %conv.i = trunc i32 %tmp2.i to i8
  %0 = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %tmp.i,
      <8 x float> %tmp1.i, i8 %conv.i) nounwind

As a result, llc fails to emit assembly, because the x86/AVX backend in
LLVM expects the third argument to be an immediate value.

The expected codegen is as follows:

  %0 = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %tmp,
      <8 x float> %tmp1, i8 13)
  (the third argument is expanded to an immediate value)

<<Solution proposed>>
There are two possible ways to fix this problem.

1) Rewrite the cmpXY inline functions as macros

This is the easiest solution for clang, but it might lose compatibility
with the avxintrin.h provided by other parties (e.g. gcc).

2) Extend constant integer folding

Extend CheckICE() and Evaluate() in lib/AST/ExprConstant.cpp so that
they correctly handle constant integer expressions across function
boundaries.
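
To illustrate why solution 1) sidesteps the problem, here is a minimal,
portable sketch. It does not use the real builtin: FAKE_CMP_BUILTIN,
fake_cmp, my_cmp_ps and cmp_demo are hypothetical names made up for this
mail, and the bit-field trick is only a stand-in for "this argument must be
an integer constant expression". Note the stand-in is stricter than the
behaviour described above, where clang accepts the code and llc fails later.

```c
#include <assert.h>

#define CMP_GE_OS 0x0d  /* same value as _CMP_GE_OS in the example above */

/* Scalar stand-in for the real vector compare (hypothetical). */
static int fake_cmp(float a, float b, int imm)
{
    assert(imm >= 0 && imm <= 31);
    return imm == CMP_GE_OS ? (a >= b) : 0;
}

/* Hypothetical stand-in for __builtin_ia32_cmpps256. A bit-field width must
   be an integer constant expression, so the sizeof only compiles when imm is
   a constant in [0, 31]; a run-time value is rejected at compile time. */
#define FAKE_CMP_BUILTIN(a, b, imm) \
    ((void)sizeof(struct { unsigned ok : (((imm) >= 0 && (imm) <= 31) ? 1 : -1); }), \
     fake_cmp((a), (b), (imm)))

/* Solution 1 in miniature: the wrapper is a macro, so c reaches the builtin
   exactly as written at the call site and stays a constant expression. */
#define my_cmp_ps(a, b, c) FAKE_CMP_BUILTIN((a), (b), (c))

/* The function form does not compile with this stand-in, because c is a
   parameter and therefore not an integer constant expression:

   static inline int my_cmp_ps_fn(float a, float b, const int c)
   { return FAKE_CMP_BUILTIN(a, b, c); }
*/

static int cmp_demo(float a, float b)
{
    return my_cmp_ps(a, b, CMP_GE_OS);
}
```

With the macro form, the predicate at the call site is substituted textually,
so no constant has to be tracked across a function boundary.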

<<Action>>
It is easy for me to provide a patch for solution 1), but I'm not sure
how much the clang developers want to maintain compatibility with the
avxintrin.h from gcc or other parties.
If they want to maintain compatibility as much as possible, the
solution would be 2), which might require a large modification of the
ICE engine.
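
As a rough picture of what solution 2) means, here is a toy constant folder
that looks through a call by binding the folded arguments to the callee's
parameters. This is not clang code: struct expr, struct binding, try_fold and
fold_cmp_predicate are hypothetical names, and the real CheckICE()/Evaluate()
work on clang's AST and would need far more care than this sketch shows.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression tree (hypothetical, not clang's AST). */
enum kind { K_INT, K_OPAQUE, K_PARAM, K_CALL };

struct expr {
    enum kind kind;
    int value;                       /* K_INT: literal, K_PARAM: parameter index */
    const struct expr *body;         /* K_CALL: body of the inlined callee */
    const struct expr *const *args;  /* K_CALL: argument expressions */
    size_t nargs;
};

/* One callee parameter: either bound to a known constant or unknown. */
struct binding { int known; int value; };

#define MAX_ARGS 8

/* Tries to fold e to an integer constant. env describes the parameters of
   the enclosing call (NULL outside any call). Returns 1 and stores the value
   in *out on success, 0 if e is not constant. */
static int try_fold(const struct expr *e, const struct binding *env,
                    size_t nenv, int *out)
{
    assert(e != NULL && out != NULL);
    switch (e->kind) {
    case K_INT:
        *out = e->value;
        return 1;
    case K_PARAM:
        /* Without call-site bindings a parameter is never constant. */
        if (env == NULL || (size_t)e->value >= nenv || !env[e->value].known)
            return 0;
        *out = env[e->value].value;
        return 1;
    case K_CALL: {
        struct binding vals[MAX_ARGS];
        size_t i;
        if (e->nargs > MAX_ARGS)
            return 0;
        /* Fold each argument in the caller's context; non-constant
           arguments are simply marked unknown. */
        for (i = 0; i < e->nargs; i++)
            vals[i].known = try_fold(e->args[i], env, nenv, &vals[i].value);
        /* Then fold the callee body with those bindings. */
        return try_fold(e->body, vals, e->nargs, out);
    }
    case K_OPAQUE:
        break;
    }
    return 0;
}

/* Models _mm256_cmp_ps(a, b, 0x0d), with the callee body reduced to the
   one expression of interest: "parameter 2". */
static int fold_cmp_predicate(int *out)
{
    static const struct expr a = { K_OPAQUE, 0, NULL, NULL, 0 };
    static const struct expr b = { K_OPAQUE, 0, NULL, NULL, 0 };
    static const struct expr imm = { K_INT, 0x0d, NULL, NULL, 0 };
    static const struct expr body = { K_PARAM, 2, NULL, NULL, 0 };
    static const struct expr *const args[] = { &a, &b, &imm };
    static const struct expr call = { K_CALL, 0, &body, args, 3 };
    return try_fold(&call, NULL, 0, out);
}
```

Folding the bare parameter with no bindings fails, which corresponds to the
current behaviour; folding through the call yields the call-site constant
even though a and b are not constant.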

--
Syoyo