[cfe-dev] Allow constexpr in vector extensions

Tue Aug 29 16:51:18 PDT 2017

On 30/08/2017, Richard Smith <richard at metafoo.co.uk> wrote:
> On 29 August 2017 at 05:45, Tom Westerhout via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> [snip]
>>
>> It compiles cleanly with both GCC and Clang. However, if I try to make
>> c constexpr, Clang tells me that operator+ on vectors is not
>> constexpr. I'm wondering if there's a reason for that. There are no
>> constraints from the Standard as these are intrinsics, so I see no
>> reason why we couldn't allow constexpr code to benefit from SIMD.
>
>
> We could certainly allow this. However, last time this was discussed,
> an objection was raised: materializing arbitrary vector constants is
> not cheap on all targets, and in some cases user code is written in
> such a way as to describe how a particular constant should be generated
> (eg, start with one easy-to-lower constant, shift by N, add another
> easy-to-lower constant). If we constant-fold arbitrary vector
> operations, that sequence of operations will be lost in many cases,
> requiring backend heroics to infer how to materialize the constant.
>
> I don't know to what extent the above is actually a valid objection,
> though: in my testing, LLVM itself will fold together the operations
> and in so doing lose the instructions on how to materialize the
> constant. (And in some cases, Clang's IR generation code will do the
> same, because it does IR-level constant folding as it goes.)

Sorry if this is a stupid question, but I'm very new to all this: could
you explain a bit what you mean by "materializing vector constants"?
Does this just mean creating a vector constant in memory? If it does,
then my first guess would be to use inline asm, since we're talking
about some specific target where the user "knows better".

Anyway, could you maybe point me to an example of user code that
specifies the materialisation process, so I can play around with it?

> Example: on x86_64, v4si{-1, -1, -1, -1} + v4si{2, 0, 0, 0} can be
> emitted as four instructions (pcmpeqd, mov, movd, paddd) totalling 17
> bytes, or as one movaps (7 bytes) plus a 16 byte immediate; the former
> is both smaller and a little faster, but LLVM is only able to produce
> the latter today.  LLVM is smart enough to produce good code for those
> two constants in isolation, but not for v4si{1, -1, -1, -1}.

I don't quite get it. Any chance you could provide a small piece of code
illustrating your point?
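
In case it helps to have something concrete to point at, here is my
rough guess at the kind of code you mean. It is only a sketch: it
assumes the GCC/Clang vector_size extension, and the function names are
made up by me. The first function spells the constant out as the sum
from your example, the second writes the same value as a single literal.

```cpp
#include <cassert>

// GCC/Clang vector extension: four 32-bit ints packed into 16 bytes.
typedef int v4si __attribute__((vector_size(16)));

// Hypothetical helper: builds the constant in two steps, mirroring
// v4si{-1, -1, -1, -1} + v4si{2, 0, 0, 0} from the quoted example.
v4si one_then_minus_ones_stepwise() {
    v4si all_ones = {-1, -1, -1, -1};  // every bit set in every lane
    v4si low_two  = {2, 0, 0, 0};      // only the lowest lane is non-zero
    return all_ones + low_two;         // lane-wise add
}

// Hypothetical helper: the same value written as one literal.
v4si one_then_minus_ones_literal() {
    v4si c = {1, -1, -1, -1};
    return c;
}

// Sanity check that both spellings yield the same four lanes.
void check_same_value() {
    v4si a = one_then_minus_ones_stepwise();
    v4si b = one_then_minus_ones_literal();
    for (int i = 0; i < 4; ++i) {
        assert(a[i] == b[i]);
    }
}
```

If I understand correctly, compiling this with -O2 -S and comparing the
two functions should show whether the optimiser keeps the two-step
recipe or folds it into a single constant load.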

Tom