[cfe-dev] Allow constexpr in vector extensions

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Tue Aug 29 16:15:21 PDT 2017


On 29 August 2017 at 05:45, Tom Westerhout via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Greetings!
>
>
> This is my first post on this mailing list, so please do tell me if I'm
> doing it wrong.
>
> I've noticed the following difference between GCC and Clang. Consider
> this piece of code
>
> ```
> typedef int v4si __attribute__ ((vector_size (16)));
>
> int main()
> {
>     constexpr v4si a = {1,2,3,4};
>     constexpr v4si b = {2,0,0,0};
>     v4si c = a + b;
> }
> ```
>
> It compiles cleanly with both GCC and Clang. However, if I try to make c
> constexpr, Clang tells me that operator+ on vectors is not constexpr. I'm
> wondering if there's a reason for that. There are no constraints from the
> Standard as these are intrinsics, so I see no reason why we couldn't
> allow constexpr code to benefit from SIMD.


We could certainly allow this. However, last time this was discussed, an
objection was raised: materializing arbitrary vector constants is not cheap
on all targets, and in some cases user code is written in such a way as to
describe how a particular constant should be generated (e.g., start with one
easy-to-lower constant, shift by N, add another easy-to-lower constant). If
we constant-fold arbitrary vector operations, that sequence of operations
will be lost in many cases, requiring backend heroics to infer how to
materialize the constant.
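The kind of source-level recipe described above can be sketched with the same GCC/Clang vector extensions used in the quoted example. This is an illustrative sketch only: the target constant {15, 15, 15, 15}, the `v4su` typedef and the function name `make_fifteens` are hypothetical, and which instruction each step lowers to depends on the target and compiler. The optimizer is also free to fold the whole sequence into a single constant, which is exactly the loss described above.

```cpp
// Hypothetical recipe for {15, 15, 15, 15}, written as a sequence of
// cheap vector operations using the GCC/Clang vector extensions.
typedef int v4si __attribute__ ((vector_size (16)));
typedef unsigned v4su __attribute__ ((vector_size (16)));

v4si make_fifteens()
{
    // Step 1: all-ones is usually easy to materialize (e.g. pcmpeqd on x86).
    v4si ones = {-1, -1, -1, -1};
    // Step 2: shift right by 31 to get {1, 1, 1, 1}; the unsigned type
    // makes this a logical rather than an arithmetic shift.
    v4su one = (v4su)ones >> 31;
    // Step 3: shift left by 4 to get {16, 16, 16, 16}.
    v4si sixteen = (v4si)(one << 4);
    // Step 4: add the easy all-ones constant again: 16 + (-1) = 15.
    return sixteen + ones;
}
```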

I don't know to what extent the above is actually a valid objection,
though: in my testing, LLVM itself will fold together the operations and in
so doing lose the instructions on how to materialize the constant. (And in
some cases, Clang's IR generation code will do the same, because it does
IR-level constant folding as it goes.)

Example: on x86_64, v4si{-1, -1, -1, -1} + v4si{2, 0, 0, 0} can be emitted
as four instructions (pcmpeqd, mov, movd, paddd) totalling 17 bytes, or as
one movaps (7 bytes) plus a 16-byte constant in memory; the former is both smaller
and a little faster, but LLVM is only able to produce the latter today.
LLVM is smart enough to produce good code for those two constants in
isolation, but not for their sum, v4si{1, -1, -1, -1}.
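A minimal sketch of the worked example, reusing the vector typedef from the quoted code. The function and constant names are hypothetical. The instruction names in the comments only indicate how each operand could be materialized on x86_64, not what a given compiler will emit (per the message, LLVM currently emits a single load). The `static_assert` just restates the byte counts quoted above: 7 + 16 = 23 bytes for the load versus 17 bytes for the four-instruction sequence.

```cpp
typedef int v4si __attribute__ ((vector_size (16)));

// The constant from the example, expressed as the sum of two constants
// that are each cheap to materialize on x86_64 in isolation.
v4si example_constant()
{
    v4si all_ones = {-1, -1, -1, -1};  // could come from pcmpeqd
    v4si two_low  = {2, 0, 0, 0};      // could come from mov + movd
    return all_ones + two_low;         // could be a paddd; {1, -1, -1, -1}
}

// Code-size comparison using the byte counts quoted in the message.
constexpr int kFourInstructionBytes = 17;  // pcmpeqd, mov, movd, paddd
constexpr int kLoadBytes = 7 + 16;         // movaps plus 16-byte constant
static_assert(kFourInstructionBytes < kLoadBytes,
              "the four-instruction sequence is smaller");
```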