[cfe-dev] Allow constexpr in vector extensions

Tue Aug 29 17:08:10 PDT 2017

On 29 August 2017 at 16:51, Tom Westerhout via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On 30/08/2017, Richard Smith <richard at metafoo.co.uk> wrote:
> > On 29 August 2017 at 05:45, Tom Westerhout via cfe-dev <
> > cfe-dev at lists.llvm.org> wrote:
> >
> >> [snip]
> >>
> >> It compiles cleanly with both GCC and Clang. However, if I try to make
> >> c constexpr, Clang tells me that operator+ on vectors is not
> >> constexpr. I'm wondering if there's a reason for that. There are no
> >> constraints from the Standard as these are intrinsics, so I see no
> >> reason why we couldn't allow constexpr code to benefit from SIMD.
> >
> >
> > We could certainly allow this. However, last time this was discussed,
> > an objection was raised: materializing arbitrary vector constants is
> > not cheap on all targets, and in some cases user code is written in
> > such a way as to describe how a particular constant should be generated
> > (eg, start with one easy-to-lower constant, shift by N, add another
> > easy-to-lower constant). If we constant-fold arbitrary vector
> > operations, that sequence of operations will be lost in many cases,
> > requiring backend heroics to infer how to materialize the constant.
> >
> > I don't know to what extent the above is actually a valid objection,
> > though: in my testing, LLVM itself will fold together the operations
> > and in so doing lose the instructions on how to materialize the
> > constant. (And in some cases, Clang's IR generation code will do the
> > same, because it does IR-level constant folding as it goes.)
>
> I guess it's a stupid question and I'm sorry for that, but I'm very new
> to all this, so could you maybe explain a bit what you mean by
> "materializing vector constants"? Does this just mean creating a vector
> constant in memory?

No, it means creating a vector constant in a vector register (ideally
without loading it from somewhere in memory, since that tends to be slow
and have a large code size).

> If it does, then my first guess would be to use
> inline asm since we're talking about some specific target where user
> "knows better".
>

I generally agree that the user should have to explicitly express that they
want their operation sequence to be preserved, either via inline asm or
some other mechanism we provide them.

> Anyway, could you maybe point me to an example of user code specifying
> the materialisation process, so that I can play around with it?

My observation was that such user code does not actually exist / work,
because the vector operations get folded together at the IR level. That is:
the objection to constant evaluation of vector operations in the frontend
does not appear to be a valid objection (perhaps it once was, before the
middle-end optimizers started optimizing vector operations, but not any
more).

> > Example: on x86_64, v4si{-1, -1, -1, -1} + v4si{2, 0, 0, 0} can be
> > emitted as four instructions (pcmpeqd, mov, movd, paddd) totalling 17
> > bytes, or as one movaps (7 bytes) plus a 16 byte immediate; the former
> > is both smaller and a little faster, but LLVM is only able to produce
> > the latter today.  LLVM is smart enough to produce good code for those
> > two constants in isolation, but not for v4si{1, -1, -1, -1}.
>
> I don't quite get it. Any chance you could provide a small piece of code
> illustrating your point?
>

Sure:

v4si f() {
  return v4si{-1,-1,-1,-1} + v4si{2,0,0,0};
}

v4si g() {
  v4si result;
  asm(R"(pcmpeqd %0, %0
        movl $2, %%eax
        movd %%eax, %%xmm1
        paddd %%xmm1, %0)" : "=x"(result) : : "eax", "xmm1");
  return result;
}

LLVM will materialize v4si{-1,-1,-1,-1} as pcmpeqd, and it will materialize
{2,0,0,0} as movl + movd. But the code it produces for f() is larger and
slower than the code for g() (which is the naive combination of what it did
for the two constants in isolation), because the vector operations got
folded together.
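
For anyone who wants to try this locally, here is a self-contained sketch of the two functions above. The v4si typedef is an assumption on my part (its definition was in the snipped part of the thread); I am guessing it is the usual four-int GCC/Clang vector extension type. The same_lanes and self_check helpers are hypothetical names added only so that the results can be checked, and g() is guarded because the inline asm is x86-64 only.

```cpp
#include <cassert>

// Assumption: the snipped code defined v4si roughly like this, using the
// GCC/Clang vector extension (four 32-bit ints, 16 bytes in total).
typedef int v4si __attribute__((vector_size(16)));

// Hypothetical helper (not from the thread): lane-by-lane equality check.
bool same_lanes(v4si a, v4si b) {
  for (int i = 0; i < 4; ++i)
    if (a[i] != b[i]) return false;
  return true;
}

// Written as the sum of two easy-to-lower constants; the optimizer is free
// to fold the addition into the single constant {1, -1, -1, -1}.
v4si f() {
  return v4si{-1, -1, -1, -1} + v4si{2, 0, 0, 0};
}

#if defined(__x86_64__) && defined(__GNUC__)
// The hand-written materialization sequence from the message (x86-64 only).
v4si g() {
  v4si result;
  asm(R"(pcmpeqd %0, %0
        movl $2, %%eax
        movd %%eax, %%xmm1
        paddd %%xmm1, %0)" : "=x"(result) : : "eax", "xmm1");
  return result;
}
#endif

// Both functions should produce the same value; only the generated code
// is expected to differ.
void self_check() {
  assert(same_lanes(f(), v4si{1, -1, -1, -1}));
#if defined(__x86_64__) && defined(__GNUC__)
  assert(same_lanes(g(), f()));
#endif
}
```

Compiling this with -O2 -S and comparing the assembly emitted for f() and g() should show the difference described above, though the exact output will depend on the compiler version and target flags.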