[cfe-dev] Allow constexpr in vector extensions

Tue Aug 29 17:38:30 PDT 2017

On 29 August 2017 at 17:30, Tom Westerhout via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On 30/08/2017, Richard Smith <richard at metafoo.co.uk> wrote:
> > On 29 August 2017 at 16:51, Tom Westerhout via cfe-dev
> > <cfe-dev at lists.llvm.org> wrote:
> >> Anyway, could you maybe point me to an example of user code
> >> specifying the materialisation process that I could play around with?
> >
> > My observation was that such user code does not actually exist / work,
> > because the vector operations get folded together at the IR level. That
> > is: the objection to constant evaluation of vector operations in the
> > frontend does not appear to be a valid objection (perhaps it once was,
> > before the middle-end optimizers started optimizing vector operations,
> > but not any more).
>
> OK, so essentially that's a go on trying to implement it, right? I'll
> probably take some time before I come up with a PR as I'm completely
> unfamiliar with the code base.

Yes, please go for it :)

> >> > Example: on x86_64, v4si{-1, -1, -1, -1} + v4si{2, 0, 0, 0} can be
> >> > emitted as four instructions (pcmpeqd, mov, movd, paddd) totalling
> >> > 17 bytes, or as one movaps (7 bytes) plus a 16 byte immediate; the
> >> > former is both smaller and a little faster, but LLVM is only able to
> >> > produce the latter today.  LLVM is smart enough to produce good code
> >> > for those two constants in isolation, but not for v4si{1, -1, -1,
> >> > -1}.
> >>
> >> I don't quite get it. Any chance you could provide a small piece of
> >> code illustrating your point?
> >>
> >
> > Sure:
> >
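> > // Assumed definition of v4si (GCC/Clang vector extension).
> > typedef int v4si __attribute__((vector_size(16)));
> >
> > v4si f() {
> >   return v4si{-1,-1,-1,-1} + v4si{2,0,0,0};
> > }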
> >
> > v4si g() {
> >   v4si result;
> >   asm(R"(pcmpeqd %0, %0
> >         movl $2, %%eax
> >         movd %%eax, %%xmm1
> >         paddd %%xmm1, %0)" : "=x"(result) : : "eax", "xmm1");
> >   return result;
> > }
> >
> > LLVM will materialize v4si{-1,-1,-1,-1} as pcmpeqd, and it will
> > materialize {2,0,0,0} as movl + movd. But the code it produces for f()
> > is larger and slower than the code for g() (which is the naive
> > combination of what it did for the two constants in isolation), because
> > the vector operations got folded together.
>
> Aha, thanks, I get it now. It's interesting though that f() gets
> implemented in a single movaps instruction: https://godbolt.org/g/azWbby
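
A minimal sketch of the folding that leads to that single load, assuming a
plain struct as a stand-in for v4si so that it does not rely on constexpr
support for vector extensions (the names Vec4, add and folded are
hypothetical, not from the thread). It only checks that the sum of the two
constants is the {1, -1, -1, -1} value mentioned above, which is the one
constant the backend is then left to materialize:

```cpp
// Hypothetical stand-in for v4si: four int lanes in a plain aggregate.
struct Vec4 {
  int lane[4];
};

// Lane-wise addition, written as a single return statement so that it is a
// valid constexpr function even under C++11 rules.
constexpr Vec4 add(Vec4 a, Vec4 b) {
  return Vec4{{a.lane[0] + b.lane[0], a.lane[1] + b.lane[1],
               a.lane[2] + b.lane[2], a.lane[3] + b.lane[3]}};
}

// The same two constants as in f(), folded at compile time.
constexpr Vec4 folded = add(Vec4{{-1, -1, -1, -1}}, Vec4{{2, 0, 0, 0}});

static_assert(folded.lane[0] == 1, "lane 0 is -1 + 2");
static_assert(folded.lane[1] == -1 && folded.lane[2] == -1 &&
                  folded.lane[3] == -1,
              "remaining lanes are -1 + 0");
```

There is no main() here; compiling the file (for example with -fsyntax-only)
is the check, since the static_asserts are evaluated by the frontend.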