[cfe-dev] Allow constexpr in vector extensions

Tom Westerhout via cfe-dev cfe-dev at lists.llvm.org
Tue Aug 29 17:30:15 PDT 2017


On 30/08/2017, Richard Smith <richard at metafoo.co.uk> wrote:
> On 29 August 2017 at 16:51, Tom Westerhout via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>> Anyway, could you maybe point me to an example of user code
>> specifying the materialisation process that I can play around with?
>
> My observation was that such user code does not actually exist / work,
> because the vector operations get folded together at the IR level. That
> is: the objection to constant evaluation of vector operations in the
> frontend does not appear to be a valid objection (perhaps it once was,
> before the middle-end optimizers started optimizing vector operations,
> but not any more).

OK, so essentially that's a go on trying to implement it, right? It'll
probably take me some time to come up with a PR, as I'm completely
unfamiliar with the code base.


>> > Example: on x86_64, v4si{-1, -1, -1, -1} + v4si{2, 0, 0, 0} can be
>> > emitted as four instructions (pcmpeqd, mov, movd, paddd) totalling
>> > 17 bytes, or as one movaps (7 bytes) plus a 16 byte immediate; the
>> > former is both smaller and a little faster, but LLVM is only able to
>> > produce the latter today.  LLVM is smart enough to produce good code
>> > for those two constants in isolation, but not for v4si{1, -1, -1,
>> > -1}.
>>
>> I don't quite get it. Any chance you could provide a small piece of
>> code illustrating your point?
>>
>
> Sure:
>
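> // v4si as used in this thread: four 32-bit ints in a 16-byte vector
> typedef int v4si __attribute__((vector_size(16)));
>
> v4si f() {
>   return v4si{-1,-1,-1,-1} + v4si{2,0,0,0};
> }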
>
> v4si g() {
>   v4si result;
>   asm(R"(pcmpeqd %0, %0
>         movl $2, %%eax
>         movd %%eax, %%xmm1
>         paddd %%xmm1, %0)" : "=x"(result) : : "eax", "xmm1");
>   return result;
> }
>
> LLVM will materialize v4si{-1,-1,-1,-1} as pcmpeqd, and it will
> materialize {2,0,0,0} as movl + movd. But the code it produces for f()
> is larger and slower than the code for g() (which is the naive
> combination of what it did for the two constants in isolation), because
> the vector operations got folded together.

Aha, thanks, I get it now. It's interesting, though, that f() gets
compiled down to a single movaps instruction: https://godbolt.org/g/azWbby
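
For anyone who wants to reproduce this locally, here is a minimal
self-contained sketch of f(). The typedef is my assumption of how v4si
is defined (the usual GCC/Clang vector_size spelling), and lanes_equal()
and check_f() are hypothetical helpers I added only to confirm that the
folded constant is {1, -1, -1, -1}:

```cpp
#include <cassert>

// Assumed definition of v4si: four 32-bit ints in a 16-byte vector
// (GCC/Clang vector extension).
typedef int v4si __attribute__((vector_size(16)));

// Same expression as in the quoted f(). With optimisation enabled the
// addition is folded, so the function just loads a 16-byte constant.
v4si f() {
    return v4si{-1, -1, -1, -1} + v4si{2, 0, 0, 0};
}

// Hypothetical helper: compare a vector lane by lane against four ints.
bool lanes_equal(v4si v, int a, int b, int c, int d) {
    return v[0] == a && v[1] == b && v[2] == c && v[3] == d;
}

// Hypothetical check: the element-wise sum is {1, -1, -1, -1}.
void check_f() {
    assert(lanes_equal(f(), 1, -1, -1, -1));
}
```

Compiling this with optimisation on and looking at the assembly for f()
(e.g. via -S) should show the single constant load discussed above,
though the exact output will depend on the compiler version and target.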


Tom


