[LLVMdev] Bug #16941
Nadav Rotem
nrotem at apple.com
Fri Oct 25 17:25:56 PDT 2013
Hi Dmitry,
Yes, this is a known problem with legalizing vector masks. On SSE, the type <8 x i1> is legalized to <8 x i16>, but your operands are legalized to <4 x i32>. Type legalization is performed per node, and we don’t have a good way to support instructions that mix the mask and operand types. Why does ISPC generate illegal vector types? Does ISPC rely on the LLVM codegen to split the vectors to increase ILP? In that case ISPC should generate two vector operations.
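
As a rough illustration of the mismatch, here is a small Python model. It is not LLVM code, and the helper names (registers_needed, select, split_select) are made up for this sketch. It only counts how many 128-bit registers each legalized type occupies, and shows that doing the select as two independent 4-lane operations gives the same result as the 8-lane select.

```python
XMM_BITS = 128  # width of one SSE register


def registers_needed(lanes, lane_bits):
    """Number of XMM registers needed for `lanes` elements of `lane_bits` bits each."""
    total_bits = lanes * lane_bits
    return -(-total_bits // XMM_BITS)  # ceiling division


def select(mask, a, b):
    """Lane-wise select: a[i] where mask[i] is true, otherwise b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def split_select(mask, a, b, width=4):
    """Perform the select as independent `width`-lane pieces (hypothetical front-end split)."""
    out = []
    for i in range(0, len(a), width):
        out += select(mask[i:i + width], a[i:i + width], b[i:i + width])
    return out


mask = [True, False, True, True, False, False, True, False]
a = list(range(8))
b = [100 + i for i in range(8)]

# The mask promoted to i16 lanes fits in one register; the i32 operands need two.
print(registers_needed(8, 16), registers_needed(8, 32))

# Splitting into two 4-lane selects does not change the result.
print(split_select(mask, a, b) == select(mask, a, b))
```

With 4-lane pieces, a mask promoted to i32 lanes and the i32 operands each occupy exactly one register, so the mask and operand layouts agree.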
Thanks,
Nadav
On Oct 25, 2013, at 2:16 PM, Dmitry Babokin <babokin at gmail.com> wrote:
> Nadav,
>
> The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing: the <8 x i1> mask comes in as two XMM registers, select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts back to two XMM registers and does the blend. The conversion back and forth has huge overhead.
>
> I'm attaching 3 files with vectors of length 4, 8 and 16. Try 4 on SSE4 and you'll see that both cases work well; 8 demonstrates the difference on SSE4. The same holds on AVX (8 vs 16).
>
> On Wed, Oct 23, 2013 at 1:41 AM, Nadav Rotem <nrotem at apple.com> wrote:
>
> On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <babokin at gmail.com> wrote:
>
>> By the way, I'm curious, is there any reason why you focus on SSE4, not AVX? It seems that the vectorizer should care the most about the latest silicon.
>>
>
> I am interested in looking at the SSE4 code because lowering of AVX code is more complicated, especially for masks. The problem is that <8 x i1> can be legalized to <8 x i32> for YMM, or <8 x i16> for XMM. ISPC worked around this limitation by explicitly extending the mask. The SEXT canonicalization reverted the code pattern that ISPC generated.
>
> Thanks,
> Nadav
>
> <v4.ll><v8.ll><v16.ll>