[LLVMdev] Bug #16941

Nadav Rotem nrotem at apple.com
Fri Oct 25 17:25:56 PDT 2013


Hi Dmitry, 

Yes, this is a known problem with legalizing vector masks. On SSE, the type <8 x i1> is legalized to <8 x i16>, but your operands are legalized to <4 x i32>. Type legalization is performed per node, and we don’t have a good way to support instructions that mix the mask and operand types. Why does ISPC generate illegal vector types? Does ISPC rely on the LLVM codegen to split the vectors to increase ILP? In that case, ISPC should generate two vector operations.
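
A minimal source-level sketch of the "two vector operations" idea, assuming the GCC/Clang vector extensions. The names v4i32, v8i32_pair, select4 and select8_lt are hypothetical and do not come from ISPC or LLVM. The 8-wide value is kept as two halves that each fit one XMM register, and the mask stays at the operand's element width, so no <8 x i1> value has to be legalized. Whether a given compiler turns this into blend instructions is not guaranteed.

```c
#include <stdint.h>

/* 4 x i32: fits one XMM register, so it is a legal type on SSE. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Hypothetical 8-wide value kept as two legal halves
   instead of a single <8 x i32>. */
typedef struct { v4i32 lo, hi; } v8i32_pair;

/* Select with a mask already extended to the element width:
   each mask lane is all-ones (take a) or all-zeros (take b). */
static v4i32 select4(v4i32 mask, v4i32 a, v4i32 b) {
    return (mask & a) | (~mask & b);
}

/* 8-wide "a < b ? x : y" written as two independent 4-wide operations.
   The vector comparison yields -1 or 0 per lane, i.e. a full-width mask. */
static v8i32_pair select8_lt(v8i32_pair a, v8i32_pair b,
                             v8i32_pair x, v8i32_pair y) {
    v8i32_pair r;
    r.lo = select4(a.lo < b.lo, x.lo, y.lo);
    r.hi = select4(a.hi < b.hi, x.hi, y.hi);
    return r;
}
```

The two halves have no data dependence on each other, so splitting at the source level keeps the ILP benefit without relying on the type legalizer to split the mask.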
 
Thanks,
Nadav


On Oct 25, 2013, at 2:16 PM, Dmitry Babokin <babokin at gmail.com> wrote:

> Nadav,
> 
> The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing: the <8 x i1> mask comes in as two XMM registers, select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts it back to two XMM registers and does the blend. The conversion back and forth has huge overhead.
> 
> I'm attaching 3 files with vectors of length 4, 8 and 16. Try 4 on SSE4 and you'll see that both cases work well; 8 demonstrates the difference on SSE4. The same holds on AVX (8 vs 16).
> 
> 
> 
> 
> On Wed, Oct 23, 2013 at 1:41 AM, Nadav Rotem <nrotem at apple.com> wrote:
> 
> On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <babokin at gmail.com> wrote:
> 
>> By the way, I'm curious, is there any reason why you focus on SSE4, not AVX? It seems that the vectorizer should care the most about the latest silicon.
>> 
> 
> I am interested in looking at the SSE4 code because lowering of AVX code is more complicated, especially for masks. The problem is that <8 x i1> can be legalized to <8 x i32> for YMM, or <8 x i16> for XMM. ISPC worked around this limitation by explicitly extending the mask. The SEXT canonicalization reverted the code pattern that ISPC generated.
> 
> Thanks,
> Nadav   
> 
> <v4.ll><v8.ll><v16.ll>
