[PATCH] D32416: [x86, SSE] AVX1 PR28129

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 25 03:23:08 PDT 2017


RKSimon added inline comments.


================
Comment at: lib/Target/X86/X86InstrSSE.td:7754-7755
 
 // Without AVX2 we need to concat two v4i32 V_SETALLONES to create a 256-bit
 // all ones value.
 let Predicates = [HasAVX1Only] in
----------------
spatel wrote:
> This comment should be updated to match the new code.
> 
> Is it correct that this pattern won't apply to most integer code for an AVX target because that would already be legalized to v4i32/v2i64? If that's true, I think it's also worth mentioning here.
> 
> I'm imagining cases like this:
>   define <8 x i32> @cmpeq_v8i32(<8 x i32> %a) nounwind {
>     %cmp = icmp eq <8 x i32> %a, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
>     %res = sext <8 x i1> %cmp to <8 x i32>
>     ret <8 x i32> %res
>   }
> 
>   define <8 x i32> @cmpne_v8i32(<8 x i32> %a) nounwind {
>     %cmp = icmp ne <8 x i32> %a, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
>     %res = sext <8 x i1> %cmp to <8 x i32>
>     ret <8 x i32> %res
>   }
> 
>   define <8 x i32> @sub1_v8i32(<8 x i32> %a) nounwind {
>     %add = add <8 x i32> %a, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
>     ret <8 x i32> %add
>   }

+1



================
Comment at: test/CodeGen/X86/vector-pcmp.ll:156-158
+; AVX1-NEXT:    vxorps %ymm1, %ymm1, %ymm1
+; AVX1-NEXT:    vcmptrueps %ymm1, %ymm1, %ymm1
 ; AVX1-NEXT:    vxorps %ymm1, %ymm0, %ymm0
----------------
spatel wrote:
> That's an interesting case...that we probably can't answer at the DAG level. Would it be better to use two 128-bit vpxor instructions instead of incurring a potential domain-crossing penalty with the one 256-bit vxorps?
Do you mean this? 
```
vextractf128 $1, %ymm0, %xmm1
vpxor %xmm2, %xmm2, %xmm2
vpcmpgtb %xmm1, %xmm2, %xmm1
vpcmpgtb %xmm0, %xmm2, %xmm0
vcmpeqd %xmm2, %xmm2, %xmm2
vpxor %xmm2, %xmm1, %xmm1
vpxor %xmm2, %xmm0, %xmm0
vinsertf128 $1, %xmm1, %ymm0, %ymm0
```
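For context, a source pattern that should produce the pcmpgt-then-xor sequence above is a sign-bit-clear test, along the lines of spatel's examples (the function name is illustrative, not from the actual test file):
```
define <32 x i8> @not_signbit_v32i8(<32 x i8> %a) nounwind {
  ; a > -1  <=>  a >= 0  <=>  NOT (a < 0), i.e. NOT (pcmpgt 0, a)
  %cmp = icmp sgt <32 x i8> %a, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
  %res = sext <32 x i1> %cmp to <32 x i8>
  ret <32 x i8> %res
}
```
Since AVX1 has no 256-bit integer compare, the v32i8 op is split into two 128-bit halves anyway, so doing the NOT with two 128-bit vpxor instructions keeps everything in the integer domain, at the cost of an extra instruction versus the single 256-bit vxorps.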


https://reviews.llvm.org/D32416




