[PATCH] D33866: [DAGCombiner] loosen restriction for creating narrow vector load from extract(wide load)

Sun Jun 4 07:34:39 PDT 2017

spatel added inline comments.

================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:3669
 ; AVX512F-NEXT:    vinsertps {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[2,3]
 ; AVX512F-NEXT:    vextracti32x4 $1, %zmm0, %xmm0
 ; AVX512F-NEXT:    vmovq %xmm0, %rax
----------------
niravd wrote:
> We're only partially converting the load-extracts here. There should be either a single load to zmmX plus extracts, or 4 direct loads to xmmX.
Agreed - that's what I meant in the description when I said that these diffs might be seen as bugs in isExtractSubvectorCheap().

In this case, x86 has made it cheap to extract from index 0 or one other index:

  return (Index == 0 || Index == ResVT.getVectorNumElements());

Clearly, this was only tested with cases where we are extracting a half-sized vector. For the quarter-sized (N/4-element) extracts that AVX512 allows in this test, it accepts only indexes 0 and N/4, so it misses 2 of the 4 possibilities (N/2 and 3N/4).
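
To make the gap concrete, here is a minimal standalone sketch of the quoted check. It is not the actual X86 hook: the function names are hypothetical, plain element counts stand in for the real EVT argument, and the 16-element source used in the note below is only an assumed example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the check quoted above. ResNumElts plays the
// role of ResVT.getVectorNumElements(); this is not the real LLVM signature.
bool isExtractSubvectorCheapSketch(unsigned ResNumElts, unsigned Index) {
  assert(ResNumElts > 0 && "result vector must have at least one element");
  return Index == 0 || Index == ResNumElts;
}

// Collect the aligned start indexes that the check rejects when extracting
// ResNumElts-wide subvectors from a SrcNumElts-wide source vector.
std::vector<unsigned> missedIndices(unsigned SrcNumElts, unsigned ResNumElts) {
  assert(ResNumElts > 0 && SrcNumElts % ResNumElts == 0 &&
         "source must be a whole multiple of the result width");
  std::vector<unsigned> Missed;
  for (unsigned Index = 0; Index < SrcNumElts; Index += ResNumElts)
    if (!isExtractSubvectorCheapSketch(ResNumElts, Index))
      Missed.push_back(Index);
  return Missed;
}
```

Assuming a 16-element source, missedIndices(16, 8) is empty (both half-sized extracts pass), while missedIndices(16, 4) contains indexes 8 and 12, the two quarter-sized extracts the check rejects.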

I think this change is still an improvement, though of course not ideal. My real goal with this patch was to answer the questions for the non-x86 diffs. I could skip this step and post the more liberal patch, with more test diffs, if that seems better.

https://reviews.llvm.org/D33866