[PATCH] D33866: [DAGCombiner] loosen restriction for creating narrow vector load from extract(wide load)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jun 4 07:34:39 PDT 2017
spatel added inline comments.
================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:3669
; AVX512F-NEXT: vinsertps {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[2,3]
; AVX512F-NEXT: vextracti32x4 $1, %zmm0, %xmm0
; AVX512F-NEXT: vmovq %xmm0, %rax
----------------
niravd wrote:
> We're only partially converting the load-extracts here. there should only be a load to zmmX and extracts or 4 direct loads to xmmX.
Agreed - that's what I meant in the description when I said that these diffs might be seen as bugs in isExtractSubvectorCheap().
In this case, x86 has made it cheap to extract from index 0 or one other index:
return (Index == 0 || Index == ResVT.getVectorNumElements());
Clearly, this was only tested with cases where we are extracting a half-sized vector. So it misses 2 out of the N/4 possibilities for AVX512 in this test.
I think this change is still an improvement (but not ideal of course), but my goal with this patch was really to answer the questions for the non-x86 diffs. I could just skip this step and post the more liberal patch with more test diffs if that seems better.
https://reviews.llvm.org/D33866
More information about the llvm-commits
mailing list