[PATCH] D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 14 06:03:06 PST 2019

spatel marked an inline comment as done.
spatel added inline comments.

Comment at: llvm/test/CodeGen/X86/known-signbits-vector.ll:79
 ; X64-NEXT:    shrq $32, %rax
 ; X64-NEXT:    vcvtsi2ssl %eax, %xmm1, %xmm0
 ; X64-NEXT:    retq
RKSimon wrote:
> spatel wrote:
> > RKSimon wrote:
> > > Any idea why this still fails?
> > This is almost the same example as in:
> > https://bugs.llvm.org/show_bug.cgi?id=39975
> > 
> > On x86-64 only (because the 64-bit shift isn't legal on i686), we scalarize the shift. So that means we have the shift sitting between the extract and cast, so no match.
> Hmm - is it worth us investigating a trunc(lshr(extract(v2i64 x, i), 32)) -> trunc(extract(v2i64 x, i+1)) combine? (and variants)
Yeah, I thought we already had that transform, but that case isn't matched.

Note that this example is an ashr, not lshr, and the example in PR39975 is probably tougher because it's shift-by-33. We might want to carve out an exception for the scalarization transform for these kinds of cases in general.
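For reference, the scalar identity behind the suggested combine can be sanity-checked outside the backend. This is just a sketch of the lane arithmetic (lane numbering assumes the little-endian in-register layout x86 uses): truncating a logical-shift-right-by-32 of a 64-bit element is the same as reading the odd 32-bit lane of the bitcast vector.

```python
import struct

def as_v4i32(v2i64):
    # Reinterpret two 64-bit lanes as four 32-bit lanes,
    # little-endian, matching the x86 in-register layout.
    return list(struct.unpack("<4I", struct.pack("<2Q", *v2i64)))

def combine_equivalent(x, i):
    # lhs: trunc(lshr(extract(v2i64 x, i), 32))
    lhs = (x[i] >> 32) & 0xFFFFFFFF
    # rhs: extract(v4i32 bitcast(x), 2*i + 1)
    rhs = as_v4i32(x)[2 * i + 1]
    return lhs == rhs

x = [0x1122334455667788, 0x99AABBCCDDEEFF00]
assert all(combine_equivalent(x, i) for i in (0, 1))
```

The same identity holds for an ashr by exactly 32, since trunc discards the sign-extended upper bits; a shift-by-33 (as in PR39975) is no longer a pure lane extract, consistent with that case being tougher.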


