[PATCH] D42031: [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 23 06:26:23 PST 2018


craig.topper added inline comments.


================
Comment at: test/CodeGen/X86/avx512-insert-extract.ll:1068
+; KNL-NEXT:    kmovw %k0, %eax
 ; KNL-NEXT:    andb $1, %al
 ; KNL-NEXT:    movb $4, %cl
----------------
craig.topper wrote:
> RKSimon wrote:
> > craig.topper wrote:
> > > RKSimon wrote:
> > > > craig.topper wrote:
> > > > > RKSimon wrote:
> > > > > > craig.topper wrote:
> > > > > > > RKSimon wrote:
> > > > > > > > Is the andb necessary? Are we missing some known bits handling here?
> > > > > > > Yeah its unnecessary based on how it gets isel. But I'm not sure the way the DAG is written that we can guarantee it.
> > > > > > > 
> > > > > > > ```
> > > > > > >   t0: ch = EntryToken
> > > > > > >                   t2: v64i8,ch = CopyFromReg t0, Register:v64i8 %0
> > > > > > >                   t4: v64i8,ch = CopyFromReg t0, Register:v64i8 %1
> > > > > > >                 t24: v64i1 = X86ISD::CMPMU t2, t4, Constant:i8<6>
> > > > > > >               t28: v64i1 = X86ISD::KSHIFTR t24, Constant:i8<63>
> > > > > > >             t30: i32 = extract_vector_elt t28, Constant:i64<0>
> > > > > > >           t26: i8 = truncate t30
> > > > > > >         t22: i8 = and t26, Constant:i8<1>
> > > > > > >       t19: i8 = sub Constant:i8<4>, t22
> > > > > > >     t13: i32 = zero_extend t19
> > > > > > >   t16: ch,glue = CopyToReg t0, Register:i32 %eax, t13
> > > > > > >   t17: ch = X86ISD::RET_FLAG t16, TargetConstant:i32<0>, Register:i32 %eax, t16:1
> > > > > > > ```
> > > > > > > 
> > > > > > > I don't think we can make any assumptions about the upper bits of the input to the extract_vector_elt being passed through.
> > > > > > > 
> > > > > > > We could maybe pattern match this out during isel?
> > > > > > > 
> > > > > > > Or we could legalize to a DAG that explicitly contains a KMOVW node instead of extract_vector_elt. Then we could probably put out an AssertZExt.
> > > > > > Couldn't we handle this by adding X86ISD::KSHIFTR handling to X86TargetLowering::computeKnownBitsForTargetNode (with suitable DemandedElts twiddling)?
> > > > > But computeKnownBits for extract_vector_elt won't pass the info through. It doesn't know the bits from the upper elements are going to make it through to the result. (they don't for any other vector type). It would just take the known bits info about element 0 and any extend that to i32.
> > > > OK - sorry I was thinking that the extract_vector_elt disappeared when it became a KSHIFT+KMOVW.
> > > Oh sorry. If we use KMOVW explicitly in the DAG then yes the extract_vector_elt goes away. But it still looks sort of like a vector to scalar bitcast.
> > > 
> > > The KMOVW will demand all elts from the KSHIFT.  And the KSHIFT won't be able to say that some elts are 0 and the lowest element is unknown will it?
> > OK - it sounds like we need some better computeknownbits / demandedbits support for target nodes - that was what D38832 was trying to do yes?
> I don't know if that helps here. For vectors, compute known bits tells the common known bits of all elements. So for the KSHIFT it would only return a KnownBits object with width 1. That won't be able to convey anything about the upper elements separately from the lower element.
Maybe it makes more sense to do the KMOV and then a GPR shift if assuming we have a wide enough KMOV available? KSHIFT has a 3 cycle latency on SKX while a GPR shift is only 1 cyc.

Known bits would handle that well.


Repository:
  rL LLVM

https://reviews.llvm.org/D42031





More information about the llvm-commits mailing list