[PATCH] D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts

Fri Dec 14 00:25:50 PST 2018

RKSimon marked an inline comment as not done.
RKSimon added a comment.

@jonpa Are you OK with the systemz knownbits.ll change please?

================
Comment at: test/CodeGen/X86/known-bits-vector.ll:173-177
+; X64-NEXT:    vandps {{.*}}(%rip), %xmm0, %xmm0
+; X64-NEXT:    vorps {{.*}}(%rip), %xmm0, %xmm0
+; X64-NEXT:    vpermilps {{.*#+}} xmm0 = xmm0[2,2,3,3]
+; X64-NEXT:    vcvtdq2ps %xmm0, %xmm0
 ; X64-NEXT:    retq
----------------
andreadb wrote:
> RKSimon wrote:
> > andreadb wrote:
> > > We lose the ability to fold this entire computation into a constant.
> > > 
> > > After the AND+OR sequence, element 2 and 3 of %xmm0 are known at compile time (i.e. those are both value `65535`).
> > > The vector permute can therefore be folded away. And we can perform the int2fp conversion at compile time. Effectively folding away the entire computation into a load from constant pool.
> > > 
> > > To be fair, we could even shrink the constant pool entry by using a vbroadcastss instead of a vmovaps (on AVX).
> > The problem here is that SimplifyDemandedElts runs first and updates the and/or to:
> > ```
> > %1 = and <4 x i32> %a0, <i32 undef, i32 undef, i32 255, i32 4085>
> > %2 = or <4 x i32> %1, <i32 65535, i32 65535, i32 65535, i32 65535> // broadcasts are preserved
> > ```
> > Then SimplifyDemandedBits runs, doesn't know about which vector elements are needed, and so can't fold to:
> > ```
> > uitofp <4 x i32><i32 65535, i32 65535, i32 65535, i32 65535> to <4 x float>
> > ```
> > for constant folding as it used to do.
> > 
> > I may be able to attempt to constant fold more aggressively in SimplifyDemandedElts, but failing that the best option going forward for this kind of regression would be to merge SimplifyDemandedBits and SimplifyDemandedElts into a single pass, matching what ComputeNumBits does with scalar/vectors.
> > 
> > Also, the original purpose of this test wasn't to constant fold but to recognise that the uitofp could be simplified to sitofp (so x86 could use cvtdq2ps).
> If fixing this regression is not simple, then can raise a bug for it and work on it later. What do you think?
I've raised https://bugs.llvm.org/show_bug.cgi?id=40000 which should cover this kind of issue.

================
Comment at: test/CodeGen/X86/packss.ll:271-275
   %1 = shl <4 x i64> %a0, <i64 63, i64 0, i64 63, i64 0>
   %2 = ashr <4 x i64> %1, <i64 63, i64 0, i64 63, i64 0>
   %3 = bitcast <4 x i64> %2 to <8 x i32>
   %4 = shufflevector <8 x i32> %3, <8 x i32> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 4, i32 4, i32 4, i32 4>
   %5 = trunc <8 x i32> %4 to <8 x i16>
----------------
andreadb wrote:
> RKSimon wrote:
> > andreadb wrote:
> > > Unrelated to this patch.
> > > 
> > > This may be hard to catch...
> > > 
> > > On AVX2 and AVX, we could probably simplify it to this (didn't verify that the shuffle mask is correct):
> > > 
> > > ```
> > > vpslld    $31, %ymm0, %ymm0
> > > vpsrad  $31, %ymm0, %ymm0
> > > vpshufd $1, %ymm0, %ymm0
> > > vextractf128 $1, %ymm0, %xmm1
> > > vpackssdw %xmm1, %xmm0, %xmm0
> > > ```
> > > 
> > > That would require quite a lot of knowledge about both demanded bits and demanded elts. Also, it requires that we sink the bitcast in the shift operands, and then we shuffle elements after.
> > > 
> > > This may be something worthy to investigate in future..
> > SimplifyDemandedVectorElts doesn't handle shifts yet - its on the list, but as you can see just OR/XOR support causes a lot a diffs!
> Yeah I noticed :-).
> 
> Anwyay, I trust your judgment on this.
> My understanding is that your plan is to keep working on improving this area. If that's the case, then I am happy if you file a bug to track the progress on fixing that particular regression, so that this change can be committed.
Again https://bugs.llvm.org/show_bug.cgi?id=40000 might help here (only demand the signbit from some elements)

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55600/new/

https://reviews.llvm.org/D55600