[PATCH] D88783: [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 8 07:00:02 PDT 2020


RKSimon added inline comments.


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2090-2091
+    if (match(R, m_OneUse(m_Sub(m_SpecificInt(Width), m_Specific(L))))) {
+      KnownBits KnownL = IC.computeKnownBits(L, /*Depth*/ 0, &Or);
+      return KnownL.getMaxValue().ult(Width) ? L : nullptr;
+    }
----------------
RKSimon wrote:
> lebedev.ri wrote:
> > spatel wrote:
> > > lebedev.ri wrote:
> > > > Either this needs a motivational comment, or this can be dropped.
> > > > ```
> > > > ----------------------------------------
> > > > define i64 @src(i64 %x, i64 %y, i64 %a) {
> > > > %0:
> > > >   %mask = and i64 %a, -1
> > > >   %shl = shl i64 %x, %mask
> > > >   %sub = sub nsw nuw i64 64, %mask
> > > >   %shr = lshr i64 %y, %sub
> > > >   %r = or i64 %shl, %shr
> > > >   ret i64 %r
> > > > }
> > > > =>
> > > > define i64 @tgt(i64 %x, i64 %y, i64 %a) {
> > > > %0:
> > > >   %r = fshl i64 %x, i64 %y, i64 %a
> > > >   ret i64 %r
> > > > }
> > > > Transformation seems to be correct!
> > > > 
> > > > ```
> > > We can correctly convert to the intrinsic because the intrinsic makes guarantees about the shift amount (modulo bitwidth) that may not be present in the original IR.
> > > 
> > > But without proper masking/select for UB shift amounts, the expanded code will be worse in codegen. So this is an irreversible transform, and we can't do it universally.
> > > 
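> > > A fully general reversal has to guard the zero-amount case, something like this (a sketch for illustration, not actual codegen output):
> > > ```
> > > define i32 @fshl_guarded(i32 %x, i32 %y, i32 %z) {
> > >   %amt = and i32 %z, 31                     ; fshl amount is taken modulo 32
> > >   %inv = sub i32 32, %amt
> > >   %hi = shl i32 %x, %amt
> > >   %lo = lshr i32 %y, %inv                   ; poison when %amt == 0 (%inv == 32) ...
> > >   %or = or i32 %hi, %lo
> > >   %iszero = icmp eq i32 %amt, 0
> > >   %r = select i1 %iszero, i32 %x, i32 %or   ; ... but never selected in that case
> > >   ret i32 %r
> > > }
> > > ```
> > > 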
> > > So even with the computeKnownBits restriction in this patch, this transform may not be allowed. For example, can we fix this in codegen:
> > > https://alive2.llvm.org/ce/z/IsuRoN
> > > 
> > > ```
> > > $ cat fsh.ll 
> > > define i32 @fake_fshl(i32 %x, i32 %y, i32 %a) {
> > >   %mask = and i32 %a, 31
> > >   %shl = shl i32 %x, %mask
> > >   %sub = sub nuw nsw i32 32, %mask
> > >   %shr = lshr i32 %y, %sub
> > >   %r = or i32 %shl, %shr
> > >   ret i32 %r
> > > }
> > > 
> > > define i32 @fshl(i32 %x, i32 %y, i32 %a) {
> > >   %r = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %a)
> > >   ret i32 %r
> > > }
> > > 
> > > declare i32 @llvm.fshl.i32(i32, i32, i32)
> > > 
> > > $ llc -o - fsh.ll -mtriple=riscv64
> > > fake_fshl:                              # @fake_fshl
> > > 	sllw	a0, a0, a2
> > > 	addi	a3, zero, 32
> > > 	sub	a2, a3, a2
> > > 	srlw	a1, a1, a2
> > > 	or	a0, a0, a1
> > > 	ret
> > > fshl:                                   # @fshl
> > > 	andi	a2, a2, 31
> > > 	sll	a0, a0, a2
> > > 	not	a2, a2
> > > 	slli	a1, a1, 32
> > > 	srli	a1, a1, 1
> > > 	srl	a1, a1, a2
> > > 	or	a0, a0, a1
> > > 	ret
> > > 
> > > ```
> > > We can correctly convert to the intrinsic because the intrinsic makes guarantees about the shift amount (modulo bitwidth) that may not be present in the original IR.
> > 
> > Yep, the intrinsic has implicit masking of the shift amount, as we can see from the explicit mask being dropped after the transform.
> > 
> > > But without proper masking/select for UB shift amounts, the expanded code will be worse in codegen. So this is an irreversible transform, and we can't do it universally.
> > 
> > Yep, that is why I'm asking for a more detailed comment.
> > The problematic case I have in mind is: what if we were able to prove that the value is limited only because we had range info on a load?
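> > For example (hypothetical IR: the bound comes only from !range metadata, so there is no mask instruction left after the fold):
> > ```
> > define i64 @src(i64* %p, i64 %x, i64 %y) {
> >   %a = load i64, i64* %p, !range !0   ; %a is known to be in [0, 64)
> >   %shl = shl i64 %x, %a
> >   %sub = sub i64 64, %a
> >   %shr = lshr i64 %y, %sub
> >   %r = or i64 %shl, %shr
> >   ret i64 %r
> > }
> > !0 = !{i64 0, i64 64}
> > ```
> > Here computeKnownBits proves the amount is < 64, so the fold would fire, but the range info does not survive on the fshl, and a target without funnel shifts has to re-expand with the full guard.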
> At least part of the riscv64 issue is due to the DAG deciding to promote the fshl to i64 but not the or+shifts pattern (IIRC the sllw/srlw variants are i32 shifts that take the amount mod 32 and sign-extend the results to i64).
> 
> ```
> SelectionDAG has 15 nodes:
>   t0: ch = EntryToken
>       t2: i64,ch = CopyFromReg t0, Register:i64 %0
>         t4: i64,ch = CopyFromReg t0, Register:i64 %1
>       t16: i64 = shl t4, Constant:i64<32>
>         t6: i64,ch = CopyFromReg t0, Register:i64 %2
>       t21: i64 = and t6, Constant:i64<31>
>     t18: i64 = fshl t2, t16, t21
>   t13: ch,glue = CopyToReg t0, Register:i64 $x10, t18
>   t14: ch = RISCVISD::RET_FLAG t13, Register:i64 $x10, t13:1
> ```
We might be able to improve DAGTypeLegalizer::PromoteIntRes_FunnelShift for the case where the promoted type is >= 2 * the original type: unless the promoted fshl/fshr is itself legal, expand to just a shift(or(shl(x,bw),y),urem(z,bw)) pattern.
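For i32 fshl promoted to i64, that pattern would look something like this (a sketch of the idea only, not what the legalizer currently emits):
```
define i32 @promoted_fshl(i32 %x, i32 %y, i32 %z) {
  %xw = zext i32 %x to i64
  %yw = zext i32 %y to i64
  %hi = shl i64 %xw, 32
  %cat = or i64 %hi, %yw            ; concatenate x:y in the wide type
  %zw = zext i32 %z to i64
  %amt = urem i64 %zw, 32           ; funnel-shift amount is modulo the original width
  %sh = shl i64 %cat, %amt
  %top = lshr i64 %sh, 32           ; the result is the top 32 bits of the shifted concat
  %r = trunc i64 %top to i32
  ret i32 %r
}
```
Since the original width is a power of two, the urem folds to an and, and no zero-amount guard is needed because the wide shift amount is always in range.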


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88783/new/

https://reviews.llvm.org/D88783


