[PATCH] D77804: [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits (WIP)

Mon May 16 06:23:47 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/trunc-combine.ll:148
 ; SI-NEXT:    v_or_b32_e32 v0, v0, v1
-; SI-NEXT:    v_lshrrev_b32_e32 v1, 16, v0
+; SI-NEXT:    v_and_b32_e32 v1, s4, v2
 ; SI-NEXT:    s_setpc_b64 s[30:31]
----------------
arsenm wrote:
> RKSimon wrote:
> > arsenm wrote:
> > > RKSimon wrote:
> > > > @arsenm @foad Not sure if pulling out the immediate is a good idea or not - shouldn't a u16 immediate be cheap?
> > > This is worse. Integer constants -16 to 64 and a handful of FP values are free, but 0xffff is not, so it requires materialization.
> > @arsenm @foad At EuroLLVM Matt suggested that maybe we should increase the tolerance to 2 uses of the large immediates before pulling out the constant?
> s_mov_b32 K + 2 * v_and_b32_e32 = 16 bytes, 12 cycles
> 2 * (v_and_b32_e32 K) = 16 bytes, 8 cycles which is clearly better.
> 
> 3 * (v_and_b32_e32 K) = 24 bytes, 12 cycles
> 
> So 2 uses of a constant seems plainly better for VOP1/VOP2 ops. Above that, it becomes a code size vs. latency tradeoff.
This decision is also generally made by SIFoldOperands. We probably need to fix it there and not in the DAG.
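
For reference, the byte/cycle figures quoted above can be reproduced with a small cost model. This is only a sketch: the constants (4-byte base encoding, 4 extra bytes for a 32-bit literal, 4 cycles per instruction) are assumptions inferred from the numbers in the comment, not taken from the ISA documentation, and the function names `hoisted` and `folded` are hypothetical.

```python
# Assumed cost model, inferred from the figures quoted in the review comment.
ENC_BYTES = 4        # assumed base encoding size of s_mov_b32 / v_and_b32_e32
LITERAL_BYTES = 4    # assumed extra dword for a 32-bit literal such as 0xffff
CYCLES_PER_INST = 4  # assumed issue cost per instruction


def hoisted(uses):
    """One s_mov_b32 K, then `uses` VALU ops that read K from an SGPR."""
    size = (ENC_BYTES + LITERAL_BYTES) + uses * ENC_BYTES
    cycles = (1 + uses) * CYCLES_PER_INST
    return size, cycles


def folded(uses):
    """`uses` VALU ops, each carrying K as its own literal."""
    size = uses * (ENC_BYTES + LITERAL_BYTES)
    cycles = uses * CYCLES_PER_INST
    return size, cycles


if __name__ == "__main__":
    for uses in (1, 2, 3, 4):
        print(uses, "hoisted (bytes, cycles):", hoisted(uses),
              "folded (bytes, cycles):", folded(uses))
```

Under these assumptions, at 2 uses both forms are 16 bytes and folding saves 4 cycles; at 3 uses hoisting is 4 bytes smaller but 4 cycles slower, which is the size vs. latency tradeoff mentioned above.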

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77804/new/

https://reviews.llvm.org/D77804