[PATCH] D77804: [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits (WIP)
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 16 06:36:02 PDT 2022
foad added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/trunc-combine.ll:148
; SI-NEXT: v_or_b32_e32 v0, v0, v1
-; SI-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; SI-NEXT: v_and_b32_e32 v1, s4, v2
; SI-NEXT: s_setpc_b64 s[30:31]
----------------
arsenm wrote:
> arsenm wrote:
> > RKSimon wrote:
> > > arsenm wrote:
> > > > RKSimon wrote:
> > > > > @arsenm @foad Not sure if pulling out the immediate is a good idea or not - shouldn't a u16 immediate be cheap?
> > > > This is worse. Integer constants -16 to 64 and a handful of FP values are free, but 0xffff is not, so it requires materialization.
> > > @arsenm @foad At EuroLLVM Matt suggested that maybe we should increase the tolerance to 2 uses of the large immediates before pulling out the constant?
> > s_mov_b32 K + 2 * v_and_b32_e32 = 16 bytes, 12 cycles
> > 2 * (v_and_b32_e32 K) = 16 bytes, 8 cycles which is clearly better.
> >
> > 3 * (v_and_b32_e32 K) = 24 bytes, 12 cycles
> >
> > So 2 uses of a constant seems plainly better for VOP1/VOP2 ops. Above that it becomes a code size vs. latency tradeoff.
> This decision is also generally made by SIFoldOperands. Probably need to fix it there and not in the DAG
I'm strongly in favour of never pulling out the constant (or rather, always folding into the instruction) and I have patches to that effect starting with D114643, which I'm hoping to get back to pretty soon.
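
For reference, the byte and cycle figures quoted above can be reproduced with a small sketch. It is not taken from the patch or from any ISA documentation: the per-instruction costs (4-byte base encoding, 4 extra bytes for a 32-bit literal, 4 cycles per instruction) are assumptions inferred from the numbers in the quoted comment, and the names `folded_cost` and `hoisted_cost` are hypothetical.

```python
# Assumed costs, inferred from the figures quoted in the thread (not from ISA docs).
BASE_BYTES = 4       # 32-bit encoding of s_mov_b32 / v_and_b32_e32
LITERAL_BYTES = 4    # extra dword for a 32-bit literal such as 0xffff
CYCLES_PER_INST = 4  # cost assumed for every instruction


def folded_cost(uses):
    """Constant folded into every use: each v_and_b32_e32 carries its own literal K."""
    size = uses * (BASE_BYTES + LITERAL_BYTES)
    cycles = uses * CYCLES_PER_INST
    return size, cycles


def hoisted_cost(uses):
    """Constant pulled out: one s_mov_b32 K, then each use reads the register."""
    size = (BASE_BYTES + LITERAL_BYTES) + uses * BASE_BYTES
    cycles = (1 + uses) * CYCLES_PER_INST
    return size, cycles


if __name__ == "__main__":
    for uses in (1, 2, 3):
        print(uses, "folded:", folded_cost(uses), "hoisted:", hoisted_cost(uses))
```

Under these assumptions, two uses tie on size (16 bytes) while folding saves 4 cycles; at three uses, pulling out the constant is 4 bytes smaller but 4 cycles slower, which is the size vs. latency tradeoff described in the quoted comment.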
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77804/new/
https://reviews.llvm.org/D77804