[PATCH] D77804: [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits (WIP)
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 16 06:17:50 PDT 2022
arsenm added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/trunc-combine.ll:148
; SI-NEXT: v_or_b32_e32 v0, v0, v1
-; SI-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; SI-NEXT: v_and_b32_e32 v1, s4, v2
; SI-NEXT: s_setpc_b64 s[30:31]
----------------
RKSimon wrote:
> arsenm wrote:
> > RKSimon wrote:
> > > @arsenm @foad Not sure if pulling out the immediate is a good idea or not - shouldn't a u16 immediate be cheap?
> > This is worse. Integer constants -16 to 64 and a handful of FP values are free, but 0xffff is not so it requires materialization.
> @arsenm @foad At EuroLLVM Matt suggested that maybe we should increase the tolerance to 2 uses of the large immediates before pulling out the constant?
s_mov_b32 K + 2 * v_and_b32_e32 = 16 bytes, 12 cycles
2 * (v_and_b32_e32 K) = 16 bytes, 8 cycles, which is clearly better.
3 * (v_and_b32_e32 K) = 24 bytes, 12 cycles
So keeping the literal inline for 2 uses of a constant seems plainly better for VOP1/VOP2 ops. Above that it becomes a code size vs. latency tradeoff.
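
For anyone who wants to extend the comparison to other use counts, here is a rough sketch of the arithmetic. The per-instruction costs are assumptions inferred from the totals quoted above (4-byte base encoding, 4 extra bytes for a 32-bit literal, 4 cycles per instruction), not numbers taken from the ISA documentation, and the function names are made up for illustration.

```python
# Assumed costs, back-derived from the totals in this comment (hypothetical,
# not taken from ISA docs).
BASE_BYTES = 4       # 32-bit encoding of s_mov_b32 / v_and_b32_e32
LITERAL_BYTES = 4    # extra dword when the instruction carries a literal K
CYCLES_PER_INST = 4  # every instruction is charged the same cost


def hoisted(uses):
    """s_mov_b32 materializes K once; each v_and_b32_e32 reads the SGPR."""
    size = (BASE_BYTES + LITERAL_BYTES) + uses * BASE_BYTES
    cycles = (1 + uses) * CYCLES_PER_INST
    return size, cycles


def inlined(uses):
    """Each v_and_b32_e32 carries its own copy of the literal K."""
    size = uses * (BASE_BYTES + LITERAL_BYTES)
    cycles = uses * CYCLES_PER_INST
    return size, cycles


if __name__ == "__main__":
    for uses in (1, 2, 3, 4):
        print(uses, "hoisted", hoisted(uses), "inlined", inlined(uses))
```

Under these assumptions the inlined form is never larger at 2 uses and is always one instruction shorter in cycles; from 3 uses on, hoisting saves bytes but still costs the extra s_mov_b32.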
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77804/new/
https://reviews.llvm.org/D77804