[PATCH] D62100: [DAGCombine][X86][AMDGPU][AArch64] (srl (shl x, c1), c2) with c1 != c2 handling
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 17 14:45:40 PST 2022
arsenm added inline comments.
================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll:686-687
; SI-NEXT: s_waitcnt vmcnt(0)
-; SI-NEXT: v_lshlrev_b32_e32 v0, 31, v0
-; SI-NEXT: v_lshrrev_b32_e32 v0, 1, v0
+; SI-NEXT: v_lshlrev_b32_e32 v0, 30, v0
+; SI-NEXT: v_and_b32_e32 v0, 2.0, v0
; SI-NEXT: buffer_store_dword v0, off, s[4:7], 0
----------------
lebedev.ri wrote:
> arsenm wrote:
> > arsenm wrote:
> > > lebedev.ri wrote:
> > > > @arsenm will AMDGPU prefer 2 shifts or shift+mask here?
> > > In this particular case they're the same. In general, two shifts are probably better, since the mask value is less likely to be an inline immediate. It seems like we have a BFE matching problem here, though.
> > For 64-bit, shift and mask would be better.
> ```
> bool AMDGPUTargetLowering::shouldFoldConstantShiftPairToMask(
>     const SDNode *N, CombineLevel Level) const {
>   EVT VT = N->getValueType(0);
>   return VT.isScalarInteger() && VT.getScalarSizeInBits() == 64;
> }
> ```
> results in many AMDGPU test regressions. I can submit a patch, but I thought I'd double-check first: is that what you meant?
That's not what I meant; it's a quirk of these particular shift values. However, I assume this change would be good, since masking will be better than 64-bit shifts.
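For context, the equivalence in the test diff above can be sketched directly: shifting left by 31 and then right by 1 isolates bit 0 of the input at bit 30, which is the same as shifting left by 30 and masking with 0x40000000 (the bit pattern of 2.0f, hence the inline immediate operand of the new v_and_b32). A minimal standalone C++ sketch, assuming 32-bit unsigned values; the function names are illustrative only:

```
#include <cassert>
#include <cstdint>

// Old sequence: v_lshlrev_b32 by 31 followed by v_lshrrev_b32 by 1.
// Keeps only bit 0 of x, placed at bit 30.
uint32_t shiftPair(uint32_t x) { return (x << 31) >> 1; }

// New sequence: v_lshlrev_b32 by 30 followed by v_and_b32 with
// 0x40000000, which is representable as the inline immediate 2.0.
uint32_t shiftAndMask(uint32_t x) { return (x << 30) & 0x40000000u; }

int main() {
  // Spot-check the two forms agree across a sample of inputs.
  for (uint64_t x = 0; x <= UINT32_MAX; x += 0x10001)
    assert(shiftPair(uint32_t(x)) == shiftAndMask(uint32_t(x)));
  return 0;
}
```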
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D62100/new/
https://reviews.llvm.org/D62100