[PATCH] D62100: [DAGCombine][X86][AMDGPU][AArch64] (srl (shl x, c1), c2) with c1 != c2 handling
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 17 14:45:40 PST 2022
arsenm added inline comments.
================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll:686-687
; SI-NEXT: s_waitcnt vmcnt(0)
-; SI-NEXT: v_lshlrev_b32_e32 v0, 31, v0
-; SI-NEXT: v_lshrrev_b32_e32 v0, 1, v0
+; SI-NEXT: v_lshlrev_b32_e32 v0, 30, v0
+; SI-NEXT: v_and_b32_e32 v0, 2.0, v0
; SI-NEXT: buffer_store_dword v0, off, s[4:7], 0
----------------
lebedev.ri wrote:
> arsenm wrote:
> > arsenm wrote:
> > > lebedev.ri wrote:
> > > > @arsenm will AMDGPU prefer 2 shifts or shift+mask here?
> > > In this particular case they're the same. In general, two shifts are probably better, since the mask value is less likely to be an inline immediate. It seems like we have a BFE matching problem here, though.
> > For 64-bit, shift and mask would be better.
> ```
> bool AMDGPUTargetLowering::shouldFoldConstantShiftPairToMask(
>     const SDNode *N, CombineLevel Level) const {
>   EVT VT = N->getValueType(0);
>   return VT.isScalarInteger() && VT.getScalarSizeInBits() == 64;
> }
> ```
> results in many AMDGPU test regressions. I can submit a patch, but I thought I'd double-check first: is that what you meant?
That's not what I meant; it's a quirk of these particular shift values. However, I assume this change would be good, since masking will be better than 64-bit shifts.
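For context, the equivalence in the test diff above can be sketched directly: shifting left by 31 and then right by 1 isolates bit 0 of the input at bit 30, which is the same as shifting left by 30 and masking with 0x40000000 (the bit pattern of 2.0f, hence the inline immediate operand of the new v_and_b32). A minimal standalone C++ sketch, assuming 32-bit unsigned values; the function names are illustrative only:

```
#include <cassert>
#include <cstdint>

// Old sequence: v_lshlrev_b32 by 31 followed by v_lshrrev_b32 by 1.
// Keeps only bit 0 of x, placed at bit 30.
uint32_t shiftPair(uint32_t x) { return (x << 31) >> 1; }

// New sequence: v_lshlrev_b32 by 30 followed by v_and_b32 with
// 0x40000000, which is representable as the inline immediate 2.0.
uint32_t shiftAndMask(uint32_t x) { return (x << 30) & 0x40000000u; }

int main() {
  // Spot-check the two forms agree across a sample of inputs.
  for (uint64_t x = 0; x <= UINT32_MAX; x += 0x10001)
    assert(shiftPair(uint32_t(x)) == shiftAndMask(uint32_t(x)));
  return 0;
}
```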
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D62100/new/
https://reviews.llvm.org/D62100