[PATCH] D135869: [AMDGPU][DAG] Only apply trunc/shift combine to 16 bit types

Fri Oct 14 00:49:12 PDT 2022

Pierre-vh added a comment.

This is trying to solve a miscompilation, but it might be the wrong fix. I know for sure this combine is responsible for a miscompilation in a sample, as disabling it/removing it/applying this fix resolves the issue.

This is the DAG being wrongly combined:

        t26: i64 = srl t23, Constant:i32<26>
        t28: i64 = mul nuw nsw t22, Constant:i64<12345678>
      t29: i64 = add nuw nsw t26, t28
    t32: i64 = srl t29, Constant:i32<25>
  t33: i26 = truncate t32

  Combining: t33: i26 = truncate t32
  Creating new node: t82: i32 = truncate t29
  Creating new node: t83: i32 = srl t82, Constant:i32<25>
  Creating new node: t84: i26 = truncate t83
   ... into: t84: i26 = truncate t83

srl is a shift-right, so it doesn't seem right to me that we truncate the source to 32 bits as it changes the output.

Maybe the error is in the VT of the first trunc? Shouldn't we have at least 2*K for shift-right?
Maybe this is a better fix? This also works to fix the miscompilation I'm seeing.

  EVT MidScalarTy = MVT::i32;

  // For right shifts, ensure the VT of the shift source is wide
  // enough that we don't lose bits in the result.
  if(Src.getOpcode() == ISD::SRL || Src.getOpcode() == ISD::SRA) {

    // Don't risk losing info if we don't know the shift amount.
    if(!Known.isConstant())
      return SDValue();

    const uint64_t ScalarWidth = Known.getConstant().getZExtValue() * 2;
    if(ScalarWidth >= 64)
      return SDValue();

    MidScalarTy = EVT::getIntegerVT(*DAG.getContext(), ScalarWidth);
  }

  EVT MidVT = VT.isVector() ?
    EVT::getVectorVT(*DAG.getContext(), MidScalarTy,
                     VT.getVectorNumElements()) : MidScalarTy;

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135869/new/

https://reviews.llvm.org/D135869