[llvm] [AMDGPU] Rework dot4 signedness checks (PR #68757)

Jeffrey Byrnes via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 26 13:00:48 PDT 2023


================
@@ -12952,30 +12952,51 @@ static bool isMul(const SDValue Op) {
 
 static std::optional<bool> checkSignedness(const SDValue &N,
                                            ByteProvider<SDValue> &Src0,
-                                           ByteProvider<SDValue> &Src1) {
+                                           ByteProvider<SDValue> &Src1,
+                                           const SDValue &S1Op,
+                                           const SDValue &S0Op) {
   auto MulOpcode = N.getOpcode();
-  std::optional<bool> IterIsSigned;
-  // Both sides of the tree must have the same signedness semantics.
-  if ((Src0.IsSigned != Src1.IsSigned) ||
-      (Src0.IsSigned.value_or(false) != Src1.IsSigned.value_or(false)))
-    return IterIsSigned;
-  // If we have a MUL_U24 op with signed semantics, then fail.
-  if (Src0.IsSigned.value_or(false) && MulOpcode == AMDGPUISD::MUL_U24)
-    return IterIsSigned;
-  // If we have a MUL_I24 op with unsigned semantics, then fail.
-  if (!Src0.IsSigned.value_or(true) && MulOpcode == AMDGPUISD::MUL_I24)
-    return IterIsSigned;
-
-  bool TopLevelSignedness =
-      MulOpcode == AMDGPUISD::MUL_I24 ||
-      (MulOpcode == ISD::MUL && N.getNode()->getFlags().hasNoSignedWrap() &&
-       !N.getNode()->getFlags().hasNoUnsignedWrap());
-
-  // In cases where we are accumulating into an i8 (for v_dot4), the
-  // ByteProvider will not have signedness info since the MSBs are dont-cares.
-  // In this case, we simply use the TopLevelSignedness of the instruction.
-  IterIsSigned = Src0.IsSigned.value_or(TopLevelSignedness);
-  return IterIsSigned;
+
+  // We have previously determined the signedness semantics
----------------
jrbyrnes wrote:

> I still don't really understand what signedness semantics means here.

What I mean by "signedness semantics" depends on context, but the phrase comes from the implementation of the instruction: part of the semantics of dot4 is to sign- or zero-extend (s/z ext) the 8-bit operands when doing the add/mul chain. When lowering into dot4, we must choose which version to use (dot4_i32 / dot4_u32), so we must determine whether our IR matches the "signedness semantics" of either.
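A rough reference model of the semantics described above may help: each 32-bit source packs four 8-bit operands, which are sign- or zero-extended, multiplied pairwise, and added into the accumulator. `dot4_i32_model` and `dot4_u32_model` are hypothetical names, and this sketch simply wraps modulo 2^32; clamping or any other detail of the real instructions is not modeled.

```cpp
#include <cstdint>

// Signed variant (sketch): each byte is sign-extended before the multiply.
// The sum is accumulated in uint32_t so overflow wraps instead of being UB.
int32_t dot4_i32_model(uint32_t A, uint32_t B, int32_t Acc) {
  uint32_t Sum = static_cast<uint32_t>(Acc);
  for (int I = 0; I < 4; ++I) {
    int32_t X = static_cast<int8_t>(static_cast<uint8_t>(A >> (8 * I))); // sext
    int32_t Y = static_cast<int8_t>(static_cast<uint8_t>(B >> (8 * I))); // sext
    Sum += static_cast<uint32_t>(X * Y);
  }
  return static_cast<int32_t>(Sum);
}

// Unsigned variant (sketch): each byte is zero-extended before the multiply.
uint32_t dot4_u32_model(uint32_t A, uint32_t B, uint32_t Acc) {
  for (int I = 0; I < 4; ++I) {
    uint32_t X = static_cast<uint8_t>(A >> (8 * I)); // zext
    uint32_t Y = static_cast<uint8_t>(B >> (8 * I)); // zext
    Acc += X * Y;
  }
  return Acc;
}
```

For example, with byte 0 of `A` equal to 0xFF and byte 0 of `B` equal to 2 (all other bytes zero), the signed model yields -2 and the unsigned model yields 510, which is why picking the variant that matches the IR matters.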

When I say the IR has "signedness semantics" (or that we have calculated it), I generally mean that we know, for one reason or another, that our IR matches the signedness semantics of the instruction. The "signedness semantics" of the instruction is semantically equivalent to IR that uniformly sign- or zero-extends the 8-bit operands before doing the scalar multiplies/adds. Additionally, if we can determine that the sign bits of both operands are the same, we can use this information to choose between the signed and unsigned versions of the instruction. We can do this because we know there is only a ByteProvider for byte 0 (all other bytes are just extension bits).
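Under the premise in the paragraph above (only byte 0 carries data and every higher bit is an extension bit), the sign-bit-based choice can be sketched as follows. `signBitOfExtended` and `chooseFromSignBits` are hypothetical helpers, not code from the patch; an unknown or mismatched sign bit is treated conservatively as "no safe choice".

```cpp
#include <cstdint>
#include <optional>

// Premise (assumed): bits 8..31 of each 32-bit mul operand are extension bits
// of byte 0. Then bit 31 is 1 only if byte 0 was sign-extended from a negative
// value, and 0 means the operand equals the zero extension of byte 0.
bool signBitOfExtended(uint8_t B, bool Sext) {
  uint32_t V = Sext ? static_cast<uint32_t>(
                          static_cast<int32_t>(static_cast<int8_t>(B)))
                    : static_cast<uint32_t>(B);
  return (V >> 31) != 0;
}

// Picks a dot4 variant from the known sign bits of the two mul operands.
// Returns true for the signed version, false for the unsigned version, and
// nullopt when a sign bit is unknown or the two sign bits differ.
std::optional<bool> chooseFromSignBits(std::optional<bool> SignBit0,
                                       std::optional<bool> SignBit1) {
  if (!SignBit0 || !SignBit1 || *SignBit0 != *SignBit1)
    return std::nullopt;
  // Both set: both operands are sign extensions of negative bytes.
  // Both clear: both operands equal the zero extension of byte 0.
  return *SignBit0;
}
```

Giving up on a mismatch is deliberately conservative: an operand with a clear sign bit might still be compatible with the signed version if bit 7 of its byte is also clear, but this sketch does not track that.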


> Can't you just computeKnownBits and check whether the sign bits are known?

The sign bit determination has already been done in AMDGPUCodeGenPrepare when lowering into MUL24, so if we have these ops we can just use that embedded info and our job is done. Unfortunately, there are many cases where we have no idea what the underlying bits are, so ValueTracking does not help us. In these cases, we would like the dot combine not to fail; instead, we would like to inspect the tree to see whether the IR semantics (e.g. s/z extension) match the semantics of one of the versions of the instruction.
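To illustrate where known-bits reasoning helps and where it does not, here is a minimal known-bits model for a 16-bit value. `Known16`, `anyextUnknownByte`, `andWithConstant` and the sign-bit predicates are hypothetical and only loosely mirror the Zero/One masks of `llvm::KnownBits`; the transfer function for `and` with a constant is the standard one. A bare anyext of a byte about which nothing is known leaves the sign bit unknown, while the `and i16 op1, 255` pattern from case 2 makes it known zero.

```cpp
#include <cstdint>

// Hypothetical known-bits record for a 16-bit value.
struct Known16 {
  uint16_t Zero = 0; // bits known to be 0
  uint16_t One = 0;  // bits known to be 1
};

// anyext i8 -> i16 of a byte we know nothing about: bits 8..15 are
// undefined and the low byte is unknown, so no bit is known.
Known16 anyextUnknownByte() { return Known16{}; }

// and X, C: every bit cleared in C becomes known zero; a bit stays known one
// only if it was known one in X and is set in C.
Known16 andWithConstant(Known16 X, uint16_t C) {
  Known16 R;
  R.Zero = static_cast<uint16_t>(X.Zero | ~C);
  R.One = static_cast<uint16_t>(X.One & C);
  return R;
}

bool signBitKnown(Known16 K) { return ((K.Zero | K.One) & 0x8000) != 0; }
bool signBitKnownZero(Known16 K) { return (K.Zero & 0x8000) != 0; }
```

With only the anyext there is nothing for the analysis to report, which is why the combine also inspects the extensions in the tree.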

This logic is encoded into this patch in the following way:

case 0 (not labeled): we know the sign bits of both operands of the mul, and they are the same (i.e. we are using MUL24).
	Use the signedness of the MUL24 opcode (MUL_I24 or MUL_U24) to match the "signedness semantics" of the instruction.

case 1: both byte providers have signedness info
	This is determined by tracking s/z extensions of the IR through the tree.

case 2: only one BP has signedness info
	In this case, we can do analysis to determine the sign bit of the BP with no signedness info. 
	This may occur if we use
```
		op1 = anyext i8 op to i16
		op2 = and i16 op1, 255
```
	instead of `op2 = zext i8 op to i16`.

case 3: we don't have any of the above conditions (neither BP has signedness info, and we can't determine sign bit for both)
	There are only two ways to end up with neither BP having signedness info: we are exclusively using anyext, or we are doing arithmetic to perform the "extension" (see case 2). In the latter case we would be able to determine the sign bit, so the only way to enter this case is by exclusive use of anyext. Thus we know that all but the low 8 bits are don't-cares, so we can use either version of the dot. This occurs in the dot8 tests (idot4s.ll @idot4_acc8).
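Taken together, the four cases amount to a small decision procedure. The sketch below is a simplified model under stated assumptions, not the code in this patch: `MulOp`, `OperandInfo`, `chooseDot4Signedness` and `dot4Low8` are hypothetical names, `OperandInfo` collapses a ByteProvider plus any bit analysis into two optional flags, and the case 3 fallback picks the unsigned variant arbitrarily. `dot4Low8` illustrates why case 3 is safe: sign and zero extension of a byte agree modulo 256, so the low 8 bits of the sum are the same either way.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical tags standing in for AMDGPUISD::MUL_I24, AMDGPUISD::MUL_U24
// and a plain ISD::MUL.
enum class MulOp { MulI24, MulU24, Mul };

// Simplified stand-in for what is known about one ByteProvider.
struct OperandInfo {
  std::optional<bool> IsSigned; // from s/z extensions seen in the tree (case 1)
  std::optional<bool> Inferred; // from bit analysis, e.g. `and` with 255 (case 2)
};

// Returns true for dot4_i32, false for dot4_u32, nullopt to reject the combine.
std::optional<bool> chooseDot4Signedness(MulOp Op, const OperandInfo &Src0,
                                         const OperandInfo &Src1) {
  // Case 0: MUL_I24 / MUL_U24 already encode the signedness decision.
  if (Op == MulOp::MulI24)
    return true;
  if (Op == MulOp::MulU24)
    return false;

  // Case 1: both operands have signedness from the tree; they must agree.
  if (Src0.IsSigned && Src1.IsSigned) {
    if (*Src0.IsSigned != *Src1.IsSigned)
      return std::nullopt;
    return Src0.IsSigned;
  }

  // Case 2: exactly one operand has signedness; the other needs an
  // inferred value that agrees with it.
  if (Src0.IsSigned || Src1.IsSigned) {
    const OperandInfo &Tracked = Src0.IsSigned ? Src0 : Src1;
    const OperandInfo &Other = Src0.IsSigned ? Src1 : Src0;
    if (!Other.Inferred || *Other.Inferred != *Tracked.IsSigned)
      return std::nullopt;
    return Tracked.IsSigned;
  }

  // Case 3: no signedness anywhere (only anyext was used), so just the low
  // 8 bits are meaningful and either variant is acceptable.
  return false;
}

// Low 8 bits of a dot4 sum with sext (Signed) or zext operands.
uint8_t dot4Low8(uint32_t A, uint32_t B, bool Signed) {
  uint32_t Sum = 0;
  for (int I = 0; I < 4; ++I) {
    uint8_t X = static_cast<uint8_t>(A >> (8 * I));
    uint8_t Y = static_cast<uint8_t>(B >> (8 * I));
    int32_t EX = Signed ? static_cast<int8_t>(X) : X;
    int32_t EY = Signed ? static_cast<int8_t>(Y) : Y;
    Sum += static_cast<uint32_t>(EX * EY);
  }
  return static_cast<uint8_t>(Sum);
}
```

For instance, with every byte equal to 0xFF the signed sum is 4 and the unsigned sum is 260100, yet both have the same low byte (4).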




https://github.com/llvm/llvm-project/pull/68757

