[PATCH] D78494: [AMDGPU][CODEGEN] Added 'A' constraint for inline assembler

Wed May 20 10:22:54 PDT 2020

dp marked 6 inline comments as done.
dp added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10735
+    } else if (BuildVectorSDNode *V = dyn_cast<BuildVectorSDNode>(Op)) {
+      if (Size != 16 || Op.getNumOperands() != 2)
+        return;
----------------
arsenm wrote:
> dp wrote:
> > Should we enforce this limitation? GCC seems to allow arbitrary vector size and type provided that all elements are equal and may be inlined.
> You mean it also accepts <4 x i8> because it's a 32-bit type?
I was wrong about GCC 'vectors'. Looks like they are not regular types and used only for auto-vectorization.

Here is the GCC implementation:

```
(define_constraint "A"
  "Inline immediate parameter"
  (and (match_code "const_int,const_double,const_vector")
       (match_test "gcn_inline_constant_p (op)")))

...

bool
gcn_inline_constant_p (rtx x)
{
  if (GET_CODE (x) == CONST_INT)
    return INTVAL (x) >= -16 && INTVAL (x) <= 64;
  if (GET_CODE (x) == CONST_DOUBLE)
    return gcn_inline_fp_constant_p (x, false);
  if (GET_CODE (x) == CONST_VECTOR)
    {
      int n;
      if (!vgpr_vector_mode_p (GET_MODE (x)))
	return false;
      n = gcn_inline_constant_p (CONST_VECTOR_ELT (x, 0));
      if (!n)
	return false;
      for (int i = 1; i < 64; i++)
	if (CONST_VECTOR_ELT (x, i) != CONST_VECTOR_ELT (x, 0))
	  return false;
      return 1;
    }
  return false;
}

...

inline bool
vgpr_vector_mode_p (machine_mode mode)
{
  return (mode == V64QImode || mode == V64HImode
	  || mode == V64SImode || mode == V64DImode
	  || mode == V64HFmode || mode == V64SFmode || mode == V64DFmode);
}

```

================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp:1429-1431
+  if ((Size == 16 && Val == 0x3118) ||
+      (Size == 32 && Val == 0x3e22f983))
+    return 0x3fc45f306dc9c882;
----------------
arsenm wrote:
> Logic doesn't make sense here? Returns the 64-bit value if it's 16 or 32?
When working on this fix I was under an impression that types of values passed to inline assembler are not required to match instruction operand types. In other words, I assumed that the following code is legal:

```
"v_mov_b32 $0, $1", "=v,A"(half 0xH3C00) // 1.0h
```

Also I assumed that this code required a type conversion to end up as "v_mov_b32 ..., 1.0" and not "v_mov_b32 ..., 0x3C00" because in the latter case the constant cannot be encoded as inline value.

That assumption complicated implementation. Was the assumption incorrect?

If types of values passed to inline assembler are required to match types of corresponding assembler operands, the implementation would be much simpler.

================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp:1434-1443
+  if (Size < 64) {
+    bool Lost;
+    const fltSemantics *FltSemantics =
+      (Size == 16) ? &APFloat::IEEEhalf() : &APFloat::IEEEsingle();
+    APFloat FPLiteral(*FltSemantics, APInt(Size, Val));
+    FPLiteral.convert(APFloat::IEEEdouble(),
+                      APFloat::rmNearestTiesToEven,
----------------
arsenm wrote:
> I wouldn't expect any FP rounding in the assembler
See my comments above.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78494/new/

https://reviews.llvm.org/D78494