[llvm] [RISCV][TTI] Add cost of typebased cast VPIntrinsics with functionalOPC. (PR #97797)

Thu Jul 25 19:14:29 PDT 2024

================
@@ -19,6 +19,140 @@ define void @unsupported_fp_ops(<vscale x 4 x float> %vec, i32 %extraarg) {
   ret void
 }
 
+define void @int_ptrtoint() {
+; CHECK-LABEL: 'int_ptrtoint'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %1 = ptrtoint ptr undef to i64
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %2 = ptrtoint <1 x ptr> undef to <1 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %3 = ptrtoint <2 x ptr> undef to <2 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %4 = ptrtoint <4 x ptr> undef to <4 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %5 = ptrtoint <8 x ptr> undef to <8 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %6 = ptrtoint <16 x ptr> undef to <16 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %7 = ptrtoint <vscale x 1 x ptr> undef to <vscale x 1 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %8 = ptrtoint <vscale x 2 x ptr> undef to <vscale x 2 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %9 = ptrtoint <vscale x 4 x ptr> undef to <vscale x 4 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %10 = ptrtoint <vscale x 8 x ptr> undef to <vscale x 8 x i64>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %11 = call <1 x i64> @llvm.vp.ptrtoint.v1i64.v1p0(<1 x ptr> undef, <1 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %12 = call <2 x i64> @llvm.vp.ptrtoint.v2i64.v2p0(<2 x ptr> undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %13 = call <4 x i64> @llvm.vp.ptrtoint.v4i64.v4p0(<4 x ptr> undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %14 = call <8 x i64> @llvm.vp.ptrtoint.v8i64.v8p0(<8 x ptr> undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %15 = call <16 x i64> @llvm.vp.ptrtoint.v16i64.v16p0(<16 x ptr> undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %16 = call <vscale x 1 x i64> @llvm.vp.ptrtoint.nxv1i64.nxv1p0(<vscale x 1 x ptr> undef, <vscale x 1 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %17 = call <vscale x 2 x i64> @llvm.vp.ptrtoint.nxv2i64.nxv2p0(<vscale x 2 x ptr> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %18 = call <vscale x 4 x i64> @llvm.vp.ptrtoint.nxv4i64.nxv4p0(<vscale x 4 x ptr> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %19 = call <vscale x 8 x i64> @llvm.vp.ptrtoint.nxv8i64.nxv8p0(<vscale x 8 x ptr> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; TYPEBASED-LABEL: 'int_ptrtoint'
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %1 = ptrtoint ptr undef to i64
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %2 = ptrtoint <1 x ptr> undef to <1 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %3 = ptrtoint <2 x ptr> undef to <2 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %4 = ptrtoint <4 x ptr> undef to <4 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %5 = ptrtoint <8 x ptr> undef to <8 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %6 = ptrtoint <16 x ptr> undef to <16 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %7 = ptrtoint <vscale x 1 x ptr> undef to <vscale x 1 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %8 = ptrtoint <vscale x 2 x ptr> undef to <vscale x 2 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %9 = ptrtoint <vscale x 4 x ptr> undef to <vscale x 4 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %10 = ptrtoint <vscale x 8 x ptr> undef to <vscale x 8 x i64>
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %11 = call <1 x i64> @llvm.vp.ptrtoint.v1i64.v1p0(<1 x ptr> undef, <1 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %12 = call <2 x i64> @llvm.vp.ptrtoint.v2i64.v2p0(<2 x ptr> undef, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %13 = call <4 x i64> @llvm.vp.ptrtoint.v4i64.v4p0(<4 x ptr> undef, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %14 = call <8 x i64> @llvm.vp.ptrtoint.v8i64.v8p0(<8 x ptr> undef, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %15 = call <16 x i64> @llvm.vp.ptrtoint.v16i64.v16p0(<16 x ptr> undef, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %16 = call <vscale x 1 x i64> @llvm.vp.ptrtoint.nxv1i64.nxv1p0(<vscale x 1 x ptr> undef, <vscale x 1 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %17 = call <vscale x 2 x i64> @llvm.vp.ptrtoint.nxv2i64.nxv2p0(<vscale x 2 x ptr> undef, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %18 = call <vscale x 4 x i64> @llvm.vp.ptrtoint.nxv4i64.nxv4p0(<vscale x 4 x ptr> undef, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %19 = call <vscale x 8 x i64> @llvm.vp.ptrtoint.nxv8i64.nxv8p0(<vscale x 8 x ptr> undef, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+  ptrtoint ptr undef to i64
+  ptrtoint <1 x ptr> undef to <1 x i64>
+  ptrtoint <2 x ptr> undef to <2 x i64>
+  ptrtoint <4 x ptr> undef to <4 x i64>
+  ptrtoint <8 x ptr> undef to <8 x i64>
+  ptrtoint <16 x ptr> undef to <16 x i64>
+  ptrtoint <vscale x 1 x ptr> undef to <vscale x 1 x i64>
+  ptrtoint <vscale x 2 x ptr> undef to <vscale x 2 x i64>
+  ptrtoint <vscale x 4 x ptr> undef to <vscale x 4 x i64>
+  ptrtoint <vscale x 8 x ptr> undef to <vscale x 8 x i64>
+  call <1 x i64> @llvm.vp.ptrtoint.v1i64.v1ptr(<1 x ptr> undef, <1 x i1> undef, i32 undef)
+  call <2 x i64> @llvm.vp.ptrtoint.v2i64.v2ptr(<2 x ptr> undef, <2 x i1> undef, i32 undef)
+  call <4 x i64> @llvm.vp.ptrtoint.v4i64.v4ptr(<4 x ptr> undef, <4 x i1> undef, i32 undef)
+  call <8 x i64> @llvm.vp.ptrtoint.v8i64.v8ptr(<8 x ptr> undef, <8 x i1> undef, i32 undef)
+  call <16 x i64> @llvm.vp.ptrtoint.v16i64.v16ptr(<16 x ptr> undef, <16 x i1> undef, i32 undef)
+  call <vscale x 1 x i64> @llvm.vp.ptrtoint.nxv1i64.nxv1ptr(<vscale x 1 x ptr> undef, <vscale x 1 x i1> undef, i32 undef)
+  call <vscale x 2 x i64> @llvm.vp.ptrtoint.nxv2i64.nxv2ptr(<vscale x 2 x ptr> undef, <vscale x 2 x i1> undef, i32 undef)
+  call <vscale x 4 x i64> @llvm.vp.ptrtoint.nxv4i64.nxv4ptr(<vscale x 4 x ptr> undef, <vscale x 4 x i1> undef, i32 undef)
+  call <vscale x 8 x i64> @llvm.vp.ptrtoint.nxv8i64.nxv8ptr(<vscale x 8 x ptr> undef, <vscale x 8 x i1> undef, i32 undef)
+  ret void
+}
+
+define void @int_inttoptr() {
+; CHECK-LABEL: 'int_inttoptr'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %1 = inttoptr i32 undef to ptr
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %2 = inttoptr <1 x i32> undef to <1 x ptr>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %3 = inttoptr <2 x i32> undef to <2 x ptr>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %4 = inttoptr <4 x i32> undef to <4 x ptr>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %5 = inttoptr <8 x i32> undef to <8 x ptr>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %6 = inttoptr <16 x i32> undef to <16 x ptr>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %7 = inttoptr <vscale x 1 x i32> undef to <vscale x 1 x ptr>
----------------
ElvisWang123 wrote:

`getCastInstCost()` calls `BasicTTIImplBase::getCastInstCost()`, which in turn calls `TargetTransformInfoImplBase::getCastInstCost()`.
`TargetTransformInfoImplBase::getCastInstCost()` uses the following code to check whether the conversion is free:
```
    case Instruction::IntToPtr: {
      unsigned SrcSize = Src->getScalarSizeInBits();
      if (DL.isLegalInteger(SrcSize) &&
          SrcSize <= DL.getPointerTypeSizeInBits(Dst))
        return 0;
      break;
    }
    case Instruction::PtrToInt: {
      unsigned DstSize = Dst->getScalarSizeInBits();
      if (DL.isLegalInteger(DstSize) &&
          DstSize >= DL.getPointerTypeSizeInBits(Src))
        return 0;
      break;
    }
```
The statements above come from [ee959ddc5eee1](https://github.com/llvm/llvm-project/commit/ee959ddc5eee180f75851649d01260a4f0ba5198), where the original code carried the following comments:
```
-    case Instruction::IntToPtr: {
-      // An inttoptr cast is free so long as the input is a legal integer type
-      // which doesn't contain values outside the range of a pointer.
-      unsigned OpSize = OpTy->getScalarSizeInBits();
-      if (DL.isLegalInteger(OpSize) &&
-          OpSize <= DL.getPointerTypeSizeInBits(Ty))
-        return TTI::TCC_Free;
-
-      // Otherwise it's not a no-op.
-      return TTI::TCC_Basic;
-    }
-    case Instruction::PtrToInt: {
-      // A ptrtoint cast is free so long as the result is large enough to store
-      // the pointer, and a legal integer type.
-      unsigned DestSize = Ty->getScalarSizeInBits();
-      if (DL.isLegalInteger(DestSize) &&
-          DestSize >= DL.getPointerTypeSizeInBits(OpTy))
-        return TTI::TCC_Free;
-
-      // Otherwise it's not a no-op.
-      return TTI::TCC_Basic;
-    }
```
According to those comments, the `inttoptr` casts in this test case are free, so a cost of zero is expected.
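
As a rough illustration of that check, here is a minimal standalone sketch. It does not use the LLVM API: `ToyDataLayout`, `ToyCost`, `intToPtrCost` and `ptrToIntCost` are hypothetical names, and the layout is assumed to be RV64-like (64-bit pointers, native integer widths of 32 and 64 bits). The non-free case is modeled as a flat cost of 1, which simplifies what the real TTI hierarchy does after the `break`.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::DataLayout, for illustration only.
// Assumption: RV64-like target with 64-bit pointers and native
// integer widths of 32 and 64 bits.
struct ToyDataLayout {
  unsigned PointerSizeInBits = 64;

  constexpr bool isLegalInteger(unsigned Bits) const {
    return Bits == 32 || Bits == 64;
  }
};

// Simplified cost values; the real code returns an InstructionCost.
enum ToyCost : unsigned { Free = 0, Basic = 1 };

// Mirrors the IntToPtr case: free when the source is a legal integer
// type that is no wider than a pointer.
constexpr ToyCost intToPtrCost(const ToyDataLayout &DL, unsigned SrcBits) {
  return (DL.isLegalInteger(SrcBits) && SrcBits <= DL.PointerSizeInBits)
             ? Free
             : Basic;
}

// Mirrors the PtrToInt case: free when the destination is a legal
// integer type that is at least as wide as a pointer.
constexpr ToyCost ptrToIntCost(const ToyDataLayout &DL, unsigned DstBits) {
  return (DL.isLegalInteger(DstBits) && DstBits >= DL.PointerSizeInBits)
             ? Free
             : Basic;
}

// The two element types used in the test: inttoptr from i32 and
// ptrtoint to i64 both come out free under the assumed layout.
static_assert(intToPtrCost(ToyDataLayout{}, 32) == Free, "i32 -> ptr");
static_assert(ptrToIntCost(ToyDataLayout{}, 64) == Free, "ptr -> i64");
```

Under these assumptions, a `ptrtoint` to `i32` would not be free, because the result is too narrow to hold a 64-bit pointer.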

https://github.com/llvm/llvm-project/pull/97797