[Mlir-commits] [mlir] [mlir] Target Description and Cost Model in MLIR (PR #85141)

Wed Mar 20 14:13:24 PDT 2024

nhasabni wrote:

Thanks @ZhennanQin for the questions. Please see my answers below.

> Very excited to see the progress of adding target description and cost model interface into MLIR. I have below questions and comments, most of them relating to CPU devices:
> 
> 1. How to initialize the target description? For host CPU, do you plan to auto-detect with CPUID?

I think initializing a target description is orthogonal to this PR. That said, we envision multiple approaches to initialization: 1) pre-existing target descriptions that are written by system designers or auto-generated from system details (e.g., [ark.intel.com](https://ark.intel.com/content/www/us/en/ark.html); cpuid could be useful here), or 2) since the number of possible targets is considerably larger than the number of uarchs (so the approach followed by LLVM TTI may not be feasible), we can provide reasonable defaults for common target parameters along with ways to override them with target-specific values. Let us know if you have other thoughts.
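To make approach 2) concrete, here is a minimal sketch of "defaults plus target-specific overrides". Everything in it is hypothetical: `TargetParams`, `overrideParam`, `getOr`, `makeGenericCpuParams`, the parameter names, and the numeric values are illustrative assumptions and are not part of this PR or of MLIR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical illustration only: none of these names exist in MLIR or in
// PR #85141.
class TargetParams {
public:
  explicit TargetParams(const std::map<std::string, int64_t> &defaults)
      : defaults_(defaults) {}

  // Record a target-specific value that takes precedence over the default.
  void overrideParam(const std::string &name, int64_t value) {
    assert(!name.empty() && "parameter name must not be empty");
    overrides_[name] = value;
  }

  // Look up a parameter: override first, then default, then `fallback`.
  int64_t getOr(const std::string &name, int64_t fallback) const {
    std::map<std::string, int64_t>::const_iterator it = overrides_.find(name);
    if (it != overrides_.end())
      return it->second;
    it = defaults_.find(name);
    if (it != defaults_.end())
      return it->second;
    return fallback;
  }

private:
  std::map<std::string, int64_t> defaults_;
  std::map<std::string, int64_t> overrides_;
};

// Placeholder defaults; the values are not measured or recommended numbers.
inline TargetParams makeGenericCpuParams() {
  std::map<std::string, int64_t> defaults;
  defaults["max_vector_width"] = 128;
  defaults["l1_cache_size_bytes"] = 32768;
  return TargetParams(defaults);
}
```

With something like this, a specific target would start from `makeGenericCpuParams()` and call `overrideParam("max_vector_width", 512)` for the values it knows better, while passes keep reading parameters through a single lookup.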

> 2. `MaxVectorWidth` is not enough for CPU, for example, for a device with AMX, if we set its `MaxVectorWidth` to 1024, it will cause trouble when generating element-wise op because element-wise op needs AVX-512 whose register width is 512. If we set `MaxVectorWidth` to 512, then we won't know if AMX is available. Do you plan to introduce other fields like `arch`(x86 or arm) and `ISAs`(AVX512 or AMX) for detection?

Yes. This PR lays out very basic infrastructure to get things going and provides a couple of example system descriptions whose parameters are used by existing MLIR passes. We will definitely need further details such as arch/ISA, in addition to CPU as a device, so that passes have the appropriate context to consume the target description information. That being said, we do not want to duplicate information that LLVM TTI already provides.
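As a rough sketch of what such arch/ISA details could look like (this is not in the PR; `CpuDesc`, `hasISA`, `widthFor`, and `makeExampleAmxCpu` are hypothetical names), a description could record a width per ISA extension instead of a single `MaxVectorWidth`. The 512 and 1024 figures below are simply the ones used in your question, not values taken from a hardware manual.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch: these types are not part of MLIR or PR #85141.
struct CpuDesc {
  std::string arch; // e.g. "x86_64" or "aarch64"
  // ISA extension name -> width in bits that a pass may assume for it.
  std::map<std::string, uint32_t> isaWidthBits;

  bool hasISA(const std::string &isa) const {
    return isaWidthBits.count(isa) != 0;
  }

  // Width for the given ISA, or 0 if the ISA is not available.
  uint32_t widthFor(const std::string &isa) const {
    assert(!isa.empty() && "ISA name must not be empty");
    std::map<std::string, uint32_t>::const_iterator it =
        isaWidthBits.find(isa);
    return it == isaWidthBits.end() ? 0 : it->second;
  }
};

// Example description using the widths mentioned in the question above.
inline CpuDesc makeExampleAmxCpu() {
  CpuDesc desc;
  desc.arch = "x86_64";
  desc.isaWidthBits["avx512"] = 512;
  desc.isaWidthBits["amx"] = 1024;
  return desc;
}
```

An element-wise lowering could then ask for `widthFor("avx512")`, while a matmul lowering checks `hasISA("amx")` first, so neither has to guess from one global width.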

> 3. How to handle compile-time configurations like `num_threads`, `maximum_ISA`? Shall we introduce them into target description? My thinking is, target description is unique and usually read-only. It represents the nature of hardware and is shared with all compilation pipelines in parallel. For compile-time configurations, usually they're one-time configurations which only apply to the current compilation pipeline. We should manage them separately.

Yes, I think compile-time configurations will not be part of the target description. As you mentioned, the target description describes the underlying system.

> 4. Suggest splitting hardware description with cost model interface and moving cost model interface from global context to op implementation. Because cost model factor like the `ConvAndMatMulBlockingFactor` will depend on `OP`(conv or matmul), `tensor_kind`(dense or sparse), `data_type`(BF16 or INT8), `tensor_shape`(regular or irregular), `num_threads`(balance or imbalance), `algorithm`(direct_matmul or K_slicing), `ISA_dispatch`(AVX512 or AMX). It is hard to define single factor for all of them.

Agreed. The `setConvAndMatMulBlockingFactor` function in [SystemDesc.h](https://github.com/llvm/llvm-project/blob/34f7612a2e1e0a3ba19ff71b54059fa4975bd861/mlir/include/mlir/Support/SystemDesc.h#L431) is supposed to capture the formula/logic for setting the blocking factor, using all the necessary information that an MLIR pass currently uses to calculate the value. In a sense, this function (or its extended versions) is intended to be the home for the logic/formulas that are currently embedded in the related MLIR passes.
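To illustrate what an "extended version" might look like, here is a hedged sketch of a blocking-factor query that takes the op kind, element width, and the vector width of the selected ISA as inputs. It is not the code in `SystemDesc.h`: `OpKind`, `BlockingQuery`, and `blockingFactor` are hypothetical names, and the formula is an arbitrary placeholder rather than a tuned heuristic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: not the logic in SystemDesc.h.
enum class OpKind { Conv, MatMul };

struct BlockingQuery {
  OpKind op;
  uint32_t elementBits;     // e.g. 16 for BF16, 8 for INT8
  uint32_t vectorWidthBits; // width of the ISA selected for this op
};

// Placeholder formula: the number of elements that fit in one vector,
// doubled for matmul. The doubling is arbitrary and only shows that the
// result can depend on the op kind.
inline uint32_t blockingFactor(const BlockingQuery &query) {
  assert(query.elementBits > 0 && "element width must be non-zero");
  uint32_t lanes = query.vectorWidthBits / query.elementBits;
  switch (query.op) {
  case OpKind::Conv:
    return lanes;
  case OpKind::MatMul:
    return lanes * 2;
  }
  return lanes; // unreachable for valid OpKind values
}
```

The point of the shape is that a pass passes in its context (op, data type, ISA) and gets back a factor, so further inputs such as tensor shape or sparsity could be added as fields of the query without changing callers.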

https://github.com/llvm/llvm-project/pull/85141