[Mlir-commits] [mlir] [mlir] GEMM Hopper Tensor Core Integration Test (PR #81478)
Jacques Pienaar
llvmlistbot at llvm.org
Sun Mar 3 05:47:43 PST 2024
https://github.com/jpienaar commented:
> *[Reviewable](https://reviewable.io/reviews/llvm/llvm-project/81478)* status: 0 of 5 files reviewed, 24 unresolved discussions (waiting on @grypp, @joker-eph, @manishucsd, and @vinodgro)
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 44 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3UejX5sGoVinffKFD:-Ns3UejX5sGoVinffKFE:blc0pbf) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L44)):*
> ```Python
> #
> # This kernel launches a single warp group (128 threads). The primary thread
> # (pthread) requests load from TMA. Threads collectively wait for the data and
> ```
pthread is rather engrained in folks. How about just main?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 128 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3V6Eq1wlTTIdLiehO:-Ns3V6Er-GlzEKEYKXsO:bdk7ku9) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L128)):*
> ```Python
> )
>
> mlir_nvgpu_module.operation.verify()
> ```
Why is this needed?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 132 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3VLqf3yhVAZhVuQo2:-Ns3VLqf3yhVAZhVuQo3:b348dup) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L132)):*
> ```Python
> # Save generated IR
> if saveIR:
> # print(mlir_nvgpu_module)
> ```
Rm commented out code
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 140 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3VQuT9AX4i_tSIgT3:-Ns3VQuT9AX4i_tSIgT4:b5qhywe) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L140)):*
> ```Python
>
> # Get compiler
> options = f"cubin-chip=sm_90a cubin-features=+ptx80 opt-level=3"
> ```
Should this be an option to the function?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 147 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3V_JN2lNiv22mDB26:-Ns3V_JN2lNiv22mDB27:b-15znt1) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L147)):*
> ```Python
> )
> compiler = nvgpucompiler.NvgpuCompiler(
> options, opt_level=3, shared_libs=[support_lib]
> ```
Does this opt_level have to match the one above?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 256 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3WAZZ4mGN5l73dX8S:-Ns3WAZ_08si72PjdYA7:b-i50ejw) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L256)):*
> ```Python
>
>
> # Takes longer time to run
> ```
Are both ran unconditionally? (it feels like one should be part of integration tests while the other could be for unit test)
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 259 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3Vpnf5sIXqh0fgkQv:-Ns3Vpnf5sIXqh0fgkQw:bz8meoj) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L259)):*
> ```Python
> def test_long():
> for stages in range(1, 7):
> for M in [128, 512, 1024, 4096, 8192]:
> ```
Should gtest parameterized be used here?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/lit.local.cfg` line 1 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3WVfX6mEdhom2h1f5:-Ns3WVfX6mEdhom2h1f6:bq67oiw) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/lit.local.cfg#L1)):*
> ```INI
> # Files in this directory are tools, not tests.
> ```
Mmm, feels like we need a different spot for these.
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 25 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3WZE41bAAf4japw1y:-Ns3WZE41bAAf4japw1z:b16hpnq) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L25)):*
> ```Python
> CONSUMER_PRIMARY_THREAD = 0
>
> MLIR_DYNAMIC = -9223372036854775808
> ```
Document ? (also could you use the the constant from C/C++ side)
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 30 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3WreY8QXwrA9mrIGf:-Ns3WreY8QXwrA9mrIGg:b-b7s0a1) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L30)):*
> ```Python
>
>
> def debug_print(fmt, *args, predicate=None, threadNumber=-1, forcePrint=False):
> ```
Is the thread the gpu specific part here?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 61 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3X2_rBmnPWg3vg4aY:-Ns3X2_rBmnPWg3vg4aZ:b-sdjs8g) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L61)):*
> ```Python
>
>
> def get_type_str(ty):
> ```
There has to be something already for this. What do you get if you str on the type?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 76 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3XAoR2TBryuxuaGvE:-Ns3XAoR2TBryuxuaGvF:bmzrmzt) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L76)):*
> ```Python
>
> def get_type_size(ty):
> if ir.F16Type.isinstance(ty):
> ```
The FloatType recently got added to query the width, so you could collapse the fp ones
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 85 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3XLz6E01dfQ0gA6c9:-Ns3XLz6E01dfQ0gA6cA:b2l4dqi) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L85)):*
> ```Python
> return ir.IntegerType(ty).width // 8
> if ir.IndexType.isinstance(ty):
> return 8
> ```
This is an assumption/not generally true. Why is index here? (I would have expected these to be lowered out before here to specific size)
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 177 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3Xwhd6TXc8uSESvck:-Ns3Xwhd6TXc8uSESvcl:b-b5mm0m) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L177)):*
> ```Python
> smem_space_str = "#gpu.address_space<workgroup>"
> smem_space = ir.Attribute.parse(smem_space_str)
> mbar_ty = ir.Type.parse(
> ```
So none of the nvgpu types have python bindings yet, and just parse the string variant?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 184 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3Y7yjB16YXZMOi59F:-Ns3Y7yjB16YXZMOi59G:b-5est6p) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L184)):*
> ```Python
> + ">"
> )
> a_tma_desc_ty = ir.Type.parse(
> ```
Could we wrap these behind helper functions so that we don't have the parses here?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 376 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3YY0E2IIvyHJM8REy:-Ns3YY0E2IIvyHJM8REz:b32jfi4) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L376)):*
> ```Python
> )
> p = arith.cmpi(arith.CmpIPredicate.eq, stage, c(num_stages - 1))
> phaseParity = arith.select(
> ```
Is this formatted using the black formatter? (I don't think we are testing here yet in presubmit)
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 694 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3Z1rD66RXZialj0YK:-Ns3Z1rD66RXZialj0YL:b8j1e82) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L694)):*
> ```Python
> smem_space_str = "#gpu.address_space<workgroup>"
> smem_space = ir.Attribute.parse(smem_space_str)
> mbar_ty = ir.Type.parse(
> ```
Same for these (well most parse instances)
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/nvgpucompiler.py` line 26 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3ZMYp5dCaR25rx7PJ:-Ns3ZMYp5dCaR25rx7PK:bmcj6nw) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/nvgpucompiler.py#L26)):*
> ```Python
> self.pipeline = pipeline
> self.shared_libs = shared_libs
> self.opt_level = opt_level
> ```
Should this be used to add to pipeline?
<!-- Sent from Reviewable.io -->
https://github.com/llvm/llvm-project/pull/81478
More information about the Mlir-commits
mailing list