[Mlir-commits] [mlir] [mlir] GEMM Hopper Tensor Core Integration Test (PR #81478)

Jacques Pienaar llvmlistbot at llvm.org
Sun Mar 3 05:47:43 PST 2024


https://github.com/jpienaar commented:



> *[Reviewable](https://reviewable.io/reviews/llvm/llvm-project/81478)* status: 0 of 5 files reviewed, 24 unresolved discussions (waiting on @grypp, @joker-eph, @manishucsd, and @vinodgro)

___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 44 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3UejX5sGoVinffKFD:-Ns3UejX5sGoVinffKFE:blc0pbf) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L44)):*
> ```Python
> #
> # This kernel launches a single warp group (128 threads). The primary thread
> # (pthread) requests load from TMA. Threads collectively wait for the data and
> ```

"pthread" is rather ingrained in folks' minds as the POSIX threads API. How about just "main"?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 128 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3V6Eq1wlTTIdLiehO:-Ns3V6Er-GlzEKEYKXsO:bdk7ku9) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L128)):*
> ```Python
>             )
> 
>         mlir_nvgpu_module.operation.verify()
> ```

Why is this needed?
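(For context, in recent Python bindings `operation.verify()` returns `True` on success and raises `ir.MLIRError` with the diagnostics on invalid IR, so the bare call acts as an assertion on the generated module. A sketch, assuming those bindings:

```Python
# Sketch: verify() raises ir.MLIRError on invalid IR in recent bindings.
try:
    mlir_nvgpu_module.operation.verify()
except ir.MLIRError as e:
    raise RuntimeError(f"generated IR failed verification: {e}")
```
)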
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 132 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3VLqf3yhVAZhVuQo2:-Ns3VLqf3yhVAZhVuQo3:b348dup) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L132)):*
> ```Python
>         # Save generated IR
>         if saveIR:
>             # print(mlir_nvgpu_module)
> ```

Remove the commented-out code.
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 140 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3VQuT9AX4i_tSIgT3:-Ns3VQuT9AX4i_tSIgT4:b5qhywe) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L140)):*
> ```Python
> 
>         # Get compiler
>         options = f"cubin-chip=sm_90a cubin-features=+ptx80 opt-level=3"
> ```

Should this be an option to the function?
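If so, a minimal sketch (the helper and parameter names are illustrative) that lifts the hard-coded string into parameters with the current values as defaults:

```Python
# Hypothetical helper; names are illustrative, defaults mirror the
# current hard-coded string.
def make_compile_options(chip="sm_90a", features="+ptx80", opt_level=3):
    return f"cubin-chip={chip} cubin-features={features} opt-level={opt_level}"
```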
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 147 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3V_JN2lNiv22mDB26:-Ns3V_JN2lNiv22mDB27:b-15znt1) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L147)):*
> ```Python
>             )
>         compiler = nvgpucompiler.NvgpuCompiler(
>             options, opt_level=3, shared_libs=[support_lib]
> ```

Does this opt_level have to match the one above?
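If they are meant to agree, a sketch that derives both from a single variable so they cannot drift apart:

```Python
# Sketch: one source of truth for the optimization level.
opt_level = 3
options = f"cubin-chip=sm_90a cubin-features=+ptx80 opt-level={opt_level}"
compiler = nvgpucompiler.NvgpuCompiler(
    options, opt_level=opt_level, shared_libs=[support_lib]
)
```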
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 256 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3WAZZ4mGN5l73dX8S:-Ns3WAZ_08si72PjdYA7:b-i50ejw) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L256)):*
> ```Python
> 
> 
> # Takes longer time to run
> ```

Are both run unconditionally? (It feels like one should be part of the integration tests, while the other could be a unit test.)
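One way to split them, as a sketch (the environment variable name is made up), would be to keep the short test unconditional and gate the sweep:

```Python
import os

# Hypothetical opt-in switch so CI runs only the short smoke test.
if os.getenv("MLIR_RUN_LONG_GPU_TESTS"):
    test_long()
```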
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py` line 259 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3Vpnf5sIXqh0fgkQv:-Ns3Vpnf5sIXqh0fgkQw:bz8meoj) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/matmul.py#L259)):*
> ```Python
> def test_long():
>     for stages in range(1, 7):
>         for M in [128, 512, 1024, 4096, 8192]:
> ```

Should gtest-style parameterized tests be used here?
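A Python analogue, sketched with itertools.product so each configuration becomes one flat parameter tuple (`run_matmul` is a stand-in for whatever the loop body calls):

```Python
import itertools

# Sketch: flatten the nested loops into explicit parameter tuples.
for stages, M in itertools.product(range(1, 7), [128, 512, 1024, 4096, 8192]):
    run_matmul(stages=stages, M=M)  # run_matmul is illustrative
```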
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/lit.local.cfg` line 1 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3WVfX6mEdhom2h1f5:-Ns3WVfX6mEdhom2h1f6:bq67oiw) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/lit.local.cfg#L1)):*
> ```INI
> # Files in this directory are tools, not tests.
> ```

Mmm, feels like we need a different spot for these.
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 25 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3WZE41bAAf4japw1y:-Ns3WZE41bAAf4japw1z:b16hpnq) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L25)):*
> ```Python
> CONSUMER_PRIMARY_THREAD = 0
> 
> MLIR_DYNAMIC = -9223372036854775808
> ```

Document? (Also, could you use the constant from the C/C++ side?)
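For reference, the Python bindings already expose the same sentinel, so a sketch along these lines should avoid the hard-coded magic number (assuming `ShapedType.get_dynamic_size()` is available in this build):

```Python
from mlir import ir

# ShapedType::kDynamic from the C++ side is
# std::numeric_limits<int64_t>::min().
MLIR_DYNAMIC = ir.ShapedType.get_dynamic_size()  # -9223372036854775808
```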
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 30 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3WreY8QXwrA9mrIGf:-Ns3WreY8QXwrA9mrIGg:b-b7s0a1) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L30)):*
> ```Python
> 
> 
> def debug_print(fmt, *args, predicate=None, threadNumber=-1, forcePrint=False):
> ```

Is the thread the GPU-specific part here?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 61 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3X2_rBmnPWg3vg4aY:-Ns3X2_rBmnPWg3vg4aZ:b-sdjs8g) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L61)):*
> ```Python
> 
> 
> def get_type_str(ty):
> ```

There has to be something for this already. What do you get if you call str() on the type?
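For reference, str() on an MLIR type already yields its textual assembly form:

```Python
from mlir import ir

# str() on an MLIR type returns its assembly spelling.
with ir.Context():
    print(str(ir.F16Type.get()))  # f16
    print(str(ir.F32Type.get()))  # f32
```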
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 76 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3XAoR2TBryuxuaGvE:-Ns3XAoR2TBryuxuaGvF:bmzrmzt) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L76)):*
> ```Python
> 
> def get_type_size(ty):
>     if ir.F16Type.isinstance(ty):
> ```

FloatType recently gained a way to query the width, so you could collapse the floating-point cases.
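A sketch of the collapsed version, assuming this build's bindings expose `ir.FloatType` with `isinstance` and `width`:

```Python
def get_type_size(ty):
    # One branch covers all floating-point element types.
    if ir.FloatType.isinstance(ty):
        return ir.FloatType(ty).width // 8
    if ir.IntegerType.isinstance(ty):
        return ir.IntegerType(ty).width // 8
    raise NotImplementedError(ty)
```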
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 85 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3XLz6E01dfQ0gA6c9:-Ns3XLz6E01dfQ0gA6cA:b2l4dqi) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L85)):*
> ```Python
>         return ir.IntegerType(ty).width // 8
>     if ir.IndexType.isinstance(ty):
>         return 8
> ```

This is an assumption and not generally true. Why is index handled here? (I would have expected index types to be lowered to a specific size before this point.)
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 177 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3Xwhd6TXc8uSESvck:-Ns3Xwhd6TXc8uSESvcl:b-b5mm0m) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L177)):*
> ```Python
>     smem_space_str = "#gpu.address_space<workgroup>"
>     smem_space = ir.Attribute.parse(smem_space_str)
>     mbar_ty = ir.Type.parse(
> ```

So none of the nvgpu types have Python bindings yet, and we just parse the string form?
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 184 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3Y7yjB16YXZMOi59F:-Ns3Y7yjB16YXZMOi59G:b-5est6p) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L184)):*
> ```Python
>         + ">"
>     )
>     a_tma_desc_ty = ir.Type.parse(
> ```

Could we wrap these behind helper functions so that we don't have the parse calls inline here?
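For example, a small helper along these lines (a sketch mirroring the string currently parsed) would keep the assembly format in one place:

```Python
# Hypothetical helper wrapping the inline ir.Type.parse calls.
def mbarrier_group_ty(smem_space, num_barriers):
    return ir.Type.parse(
        f"!nvgpu.mbarrier.group<memorySpace = {smem_space}, "
        f"num_barriers = {num_barriers}>"
    )
```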
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 376 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3YY0E2IIvyHJM8REy:-Ns3YY0E2IIvyHJM8REz:b32jfi4) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L376)):*
> ```Python
>                         )
>                         p = arith.cmpi(arith.CmpIPredicate.eq, stage, c(num_stages - 1))
>                         phaseParity = arith.select(
> ```

Is this formatted with the black formatter? (I don't think we are checking formatting here in presubmit yet.)
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py` line 694 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3Z1rD66RXZialj0YK:-Ns3Z1rD66RXZialj0YL:b8j1e82) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py#L694)):*
> ```Python
>     smem_space_str = "#gpu.address_space<workgroup>"
>     smem_space = ir.Attribute.parse(smem_space_str)
>     mbar_ty = ir.Type.parse(
> ```

Same for these (well, for most of the parse instances).
___
*[`mlir/test/Integration/GPU/CUDA/sm90/python/tools/nvgpucompiler.py` line 26 at r3](https://reviewable.io/reviews/llvm/llvm-project/81478#-Ns3ZMYp5dCaR25rx7PJ:-Ns3ZMYp5dCaR25rx7PK:bmcj6nw) ([raw file](https://github.com/llvm/llvm-project/blob/6da82f445e66be30e31674aaf9aa7f075ac02735/mlir/test/Integration/GPU/CUDA/sm90/python/tools/nvgpucompiler.py#L26)):*
> ```Python
>         self.pipeline = pipeline
>         self.shared_libs = shared_libs
>         self.opt_level = opt_level
> ```

Should this be used when building the pipeline?
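If so, a sketch folding the stored opt_level into the pipeline options at construction time (assuming, per the options string built in matmul.py, that the pipeline accepts an opt-level option):

```Python
# Sketch: make opt_level part of the pipeline options so the two
# values cannot disagree.
pipeline = (
    f"builtin.module(gpu-lower-to-nvvm-pipeline"
    f"{{{options} opt-level={opt_level}}})"
)
self.pipeline = pipeline
```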




https://github.com/llvm/llvm-project/pull/81478

