[PATCH] D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+.

Mon Mar 18 17:45:08 PDT 2019

timshen accepted this revision.
timshen added inline comments.
This revision is now accepted and ready to land.

================
Comment at: llvm/test/CodeGen/NVPTX/wmma.py:244
+
 main()
----------------
tra wrote:
> timshen wrote:
> > timshen wrote:
> > > Who is supposed to run this script? Can we check-in the result of this script and make them part of the regression tests?
> > > 
> > > Relatedly, for other backends we have a framework for it. See `llvm/utils/update_llc_test_checks.py`. The generated file looks like `llvm/test/CodeGen/PowerPC/atomics-regression.ll`.
> > > 
> > > One of the advantages of checking in the generated file is that any succeeding behavioral changes are reflected in the patch.
> > > Who is supposed to run this script?
> > 
> > I guess I can answer this part - lit. Still, it'd be great to check-in the generated .ll files with RUN lines in them.
> The script is executed by the lit which then runs llc with the generated output and checks the resulting PTX.
> 
> I'm not convinced that committing generated .ll has much value -- it's in the ballpark of a megabyte of uninteresting boilerplate mostly consisting of enumerating 4- and 8-tuples of arguments and results. Upcoming changes will bring more supported types and will multiply the amount of generated ll without making it any more interesting for humans.
> 
> e.g just one function out of *a lot*:
> 
> ```
> declare {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8(i8 addrspace(3)* %src );
> 
> ; CHECK-LABEL: .func {{.*}}test_llvm_nvvm_wmma_m16n16k16_load_a_row_f16_p3i8(
> define {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @test_llvm_nvvm_wmma_m16n16k16_load_a_row_f16_p3i8(i8 addrspace(3)* %src ) {
> ; CHECK: wmma.load.a.sync.aligned.row.m16n16k16.shared.f16
> ; CHECK: {{{%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+}}}
> ; CHECK: [%rd{{[0-9]+}}]
>   %v0 = call {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8(i8 addrspace(3)* %src );
>   ret {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} %v0;
> }
> 
> ```
> 
> It's easy enough to generate or grab the one done by lit -- the name would be right there in the failing command.
> I'm not convinced that committing generated .ll has much value -- it's in the ballpark of a megabyte of uninteresting boilerplate mostly consisting of enumerating 4- and 8-tuples of arguments and results. Upcoming changes will bring more supported types and will multiply the amount of generated ll without making it any more interesting for humans.

test/CodeGen/X86 already does this:
...src/llvm-project/llvm % wc -c `grep 'autogenerated by utils/update_llc' -r test/CodeGen/X86 -l` | tail -1
37287954 total

It has 37MB of autogenerated .ll files, presumably to cover its massive set of intrinsics.

> e.g just one function out of *a lot*:

Compared to `test/CodeGen/X86/avx512vl-vec-masked-cmp.ll` it isn't that bad. ;)

However, I realized that `wmma.py` is somewhat different from `utils/update_llc_test_checks.py`.

What `utils/update_llc_test_checks.py` does:
* Run llc on the *arbitrary* input IR and get the asm output.
* Use regex replacement to turn the asm into CHECK lines. The regexes are different for different targets.
* Print out the .ll file with those CHECK lines.
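
A rough sketch of that flow, to make the steps concrete. This is not the real script: `SAMPLE_ASM` is a hand-written stand-in for llc output (not captured from an actual run), and `asm_to_check_lines` is a hypothetical helper with a single whitespace-scrubbing regex, whereas the real tool has per-target regexes.

```python
import re

# Hand-written stand-in for what llc might print for a tiny NVPTX function.
# Assumption for illustration only; not captured from a real llc run.
SAMPLE_ASM = """\
//
// Generated by LLVM NVPTX Back-End
//
.visible .func  (.param .b32 func_retval0) add(
\t.param .b32 add_param_0
)
{
\tld.param.u32 \t%r1, [add_param_0];
\tadd.s32 \t%r2, %r1, 1;
\tst.param.b32 \t[func_retval0+0], %r2;
\tret;
}
"""


def asm_to_check_lines(asm, func_name, prefix="CHECK"):
    """Turn asm text into FileCheck lines (toy scrubber, hypothetical helper)."""
    checks = ["; %s-LABEL: %s(" % (prefix, func_name)]
    in_body = False
    first = True
    for line in asm.splitlines():
        stripped = line.strip()
        if stripped == "{":
            in_body = True
            continue
        if stripped == "}":
            break
        if not in_body or not stripped or stripped.startswith("//"):
            continue
        # Collapse whitespace runs; register names stay as exact literals.
        scrubbed = re.sub(r"\s+", " ", stripped)
        directive = prefix if first else prefix + "-NEXT"
        checks.append("; %s: %s" % (directive, scrubbed))
        first = False
    return checks


if __name__ == "__main__":
    print("\n".join(asm_to_check_lines(SAMPLE_ASM, "add")))
```

Note that nothing in the helper knows what the instructions mean; it only copies what the compiler printed, which is why the approach works for arbitrary input IR.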

What `wmma.py` does:
* Enumerate all possible combinations of wmma IR inputs.
* Generate the CHECK lines directly using the same wmma-specific knowledge that generates the IR.
* Print out the .ll file with the CHECK lines.
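
For contrast, a minimal sketch of the generator style, modeled on the single function quoted above. It is not the code in `wmma.py`: the names `FRAGMENTS`, `LAYOUTS`, `gen_load_test` and `gen_all` are made up, the parameter space is deliberately tiny, and the `b`/`col` variants are assumed to follow the same naming pattern as the quoted `a`/`row` case.

```python
from itertools import product

# Assumed, deliberately tiny parameter space for illustration.
FRAGMENTS = ["a", "b"]
LAYOUTS = ["row", "col"]


def gen_load_test(frag, layout):
    """Emit IR plus CHECK lines for one shared-memory f16 load variant."""
    ret_ty = "{" + ", ".join(["<2 x half>"] * 8) + "}"
    intrinsic = "llvm.nvvm.wmma.m16n16k16.load." + frag + "." + layout + ".f16.p3i8"
    func = "test_" + intrinsic.replace(".", "_")
    # The expected PTX mnemonic is built from the same parameters as the IR.
    mnemonic = ("wmma.load." + frag + ".sync.aligned." + layout
                + ".m16n16k16.shared.f16")
    args = "(i8 addrspace(3)* %src)"
    return "\n".join([
        "declare " + ret_ty + " @" + intrinsic + args + ";",
        "",
        "; CHECK-LABEL: .func {{.*}}" + func + "(",
        "define " + ret_ty + " @" + func + args + " {",
        "; CHECK: " + mnemonic,
        "; CHECK: [%rd{{[0-9]+}}]",
        "  %v0 = call " + ret_ty + " @" + intrinsic + args + ";",
        "  ret " + ret_ty + " %v0;",
        "}",
        "",
    ])


def gen_all():
    """Enumerate every fragment/layout combination."""
    return "\n".join(gen_load_test(f, l) for f, l in product(FRAGMENTS, LAYOUTS))


if __name__ == "__main__":
    print(gen_all())
```

Here the CHECK lines never come from compiler output; they are written from the same domain knowledge that produces the IR, so the test encodes an independent expectation.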

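The key difference is that `update_llc_test_checks.py` isn't wmma-specific: it derives the CHECK lines from whatever llc actually prints, while `wmma.py` encodes the expected PTX itself.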

Another crucial difference is that wmma.py generates very generic check-lines like `[%rd{{[0-9]+}}]`, while `update_llc_test_checks.py` usually prints out the exact literal it extracts from the asm result, e.g. `%rd1`.
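
To illustrate the gap between the two styles, here is a small sketch that rewrites a literal check into the generic form. `generalize_registers` is a hypothetical helper, not something either script provides, and it assumes PTX virtual registers look like `%` followed by letters and digits.

```python
import re


def generalize_registers(check_line):
    """Replace literal register numbers with a FileCheck regex block."""
    # %rd1 -> %rd{{[0-9]+}}, %hh12 -> %hh{{[0-9]+}}
    return re.sub(r"%([a-z]+)[0-9]+", r"%\1{{[0-9]+}}", check_line)


if __name__ == "__main__":
    print(generalize_registers("; CHECK: [%rd1]"))
```

The literal form pins the exact register allocation and is easier to read in a diff; the generic form survives register renumbering but tells a human reader less.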

As a result, wmma.py's output isn't as readable as I thought it would be (fewer literals), so I'm fine without checking in the wmma.py-generated files.

However, I encourage some of the NVPTX contributors (!) to add NVPTX support to `update_llc_test_checks.py`. With that, we could support wmma.py-style tests almost for free, along with all other kinds of PTX regression tests.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59393/new/

https://reviews.llvm.org/D59393